python - Pandas: Select balanced sample


I have a data frame of 3000 companies covering 5 years.

    id  company  year   value
    0   1111111  2016     NaN
    1   1111111  2015  3871.0
    2   3333333  2016  3989.0
    3   3333333  2015  3648.0
    4   4444444  2016  5456.0
    5   4444444  2015     NaN
    6   2222222  2016     NaN
    7   2222222  2015    10.0
    8   5555555  2016  1515.0
    9   5555555  2015  2654.0

I want to make a selection that makes sure the companies do not have any NaN values. There should be data for all periods in the selection, and an equal number of companies per period.

What is the easiest way of doing this?

The result should be:

    id  company  year   value
    2   3333333  2016  3989.0
    3   3333333  2015  3648.0
    8   5555555  2016  1515.0
    9   5555555  2015  2654.0

Thanks
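The requirements above (no NaN values, data for every period, and the same number of companies per period) can be checked mechanically. This is a minimal sketch, not part of the original question: the helper name is_balanced is hypothetical, and the column names company, year and value are taken from the sample table. Requiring exactly one row per company and year implies that every year contains the same set of companies.

```python
import pandas as pd

def is_balanced(df: pd.DataFrame, years) -> bool:
    """Hypothetical helper: True if df has no NaN in 'value' and every
    company has exactly one row for each year in `years`."""
    if df['value'].isna().any():
        return False
    # Rows per (company, year); years a company lacks become 0 after reindexing.
    counts = pd.crosstab(df['company'], df['year']).reindex(
        columns=sorted(years), fill_value=0
    )
    # Exactly one row per company and year means every year has the same companies.
    return bool((counts == 1).all().all())

# The expected result from the question (index values are the row ids).
expected = pd.DataFrame(
    {'company': [3333333, 3333333, 5555555, 5555555],
     'year': [2016, 2015, 2016, 2015],
     'value': [3989.0, 3648.0, 1515.0, 2654.0]},
    index=[2, 3, 8, 9],
)
print(is_balanced(expected, [2015, 2016]))  # True
```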

groupby.count() returns the number of non-null values, so if you group by company, the count should equal the number of years. Assuming there are no duplicates, you can do this:

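    df.loc[df.groupby('company')['value'].transform('count') == df['year'].nunique()]

    Out[259]:
       id  company  year   value
    2   2  3333333  2016  3989.0
    3   3  3333333  2015  3648.0
    8   8  5555555  2016  1515.0
    9   9  5555555  2015  2654.0

(.ix has been removed from recent pandas versions; .loc accepts the same boolean mask. Comparing with df['year'].nunique() instead of a hard-coded number keeps this working for 5 years.)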
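The same idea can also be written with groupby().filter, which keeps all rows of a company only when its group passes a test. This is a sketch that rebuilds the sample data from the question; unlike the count-based version it also rejects a company with duplicate rows for one year, because it requires each year to appear exactly once. n_years is taken from the whole frame, which assumes every period occurs somewhere in the data.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the 'id' column mirrors the row index).
df = pd.DataFrame({
    'id': range(10),
    'company': [1111111, 1111111, 3333333, 3333333, 4444444,
                4444444, 2222222, 2222222, 5555555, 5555555],
    'year': [2016, 2015] * 5,
    'value': [np.nan, 3871.0, 3989.0, 3648.0, 5456.0,
              np.nan, np.nan, 10.0, 1515.0, 2654.0],
})

# Number of distinct periods present anywhere in the data.
n_years = df['year'].nunique()

# Keep a company only if it has no NaN values and covers every year exactly once.
balanced = df.groupby('company').filter(
    lambda g: g['value'].notna().all()
    and g['year'].nunique() == n_years
    and len(g) == n_years
)
print(balanced)
```

On the sample data this keeps companies 3333333 and 5555555 (rows 2, 3, 8 and 9), matching the expected result.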
