python - Pandas: Select balanced sample
I have a data frame of 3000 companies covering 5 years.
id  company  year   value
 0  1111111  2016     NaN
 1  1111111  2015  3871.0
 2  3333333  2016  3989.0
 3  3333333  2015  3648.0
 4  4444444  2016  5456.0
 5  4444444  2015     NaN
 6  2222222  2016     NaN
 7  2222222  2015    10.0
 8  5555555  2016  1515.0
 9  5555555  2015  2654.0
I want to make a selection that ensures no selected company has a NaN value, so that every period in the selection has data and there is an equal number of companies per period.
What is the easiest way of doing this?
The result should be:
id  company  year   value
 2  3333333  2016  3989.0
 3  3333333  2015  3648.0
 8  5555555  2016  1515.0
 9  5555555  2015  2654.0
Thanks.
groupby.count() returns the number of non-null values, so if you group by company, the count for a complete company should equal the number of years. Assuming there are no duplicate rows, you can do this:
df.loc[df.groupby('company')['value'].transform('count') > 1, :]
Out[259]:
   id  company  year   value
2   2  3333333  2016  3989.0
3   3  3333333  2015  3648.0
8   8  5555555  2016  1515.0
9   9  5555555  2015  2654.0
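The `> 1` threshold only works for the two-year sample shown here. For the full five-year data, a more general version might compare each company's non-null count against the number of distinct years. The following is a minimal sketch, not part of the original answer. The names `n_years`, `counts` and `balanced` are made up for illustration, and it assumes there is at most one row per company and year and that every year of interest appears somewhere in the data.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "id": range(10),
    "company": [1111111, 1111111, 3333333, 3333333, 4444444,
                4444444, 2222222, 2222222, 5555555, 5555555],
    "year": [2016, 2015] * 5,
    "value": [np.nan, 3871.0, 3989.0, 3648.0, 5456.0,
              np.nan, np.nan, 10.0, 1515.0, 2654.0],
})

# Number of distinct periods present in the data (2 here, 5 in the full set).
n_years = df["year"].nunique()

# Per-row count of non-null values for that row's company.
counts = df.groupby("company")["value"].transform("count")

# Keep only companies with a non-null value in every period.
# Assumes no duplicate (company, year) rows.
balanced = df.loc[counts == n_years]
print(balanced)
```

Because every remaining company has one row per year, each year ends up with the same number of companies, which is the balanced sample the question asks for.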