python - Filling NaN data with mode() doesn't work - Pandas
I have a data set in which there is a series known as outlet_size
that contains one of {'medium', nan, 'high', 'small'}.
Around 2566 records are missing, so I thought I would fill them with the mode() value and wrote:
train['outlet_size'] = train['outlet_size'].fillna(train['outlet_size'].dropna().mode())
But when I tried to find the number of missing NaN records with the command
sum(train['outlet_size'].isnull())
it still shows 2566 NaN records. Why?
Thanks for your answers.
The problem here is that mode returns a Series, which causes fillna to fail. Look at a simple example:
In [194]: df = pd.DataFrame({'a':['low','low',np.nan,'medium','medium','medium','medium']})
          df
Out[194]:
        a
0     low
1     low
2     NaN
3  medium
4  medium
5  medium
6  medium

In [195]: df['a'].fillna(df['a'].mode())
Out[195]:
0       low
1       low
2       NaN
3    medium
4    medium
5    medium
6    medium
Name: a, dtype: object
So you can see that it fails above. If we look at what mode returns:
In [196]: df['a'].mode()
Out[196]:
0    medium
dtype: object
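It's a Series, albeit with a single row. When you pass a Series to fillna, the values are matched on index, so it can only fill the first row (index label 0). What you want is the scalar value, which you get by indexing into the Series: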
In [197]: df['a'].fillna(df['a'].mode()[0])
Out[197]:
0       low
1       low
2    medium
3    medium
4    medium
5    medium
6    medium
Name: a, dtype: object
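Applied to the column from the question, the same fix might look like the sketch below. The small train frame here is a made-up stand-in for the real data (the actual values are not shown in the question), and .iloc[0] is used instead of [0] only to make the positional lookup explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `train` frame; the values are invented.
train = pd.DataFrame(
    {"outlet_size": ["medium", np.nan, "small", "medium", np.nan, "high"]}
)

# Passing the Series returned by mode() is matched on index label 0 only,
# so the NaNs at labels 1 and 4 are left untouched.
unchanged = train["outlet_size"].fillna(train["outlet_size"].mode())
missing_after_series = int(unchanged.isnull().sum())

# Taking the first mode as a scalar fills every missing value.
most_common = train["outlet_size"].mode().iloc[0]
train["outlet_size"] = train["outlet_size"].fillna(most_common)
missing_after_scalar = int(train["outlet_size"].isnull().sum())

print(missing_after_series, missing_after_scalar)
```

If several values tie for most frequent, mode() returns all of them, so taking the first element is a choice you make rather than something pandas decides for you.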
Edit
Regarding whether dropna is required: no, it isn't:
In [204]: df = pd.DataFrame({'a':['low','low',np.nan,'medium','medium','medium','medium',np.nan,np.nan,np.nan,np.nan]})
          df['a'].mode()
Out[204]:
0    medium
dtype: object
You can see that NaN is ignored.
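To check this yourself, here is a small sketch comparing mode() with and without the explicit dropna() call. It assumes a pandas version where Series.mode accepts the dropna keyword (it defaults to True); the sample series is made up so that NaN is the most frequent entry.

```python
import numpy as np
import pandas as pd

# Invented sample: 6 NaN values outnumber the 3 'medium' entries.
s = pd.Series(["low", "low", "medium", "medium", "medium"] + [np.nan] * 6)

default_mode = s.mode()             # NaN is excluded by default
explicit_drop = s.dropna().mode()   # same result, with a redundant extra step
with_nan = s.mode(dropna=False)     # NaN is counted as a value here

print(default_mode.tolist(), explicit_drop.tolist(), with_nan.isnull().tolist())
```

So the extra dropna() in the question is harmless but redundant; you would only see NaN come back as the mode if you asked for it with dropna=False.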