python - Parsing CSV to pandas DataFrames (one-to-many unmunge)


I have a CSV file imported into a pandas DataFrame. It came from a database export that combined a one-to-many parent and detail table. The format of the CSV file is as follows:

    header1, header2, header3, header4, header5, header6
    sample1, property1,,,average1,average2
    ,,detail1,detail2,,
    ,,detail1,detail2,,
    ,,detail1,detail2,,
    sample2, ...
    ,,detail1,detail2,,
    ,,detail1,detail2,,
    ...

(i.e. line 0 is the header, line 1 is record 1, lines 2 through n are its details, line n+1 is record 2, and so on...)

What is the best way to extricate (renormalize?) the details into separate DataFrames that can be referenced using values in the sample# records? The number of detail rows is different for each sample.

I can use:

samplelist = df.header2[pd.notnull(df.header2)] 

to get the starting index of each sample, so I can grab the rows from samplelist.index[0] to samplelist.index[1] and put them in a smaller DataFrame. The detail records have no reference to the sample they came from, so it has to be inferred from the order of the CSV file (notice that there is no intersection of filled/empty fields in my example).
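The index-slicing idea described above could be sketched roughly as follows. The inline CSV text and its numeric values are made up for illustration, and the `bounds` list and `details` dict are hypothetical names, not part of the original export. The sketch assumes the DataFrame keeps its default integer index, so row labels and positions coincide.

```python
import io
import pandas as pd

# Hypothetical data in the layout described above (values are invented).
raw = """header1,header2,header3,header4,header5,header6
sample1,property1,,,1.5,2.5
,,1.0,10.0,,
,,2.0,20.0,,
sample2,property2,,,3.5,4.5
,,3.0,30.0,,
,,4.0,40.0,,
,,5.0,50.0,,
"""
df = pd.read_csv(io.StringIO(raw))

# Rows where a parent (sample) record starts.
samplelist = df.header2[pd.notnull(df.header2)]

# Append the end of the frame so the last sample also gets a closing bound.
bounds = list(samplelist.index) + [len(df)]

details = {}
for start, stop in zip(bounds[:-1], bounds[1:]):
    name = df.loc[start, "header1"]
    # Rows strictly between two parent rows are this sample's details.
    block = df.iloc[start + 1:stop][["header3", "header4"]]
    details[name] = block.reset_index(drop=True)

print(details["sample2"])
```

This works, but it relies entirely on row order and needs the extra end bound for the last sample.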

Should I make a list of DataFrames, a dict of DataFrames, or a Panel of DataFrames?

Can I somehow create variables from the sample1 record fields and attach them to each DataFrame that holds the detail records (like a collection of objects that each have several scalar members and one DataFrame)?

Eventually I will create statistics on the data for each detail record grouping and plot them against values in the sample records (e.g. sampletype, day or date, etc. vs. mystatistic). I will also create intermediate Series attached to each sample grouping, such as a kernel density estimation PDF or a histogram.

Thanks.

You can use the fact that the first column seems to be empty unless it's a new sample record: forward-fill it with .ffill() (spelled .fillna(method='ffill') in older pandas versions) and then .groupby('header1') to get the separate groups. On these, you can calculate statistics right away or store each one as a separate DataFrame. A high-level sketch follows:

    # Propagate each sample name down onto its detail rows.
    df['header1'] = df['header1'].ffill()

    for sample, data in df.groupby('header1'):
        print(sample)  # access sample name
        data = ...  # process sample records
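To make the sketch above concrete, here is one possible way to split the export into a parent table and a detail table and then compute per-sample statistics. It is only an illustration under assumptions: the inline CSV values are invented, the mean is a stand-in for whatever statistic is actually wanted, and the names `is_parent`, `samples`, `details`, `by_sample` and `summary` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical data in the layout from the question (values are invented).
raw = """header1,header2,header3,header4,header5,header6
sample1,property1,,,1.5,2.5
,,1.0,10.0,,
,,2.0,20.0,,
sample2,property2,,,3.5,4.5
,,3.0,30.0,,
,,4.0,40.0,,
,,5.0,50.0,,
"""
df = pd.read_csv(io.StringIO(raw))

# Remember which rows are parent records before forward-filling the key.
is_parent = df["header1"].notna()
df["header1"] = df["header1"].ffill()

# Parent table: one row per sample, indexed by sample name.
samples = df.loc[is_parent, ["header1", "header2", "header5", "header6"]]
samples = samples.set_index("header1")

# Detail table: many rows per sample, keyed by the forward-filled name.
details = df.loc[~is_parent, ["header1", "header3", "header4"]]

# Dict of per-sample DataFrames, if separate frames are preferred.
by_sample = {
    name: grp.drop(columns="header1").reset_index(drop=True)
    for name, grp in details.groupby("header1")
}

# Per-sample statistic (mean here), joined back onto the parent fields
# so it can be plotted against any of the sample-level values.
stats = details.groupby("header1")[["header3", "header4"]].mean()
summary = samples.join(stats)
print(summary)
```

Keeping the details in one long table keyed by header1 means the sample-level fields never need to be copied onto each detail frame; a join on the sample name recovers them when needed, and the dict of DataFrames remains available for per-sample work.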
