How to remove duplicate columns (content) in data.table R?

- May 15, 2014

How can I remove duplicate columns from a data.table (keeping one of them)?

I know there are other questions about duplicate columns, but they check for duplicate column names, not content.

What I want is to find columns that have different names but the same content.

Regards

This is a common task in feature engineering. The following code chunk was developed by myself and the community on Kaggle for this purpose:

    ##### Removing identical features
    features_pair <- combn(names(train), 2, simplify = FALSE) # list of column pairs
    toremove <- c() # init a vector to store duplicates
    for (pair in features_pair) { # put the pairs for testing into temp objects
      f1 <- pair[1]
      f2 <- pair[2]

      # skip pairs where either column is already marked for removal
      if (!(f1 %in% toremove) && !(f2 %in% toremove)) {
        if (all(train[[f1]] == train[[f2]])) { # test for duplicates
          cat(f1, "and", f2, "are equal.\n")
          toremove <- c(toremove, f2) # build the list of duplicates
        }
      }
    }

Then you can drop whichever copy of the duplicates you want. By default I use the version stored in the temporary object f2, and remove them like this:

    train <- train[, !toremove, with = FALSE] # with = FALSE: treat toremove as column names
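As an alternative to the pairwise loop, here is a minimal sketch that relies on base R's `duplicated()` applied to the list of columns. The helper name `duplicate_columns` and the toy `train` data are made up for illustration and are not part of the original answer. One difference to be aware of: this compares columns by exact identity, so an integer column and a numeric column holding the same values are not treated as duplicates, whereas `==` would coerce them. On the other hand, it does not fail when the columns contain `NA`.

```r
# Hypothetical helper: names of columns whose content repeats an earlier column.
duplicate_columns <- function(x) {
  # as.list() yields one vector per column; duplicated() flags every element
  # that matches an earlier one, so the first copy is always kept.
  names(x)[duplicated(as.list(x))]
}

# Toy data (invented for this example): b repeats a, d repeats c (NA included).
train <- data.frame(
  a = c(1, 2, 3),
  b = c(1, 2, 3),
  c = c("x", NA, "z"),
  d = c("x", NA, "z"),
  e = c(3, 2, 1),
  stringsAsFactors = FALSE
)

to_remove <- duplicate_columns(train)

# Keep every column that was not flagged as a duplicate.
deduped <- train[, setdiff(names(train), to_remove), drop = FALSE]
print(names(deduped))
```

The example uses a plain data.frame so that it runs without extra packages. For a data.table, the same helper should work unchanged, and the final selection would presumably be written as `train[, setdiff(names(train), to_remove), with = FALSE]`.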
