How to remove duplicate columns (content) in data.table R?


How can I remove duplicate columns from a data.table (keeping one of them)?

I know there are other questions about duplicate columns, but they check for duplicate column names, not content.

What I want is to find columns that have different names but the same content.

Regards

This is a common task in feature engineering. The following code chunk was developed by myself and the community on Kaggle for this purpose:

    ##### Removing identical features
    features_pair <- combn(names(train), 2, simplify = FALSE)  # list of column pairs
    toremove <- c()  # init a vector to store duplicates
    for (pair in features_pair) {  # put the pair being tested into temp objects
      f1 <- pair[1]
      f2 <- pair[2]

      if (!(f1 %in% toremove) && !(f2 %in% toremove)) {
        # test for duplicates; assumes the columns contain no NA values
        if (all(train[[f1]] == train[[f2]])) {
          cat(f1, "and", f2, "are equal.\n")
          toremove <- c(toremove, f2)  # build the list of duplicates
        }
      }
    }
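One caveat with the `==` test: if both columns have an `NA` in the same position and otherwise match, `all(x == y)` evaluates to `NA` rather than `TRUE`, and `if (NA)` stops with an error. Below is a minimal sketch of a comparison that tolerates `NA`, assuming two columns should count as duplicates when they hold the same values and have `NA` in the same positions. `same_content` is a hypothetical helper name and is not part of the original answer.

```r
# Hypothetical helper: TRUE when x and y hold the same values and have
# NA in the same positions. Unlike all(x == y), it never returns NA.
same_content <- function(x, y) {
  if (length(x) != length(y)) return(FALSE)
  na_x <- is.na(x)
  na_y <- is.na(y)
  if (!identical(na_x, na_y)) return(FALSE)   # NA positions must match
  all(x[!na_x] == y[!na_y])                   # compare the non-NA values
}

a <- c(1, NA, 3)
b <- c(1, NA, 3)

print(all(a == b))                  # NA, so if (all(a == b)) would fail
print(same_content(a, b))           # TRUE
print(same_content(a, c(1, 2, 3)))  # FALSE: the NA positions differ
```

In the loop, `same_content(train[[f1]], train[[f2]])` could stand in for the `all(...)` condition if the data may contain missing values.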

Once the loop finishes, you can drop whichever copy of each duplicate you want. By default, the loop records the copy held in the temporary object f2, and those columns are removed like this:

    train <- train[, setdiff(names(train), toremove), with = FALSE]
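A more compact alternative to the pairwise loop is sketched below. It relies on base R's `duplicated()`, which also works on a list of columns. The toy `train` table is made up for illustration, and a plain data.frame is used so that the sketch runs without extra packages.

```r
# Toy data (made up for illustration): b duplicates a, d duplicates c.
train <- data.frame(
  a = c(1, 2, 3),
  b = c(1, 2, 3),
  c = c("x", "y", NA),
  d = c("x", "y", NA),
  e = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# duplicated() on a list flags every element that is identical to an
# earlier one, so the first copy of each column is the one that is kept.
dup <- duplicated(as.list(train))
toremove <- names(train)[dup]
print(toremove)

# Keep only the first copy of each column.
# For a data.table, the selection would be along the lines of:
#   train <- train[, which(!dup), with = FALSE]
train <- train[, !dup, drop = FALSE]
print(names(train))
```

Note that `duplicated()` compares whole columns strictly, so an integer column `1:3` and a double column `c(1, 2, 3)` are not treated as duplicates, whereas the `==` test in the loop would consider them equal.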
