R - manipulating two data frames based on strings with different lengths

- January 15, 2010

I asked a question here about finding an index based on two data frames of strings, and got a perfect answer. Since then I have been facing a problem that I could not solve. If the second data frame has more than one column, I can solve it with:

setDT(strs)[, c('colids1','colids2') := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]

This is fine as long as the columns of the second data frame (strs) have the same length. If they vary (are not the same length), it does not work and gives me an error.

So let's say this is the first data frame:

lut <- structure(list(v1 = c("o75663", "o95400", "o95433", NA, NA),
    v2 = c("o95456", "o95670", NA, NA, NA),
    v3 = c("o75663", "o95400", "o95433", "o95456", "o95670"),
    v4 = c("o95456", "o95670", "o95801", "p00352", NA),
    v1 = c("o75663", "o95400", "o95433", NA, NA),
    v2 = c("o95456", "o95670", NA, NA, NA),
    v3 = c("o75663", "o95400", "o95433", "o95456", "o95670"),
    v4 = c("o95456", "o95670", "o95801", "p00352", NA)),
    .Names = c("v1", "v2", "v3", "v4", "v1", "v2", "v3", "v4"),
    row.names = c(NA, -5L), class = "data.frame")

And this is the second data frame:

strs <- structure(list(
    strings = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 1L, 1L),
        .Label = c("", "o75663", "o95400", "o95433", "o95456", "o95670", "o95801"),
        class = "factor"),
    strings2 = structure(c(4L, 2L, 6L, 5L, 3L, 1L, 1L, 1L),
        .Label = c("", "o75663", "o95433", "o95456", "p00352", "p00492"),
        class = "factor"),
    strings3 = structure(c(4L, 6L, 7L, 8L, 2L, 3L, 5L, 1L),
        .Label = c("", "o75663", "o95400", "o95456", "o95670", "o95801", "p00352", "p00492"),
        class = "factor"),
    strings4 = structure(c(2L, 5L, 3L, 4L, 1L, 1L, 1L, 1L),
        .Label = c("", "o95400", "o95456", "o95801", "p00492"),
        class = "factor"),
    strings5 = structure(c(8L, 2L, 7L, 1L, 3L, 6L, 5L, 4L),
        .Label = c("o75663", "o95400", "o95433", "o95456", "o95670", "o95801", "p00352", "p00492"),
        class = "factor")),
    .Names = c("strings", "strings2", "strings3", "strings4", "strings5"),
    class = "data.frame", row.names = c(NA, -8L))

This is what I tried:

df <- setDT(strs)[, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]

It works if the columns of strs have the same length, but it does not work when the length varies, as in the example I gave here.

You need to convert the factor variables in strs to character variables, which can be done with data.table. Supposing your strs dataset is already a data.table, you should do:

strs[, names(strs) := lapply(.SD, as.character)]

If strs is not a data.table, you should use:

setDT(strs)[, names(strs) := lapply(.SD, as.character)]

After that, you can perform the operation you wanted. Chained together, it looks like this:

setDT(strs)[, lapply(.SD, as.character)
            ][, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))),
              by = 1:nrow(strs)][]
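To see what the lookup computes, here is a minimal sketch of the same idea in base R, without data.table. It is only an illustration: the small tables `lut_demo` and `strs_demo` and the helper `find_cols` are hypothetical names made up for this example, not part of the original data or of any package.

```r
# Hypothetical miniature lookup table (character columns, with NA padding).
lut_demo <- data.frame(
  v1 = c("o75663", "o95400", NA),
  v2 = c("o95456", NA, NA),
  v3 = c("o75663", "o95456", "o95670"),
  stringsAsFactors = FALSE
)

# Hypothetical query table whose columns are factors with different level sets.
strs_demo <- data.frame(
  strings  = factor(c("o75663", "o95400", "")),
  strings2 = factor(c("o95456", "p00492", "o95670"))
)

# Convert every factor column to character before comparing.
strs_demo[] <- lapply(strs_demo, as.character)

# Hypothetical helper: indices of the lut columns that contain the string x,
# collapsed into one comma-separated string ("" when nothing matches).
find_cols <- function(x, lut) {
  toString(which(colSums(lut == x, na.rm = TRUE) > 0))
}

# Apply the helper to every cell; one result column per input column.
colids <- as.data.frame(
  lapply(strs_demo, function(col) {
    vapply(col, find_cols, character(1), lut = lut_demo, USE.NAMES = FALSE)
  }),
  stringsAsFactors = FALSE
)
names(colids) <- paste0("colids_", seq_along(strs_demo))
print(colids)
```

In this sketch the first row of `strings` holds "o75663", which occurs in columns 1 and 3 of `lut_demo`, so its entry in `colids_1` is "1, 3". Empty strings and strings missing from the lookup table give an empty result.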
