R - Manipulating two data frames based on strings with different lengths
I asked a question here, "Finding index based on 2 data frames of strings", and got a perfect answer. Since then I have been facing a problem that I could not solve. If the second data has more than 1 column, I can solve it based on:
setDT(strs)[, c('colids1','colids2') := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]
This is OK as long as the second data (strs) has the same length in all of its columns. If the lengths vary (are not the same), it does not work and gives me an error.
So let's say the first data is:
lut <- structure(list(v1 = c("o75663", "o95400", "o95433", NA, NA), v2 = c("o95456", "o95670", NA, NA, NA), v3 = c("o75663", "o95400", "o95433", "o95456", "o95670"), v4 = c("o95456", "o95670", "o95801", "p00352", NA), v1 = c("o75663", "o95400", "o95433", NA, NA), v2 = c("o95456", "o95670", NA, NA, NA), v3 = c("o75663", "o95400", "o95433", "o95456", "o95670"), v4 = c("o95456", "o95670", "o95801", "p00352", NA)), .Names = c("v1", "v2", "v3", "v4", "v1", "v2", "v3", "v4"), row.names = c(NA, -5L), class = "data.frame")
and the second data is:
strs <- structure(list(strings = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 1L, 1L), .Label = c("", "o75663", "o95400", "o95433", "o95456", "o95670", "o95801"), class = "factor"), strings2 = structure(c(4L, 2L, 6L, 5L, 3L, 1L, 1L, 1L), .Label = c("", "o75663", "o95433", "o95456", "p00352", "p00492"), class = "factor"), strings3 = structure(c(4L, 6L, 7L, 8L, 2L, 3L, 5L, 1L), .Label = c("", "o75663", "o95400", "o95456", "o95670", "o95801", "p00352", "p00492"), class = "factor"), strings4 = structure(c(2L, 5L, 3L, 4L, 1L, 1L, 1L, 1L), .Label = c("", "o95400", "o95456", "o95801", "p00492"), class = "factor"), strings5 = structure(c(8L, 2L, 7L, 1L, 3L, 6L, 5L, 4L), .Label = c("o75663", "o95400", "o95433", "o95456", "o95670", "o95801", "p00352", "p00492"), class = "factor")), .Names = c("strings", "strings2", "strings3", "strings4", "strings5"), class = "data.frame", row.names = c(NA, -8L))
This is what I tried:
df <- setDT(strs)[, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]
It works if the columns of strs have the same length, but it does not work when the lengths vary, as in the example I gave here.
Converting the factor variables in strs to character variables can be done with data.table. Supposing the strs dataset is already a data.table, you should do:
strs[, names(strs) := lapply(.SD, as.character)]
If strs is not a data.table, you should use:
setDT(strs)[, names(strs) := lapply(.SD, as.character)]
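As background for why the conversion is suggested: a factor stores integer codes plus a set of labels rather than the strings themselves, and each column of strs has its own set of levels. Whether this is the exact cause of the error in the question is an assumption. A minimal base R illustration on a made-up vector (`f` is a hypothetical name, not from the question):

```r
# A factor built from strings, including the "" padding used in strs
f <- factor(c("o95400", "", "o75663"))

# Internally the values are integer codes that index into levels(f)
codes <- as.integer(f)

# as.character() recovers the original strings, which is what the
# comparison against lut needs
labels <- as.character(f)

print(levels(f))
print(codes)
print(labels)
```

After `as.character()`, each cell is a plain string, so comparing it with the character columns of lut behaves as ordinary string matching.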
After that you can perform the operation you wanted. Chained together, it looks like:
setDT(strs)[, lapply(.SD, as.character)][, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]
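To see what the lookup expression does cell by cell, here is a minimal base R sketch of the same two steps (convert to character, then find the matching lut columns). It uses made-up data and does not need data.table. The names `lut_toy`, `strs_toy`, `find_cols`, and `colids` are hypothetical and only serve the illustration.

```r
# Toy lookup table (hypothetical data, not the lut from the question)
lut_toy <- data.frame(
  a = c("x1", "x2", NA),
  b = c("x3", NA, NA),
  c = c("x1", "x3", "x4"),
  stringsAsFactors = FALSE
)

# Toy strings stored as factors and padded with "", like strs in the question
strs_toy <- data.frame(
  s1 = factor(c("x1", "x3", "")),
  s2 = factor(c("x4", "", ""))
)

# Step 1: convert every factor column to character
strs_toy[] <- lapply(strs_toy, as.character)

# Step 2: for one string, return the indices of the lut columns containing it
find_cols <- function(x, lut) {
  toString(which(colSums(lut == x, na.rm = TRUE) > 0))
}

# Step 3: apply the lookup to every cell, one column of strs_toy at a time
colids <- as.data.frame(
  lapply(strs_toy, function(col) {
    vapply(col, find_cols, character(1), lut = lut_toy, USE.NAMES = FALSE)
  }),
  stringsAsFactors = FALSE
)
names(colids) <- paste0("colids_", seq_along(strs_toy))

print(colids)
```

The "" padding cells match nothing in the lookup table, so they end up as empty strings in the result instead of raising an error.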