r - Efficient assignment of a function with multiple outputs in dplyr mutate or summarise


I've noticed that a lot of examples here use dplyr::mutate in combination with a function returning multiple outputs to create multiple columns. For example:

    library(dplyr)

    tmp <- mtcars %>%
      group_by(cyl) %>%
      summarise(min = summary(mpg)[1],
                median = summary(mpg)[3],
                mean = summary(mpg)[4],
                max = summary(mpg)[6])

Such syntax means that the summary function is called 4 times (per group) in this example, which does not seem particularly efficient. What ways are there to efficiently assign a list output to a list of column names in summarise or mutate?
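To make that cost concrete, here is a minimal sketch (assuming dplyr 1.0.0 or later) that wraps summary() in a hypothetical counting function, `counted_summary()`, and runs the same four-column summarise. With one call per output column it should record four summary() calls per cyl group. `[[ ]]` is used instead of `[ ]` only so each result is a plain number rather than a one-element summary object.

```r
library(dplyr)

# Hypothetical wrapper around summary(), used only to count how often it runs.
n_calls <- 0
counted_summary <- function(x) {
  n_calls <<- n_calls + 1  # increment the global counter on every call
  summary(x)
}

tmp <- mtcars %>%
  group_by(cyl) %>%
  summarise(min    = counted_summary(mpg)[[1]],
            median = counted_summary(mpg)[[3]],
            mean   = counted_summary(mpg)[[4]],
            max    = counted_summary(mpg)[[6]])

n_groups <- n_distinct(mtcars$cyl)
n_calls / n_groups  # summary() calls per group
```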

For example, from a previous question, "Split a data frame column containing a list into multiple columns using dplyr (or otherwise)", I know I can assign the output of summary to a list and then split it using do(data.frame(...)), but this means I have to add the column names afterwards, and the syntax is not pretty.

This addresses your example, though perhaps not your principal question. In the case you showed, you could rewrite it as:

    tmp <- mtcars %>%
      group_by(cyl) %>%
      summarise_each(funs(min, median, mean, max), mpg)

This is more efficient, taking about 40% of the time to run:

    library(microbenchmark)

    microbenchmark(mtcars %>%
                     group_by(cyl) %>%
                     summarise_each(funs(min, median, mean, max), mpg),
                   times = 1000L)

     mtcars %>% group_by(cyl) %>% summarise_each(funs(min, median, mean, max), mpg)
          min       lq     mean   median       uq      max neval
     2.002762 2.159464 2.330703 2.216719 2.271264 7.771477  1000

    microbenchmark(mtcars %>%
                     group_by(cyl) %>%
                     summarise(min = summary(mpg)[1],
                               median = summary(mpg)[3],
                               mean = summary(mpg)[4],
                               max = summary(mpg)[6]),
                   times = 1000L)

     mtcars %>% group_by(cyl) %>% summarise(min = summary(mpg)[1], median = summary(mpg)[3], mean = summary(mpg)[4], max = summary(mpg)[6])
          min      lq     mean   median       uq      max neval
     4.967731 5.21122 5.571605 5.360689 5.530197 13.26596  1000
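summarise_each() and funs() are deprecated in current dplyr, where across() (dplyr 1.0.0 and later) is the recommended replacement. A minimal sketch of the same one-column, four-statistic summary with across(), assuming dplyr 1.0.0 or later; `.names = "{.fn}"` uses the names of the function list as the column names, so the result should have the columns cyl, min, median, mean and max.

```r
library(dplyr)

tmp <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(mpg,
                   list(min = min, median = median, mean = mean, max = max),
                   .names = "{.fn}"))  # name columns after the list names only
```

Each statistic is still computed by its own function, but there is no redundant call to a function that returns all of them.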

However, there may be other cases where this does not address the problem.
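For the more general case, one option in dplyr 1.0.0 and later is to have a helper return a one-row data frame: when an unnamed summarise() argument evaluates to a data frame, its columns are unpacked into the result. The sketch below uses a hypothetical helper, `as_named_row()`, that calls nothing itself but names the outputs of a single multi-output call (range() or summary() here), so each function runs once per group.

```r
library(dplyr)

# Hypothetical helper: turn the outputs of one function call into a
# one-row tibble with the given column names.
as_named_row <- function(values, col_names) {
  stopifnot(length(values) == length(col_names))
  # as.numeric() drops names and class (e.g. summaryDefault) before naming
  as_tibble(setNames(as.list(as.numeric(values)), col_names))
}

# range() runs once per group; the unnamed tibble is unpacked into lo and hi.
tmp <- mtcars %>%
  group_by(cyl) %>%
  summarise(as_named_row(range(mpg), c("lo", "hi")))

# The same pattern with summary(), again called once per group.
tmp2 <- mtcars %>%
  group_by(cyl) %>%
  summarise(as_named_row(summary(mpg)[c(1, 3, 4, 6)],
                         c("min", "median", "mean", "max")))
```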

Edit:

The do() function can solve this, e.g.:

    by_cyl <- group_by(mtcars, cyl) %>%
      do(mod = summary(.)[c(1, 4, 6), ])
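A named do() argument such as `mod = ...` stores each group's result in a list-column. If ordinary columns are wanted instead, a variant (a sketch, assuming dplyr 1.x, where do() is superseded but still available) is to leave the argument unnamed and return a one-row data frame per group, naming the columns inside data.frame() so nothing has to be renamed afterwards. `by_cyl_cols` is just an illustrative name.

```r
library(dplyr)

by_cyl_cols <- mtcars %>%
  group_by(cyl) %>%
  do({
    s <- summary(.$mpg)  # summary() runs once per group
    data.frame(min = s[[1]], median = s[[3]], mean = s[[4]], max = s[[6]])
  })
```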
