r - Efficient assignment of a function with multiple outputs in dplyr mutate or summarise
I've noticed that a lot of examples here use dplyr::mutate in combination with a function returning multiple outputs to create multiple columns. For example:
tmp <- mtcars %>% group_by(cyl) %>% summarise(min = summary(mpg)[1], median = summary(mpg)[3], mean = summary(mpg)[4], max = summary(mpg)[6])
Such syntax means that the summary function is called 4 times in this example, which does not seem particularly efficient. What ways are there to efficiently assign a list output to a list of column names in summarise or mutate?
For example, from a previous question, "Split a data frame column containing a list into multiple columns using dplyr (or otherwise)", I know that I can assign the output of summary to a list and then split it using do(data.frame(...)), but that means I have to add the column names later, and the syntax is not pretty.
This addresses your example, but perhaps not your principal question. In the case you showed, you could rewrite it as:
tmp <- mtcars %>% group_by(cyl) %>% summarise_each(funs(min, median, mean, max), mpg)
This is more efficient, taking about 40% of the time to run (comparing median timings):
microbenchmark(mtcars %>% group_by(cyl) %>% summarise_each(funs(min, median, mean, max), mpg), times = 1000L)

mtcars %>% group_by(cyl) %>% summarise_each(funs(min, median, mean, max), mpg)
     min       lq     mean   median       uq      max neval
2.002762 2.159464 2.330703 2.216719 2.271264 7.771477  1000

microbenchmark(mtcars %>% group_by(cyl) %>% summarise(min = summary(mpg)[1], median = summary(mpg)[3], mean = summary(mpg)[4], max = summary(mpg)[6]), times = 1000L)

mtcars %>% group_by(cyl) %>% summarise(min = summary(mpg)[1], median = summary(mpg)[3], mean = summary(mpg)[4], max = summary(mpg)[6])
     min       lq     mean   median       uq      max neval
4.967731  5.21122 5.571605 5.360689 5.530197 13.26596  1000
However, there are other cases where this would not address the problem.
Edit: the do() function can solve this, e.g.:
by_cyl <- group_by(mtcars, cyl) %>% do(mod = summary(.)[c(1,4,6),])
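If the goal is one named column per statistic, do() can also return a data frame built from a single summary() call per group. The following is only a sketch under assumptions: summary_cols is a hypothetical helper written for this illustration (it is not part of dplyr), and the positions 1, 3, 4 and 6 are taken from the question's own indexing of summary(mpg).

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): call summary() once and
# return the pieces of interest as a one-row data frame with chosen names.
summary_cols <- function(x) {
  s <- summary(x)  # single call per group
  data.frame(
    min    = s[[1]],
    median = s[[3]],
    mean   = s[[4]],
    max    = s[[6]]
  )
}

# do() row-binds the one-row data frames and keeps the grouping column.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  do(summary_cols(.$mpg))

print(by_cyl)
```

The column names are set inside the helper, so nothing has to be renamed afterwards. Whether this is actually faster than the summarise_each version has not been benchmarked here.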