R - Apply a specific function to a certain subset of a data frame based on time frequency
I am having trouble figuring out how to apply a function, for example mean, to subsets of a data frame based on time frequency.
Let me explain the specific situation: I have a data frame reporting the fuel consumption of trucks (each identified by a plate number) measured at a specific day/time. I'd like to calculate the mean fuel consumption of each time series, with a maximum time frequency of 5 minutes (that is, if consecutive events happen within 5 minutes of each other, calculate their mean).
Here is an example of the initial data frame and the subsets of data I want to obtain:
Data frame (the columns are, respectively, plate.number, date.time and fuel.consumption):
ab 2016-07-03 09:21:10 23.45
ab 2016-07-03 09:22:33 33.65
bc 2016-07-03 09:23:28 56.22
ab 2016-07-03 09:24:13 21.33
bc 2016-07-03 10:32:45 33.42
zf 2016-07-03 10:32:45 28.45
zf 2016-07-03 10:34:12 29.55
ab 2016-07-03 11:26:54 28.73
ab 2016-07-03 11:27:33 27.98
bc 2016-07-03 11:28:45 42.45
ab 2016-07-04 10:32:45 34.72
ab 2016-07-04 10:33:33 30.51
ab 2016-07-04 14:54:28 28.66
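For anyone who wants to reproduce the example, the table can be built roughly as follows. This is only a sketch: the object name `df` and the `tz = "UTC"` setting are assumptions, since the question states only that date.time is POSIXct.

```r
# Rebuild the example data from the question.
# Assumption: timestamps are interpreted as UTC; the question gives no time zone.
df <- data.frame(
  plate.number = c("ab", "ab", "bc", "ab", "bc", "zf", "zf",
                   "ab", "ab", "bc", "ab", "ab", "ab"),
  date.time = as.POSIXct(c(
    "2016-07-03 09:21:10", "2016-07-03 09:22:33", "2016-07-03 09:23:28",
    "2016-07-03 09:24:13", "2016-07-03 10:32:45", "2016-07-03 10:32:45",
    "2016-07-03 10:34:12", "2016-07-03 11:26:54", "2016-07-03 11:27:33",
    "2016-07-03 11:28:45", "2016-07-04 10:32:45", "2016-07-04 10:33:33",
    "2016-07-04 14:54:28"), tz = "UTC"),
  fuel.consumption = c(23.45, 33.65, 56.22, 21.33, 33.42, 28.45, 29.55,
                       28.73, 27.98, 42.45, 34.72, 30.51, 28.66),
  stringsAsFactors = FALSE
)

str(df)
```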
A time series in this case would be:
ab 2016-07-03 09:21:10 23.45
ab 2016-07-03 09:22:33 33.65
ab 2016-07-03 09:24:13 21.33
or:
ab 2016-07-03 11:26:54 28.73
ab 2016-07-03 11:27:33 27.98
As you can see, the time between one event and the following one is less than 5 minutes. Once I have these groups, it is quite easy to calculate the mean fuel consumption for each group.
Ah, it might be helpful to know that "date.time" is in POSIXct format, so it is a proper date/time.
Any idea which function I should use? I thought it might be possible with the aggregate function, but how do I specify the time frequency?
Thank you for your time and help.
First define a function that calculates the number of seconds since the first observation of the current group. If it exceeds 300, start a new group and reset the start time. The function assumes that the observations are ordered in time.
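group_on_seconds <- function(df_part, nr_of_secs = 300) {
  # the first observation always opens group 1
  group_start <- df_part$date.time[1]
  group_ind <- df_part$group <- 1
  # seq_len(...)[-1] is empty for a single-row input, unlike 2:nrow(df_part)
  for (i in seq_len(nrow(df_part))[-1]) {
    elapsed <- as.numeric(df_part$date.time[i]) - as.numeric(group_start)
    if (elapsed > nr_of_secs) {
      # more than nr_of_secs since the group started: open a new group
      group_start <- df_part$date.time[i]
      group_ind <- group_ind + 1
    }
    df_part$group[i] <- group_ind
  }
  df_part
}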
Order df by time, split it on plate number and apply the function. Then bind the results together.
library(dplyr)

df_group <- df[order(df$date.time), ] %>%
  split(.$plate.number) %>%  # use the ordered rows' plate numbers, not those of the unordered df
  lapply(group_on_seconds) %>%
  do.call('rbind', .)
Calculate the mean for each combination of plate.number and group.
df_group %>%
  group_by(plate.number, group) %>%
  summarise(mn = mean(fuel.consumption))
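Note that the function above measures elapsed time from the first observation of each group, whereas the question speaks of consecutive events happening within 5 minutes of each other. If the gap between neighbouring events is what matters, a different rule can be sketched in base R with `ave()` and `aggregate()`, the function the question asks about. This is a swapped-in technique, not the method above; the small data frame and the names `secs` and `res` are made up for illustration.

```r
# Illustrative subset of the rows from the question (time zone assumed to be UTC).
df <- data.frame(
  plate.number = c("ab", "ab", "bc", "ab", "ab", "ab"),
  date.time = as.POSIXct(c(
    "2016-07-03 09:21:10", "2016-07-03 09:22:33", "2016-07-03 09:23:28",
    "2016-07-03 09:24:13", "2016-07-03 11:26:54", "2016-07-03 11:27:33"),
    tz = "UTC"),
  fuel.consumption = c(23.45, 33.65, 56.22, 21.33, 28.73, 27.98),
  stringsAsFactors = FALSE
)

# Order by plate and time so that diff() compares neighbouring events.
df <- df[order(df$plate.number, df$date.time), ]

# Within each plate, start a new group whenever the gap to the previous
# event exceeds 300 seconds; the first event of a plate always starts group 1.
secs <- as.numeric(df$date.time)
df$group <- ave(secs, df$plate.number,
                FUN = function(t) cumsum(c(TRUE, diff(t) > 300)))

# Mean fuel consumption per plate and group.
res <- aggregate(fuel.consumption ~ plate.number + group, data = df, FUN = mean)
print(res)
```

The two rules agree on the example data, but they can differ on long runs of closely spaced events: the gap-based rule keeps such a run in one group, while the elapsed-time rule cuts it once 300 seconds have passed since the group started.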