I'm using R and I have data like the following:
data <- data.frame(
  time = c(20230301000000, 20230301000010, 20230301000020, 20230301000030, 20230301000040,
           20230301000050, 20230301000100, 20230301000110, 20230301000120, 20230301000130,
           20230301000140, 20230301000150, 20230301000200, 20230301000210, 20230301000220,
           20230301000230, 20230301000240, 20230301000250, 20230301000300),
  switch = c(40, 41, 42, 43, 0, 0, 0, 51, 52, 53, 0, 0, 0, 55, 56, 57, 52, 0, 0),
  NH4 = c(2, 2, 3, 3, 3, 5, 4, 9, 9, 9, 10, 11, 12, 4, 4, 5, 5, 7, 8))
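Here, switch > 0 means the switch is on, and switch = 0 means it is off; switch is never negative. I want to calculate, in seconds, how long the switch stays on and how long it stays off. time is written in the order year, month, day, hour, minute, second. Within each on/off period I also want the minimum, mean, and maximum of NH4. For the on periods I call these on_NH4_min, on_NH4_avg, and on_NH4_max; for the off periods, off_NH4_min, off_NH4_avg, and off_NH4_max. So my desired output should look like this: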
summary_data <- data.frame(on_NH4_min = c(2, 9, 4),
                           on_NH4_avg = c(2.5, 9, 4.5),
                           on_NH4_max = c(3, 9, 5),
                           off_NH4_min = c(3, 10, 7),
                           off_NH4_avg = c(4, 11, 7.5),
                           off_NH4_max = c(5, 12, 8),
                           on_time = c(30, 20, 30),
                           off_time = c(20, 20, 10))
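Because time stores YYYYMMDDhhmmss as a plain number, subtracting raw values only matches the elapsed seconds when both readings fall within the same minute. Here is a minimal base R sketch of the difference, using two consecutive readings from the data above. The UTC time zone is an assumption, since the question does not state one.

```r
# Two consecutive readings from the question that straddle a minute boundary
t_num <- c(20230301000050, 20230301000100)

# Subtracting the raw numbers treats the timestamp digits as one big integer
raw_gap <- diff(t_num)

# Parse to a date-time first; sprintf("%.0f") avoids scientific notation
# when turning the number into text (tz = "UTC" is an assumption)
t_parsed <- as.POSIXct(sprintf("%.0f", t_num),
                       format = "%Y%m%d%H%M%S", tz = "UTC")
real_gap <- as.numeric(difftime(t_parsed[2], t_parsed[1], units = "secs"))

raw_gap   # 50
real_gap  # 10
```

The readings are 10 seconds apart, but the raw subtraction reports 50, so any duration that crosses a minute boundary would be overstated unless time is parsed first.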
ChatGPT gave me an answer, but it doesn't work for me. Here is the code from ChatGPT:
library(dplyr)
library(tidyr)
# Create a variable that indicates whether the switch is on or off
data <- data %>%
  mutate(switch_on = ifelse(switch > 0, 1, 0))
# Calculate the time duration of each on/off interval
data <- data %>%
  mutate(interval = cumsum(switch_on != lag(switch_on, default = 0))) %>%
  group_by(interval, switch_on) %>%
  mutate(start_time = first(time),
         end_time = last(time),
         duration = difftime(end_time, start_time, units = "secs")) %>%
  ungroup() %>%
  select(-interval)
# Calculate the NH4 statistics for each on/off interval
summary_data <- data %>%
  group_by(switch_on) %>%
  summarise(NH4_min = min(NH4),
            NH4_avg = mean(NH4),
            NH4_max = max(NH4),
            time = sum(duration)) %>%
  pivot_wider(names_from = switch_on,
              values_from = c(NH4_min, NH4_avg, NH4_max, time),
              names_prefix = c("on_", "off_"))
I want to get summary_data.
Using dplyr and tidyr:
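library(dplyr)   # consecutive_id() needs dplyr >= 1.1.0
library(tidyr)
data %>%
  # time is a number of the form YYYYMMDDhhmmss; parse it so differences are real seconds
  # (tz = "UTC" is an assumption, the question does not state a time zone)
  mutate(time = as.POSIXct(sprintf("%.0f", time), format = "%Y%m%d%H%M%S", tz = "UTC")) %>%
  # label each consecutive run of on (switch > 0) or off (switch == 0)
  group_by(r = consecutive_id(switch > 0)) %>%
  summarize(
    switch = if_else(switch[1] > 0, "on", "off"),
    time = as.numeric(difftime(max(time), min(time), units = "secs")),
    across(NH4, list(min = ~ min(.), avg = ~ mean(.), max = ~ max(.)))
  ) %>%
  # pair each on run with the off run that follows it
  mutate(grp = cumsum(switch == "on")) %>%
  select(-r) %>%
  pivot_wider(
    id_cols = grp, names_from = switch,
    values_from = c(time, NH4_min, NH4_avg, NH4_max),
    names_glue = "{switch}_{.value}") %>%
  select(-grp)
# # A tibble: 3 × 8
#   on_time off_time on_NH4_min off_NH4_min on_NH4_avg off_NH4_avg on_NH4_max off_NH4_max
#     <dbl>    <dbl>      <dbl>       <dbl>      <dbl>       <dbl>      <dbl>       <dbl>
# 1      30       20          2           3        2.5         4            3           5
# 2      20       20          9          10        9          11            9          12
# 3      30       10          4           7        4.5         7.5          5           8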
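The grouping above depends on consecutive_id(), which was added in dplyr 1.1.0. For readers on an older version, the same run labels and on/off pairing can be sketched in base R. The variable names here (switch_vals, run_id, run_is_on, grp) are illustrative and not part of the original code.

```r
switch_vals <- c(40, 41, 42, 43, 0, 0, 0, 51, 52, 53, 0, 0, 0, 55, 56, 57, 52, 0, 0)
on <- switch_vals > 0

# A new run starts at the first element and wherever the on/off state flips
run_id <- cumsum(c(TRUE, on[-1] != on[-length(on)]))

# One entry per run: is that run an "on" run?
run_is_on <- on[!duplicated(run_id)]

# Counting the "on" runs seen so far pairs each on run with the off run after it
grp <- cumsum(run_is_on)

run_id  # 1 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 5 6 6
grp     # 1 1 2 2 3 3
```

run_id plays the role of r in the pipeline, and grp matches the grp column computed after summarize(), which is what lets pivot_wider() put each on period and the off period that follows it on the same row.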
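When using across(NH4, ..), if your real data contains more than just ammonium (I'm guessing) and you want to summarize those columns too, you can simply add them, e.g. across(c(NH4, CO2), ..).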
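As a rough illustration of that last point, here is a small sketch with a made-up CO2 column. Both the toy data frame and the CO2 values are hypothetical and only serve to show how across() names the resulting columns.

```r
library(dplyr)  # consecutive_id() needs dplyr >= 1.1.0

# Hypothetical toy data: CO2 is an invented column, not part of the question's data
toy <- data.frame(
  switch = c(40, 41, 0, 0, 51),
  NH4    = c(2, 4, 3, 5, 9),
  CO2    = c(400, 410, 420, 440, 450)
)

toy_summary <- toy %>%
  group_by(r = consecutive_id(switch > 0)) %>%
  summarize(
    # every listed column gets one output column per function: {column}_{function}
    across(c(NH4, CO2), list(min = ~ min(.), avg = ~ mean(.), max = ~ max(.)))
  )

names(toy_summary)
# "r" "NH4_min" "NH4_avg" "NH4_max" "CO2_min" "CO2_avg" "CO2_max"
```

If you go this route, the new columns (here CO2_min, CO2_avg, CO2_max) would also need to be listed in values_from when calling pivot_wider().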