Using R

Question    Votes: 0    Answers: 1

I am using R, and I have data like the following:

 data<-data.frame(time=c(20230301000000,20230301000010, 20230301000020, 20230301000030,20230301000040,
                        20230301000050, 20230301000100 , 20230301000110 , 20230301000120, 20230301000130,
                        20230301000140, 20230301000150, 20230301000200, 20230301000210, 20230301000220,
                        20230301000230, 20230301000240, 20230301000250,20230301000300),
                 switch=c(40,41,42,43,0,0,0,51,52,53,0,0,0,55,56,57,52,0,0),
                 NH4=c(2,2,3,3,3,5,4,9,9,9,10,11,12,4,4,5,5,7,8))  

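Here, `switch` > 0 means the switch is on, and `switch` = 0 means it is off; `switch` is never negative. I want to compute, in seconds, how long each on period and each off period lasts. `time` is written in year, month, day, hour, minute, second order (YYYYMMDDhhmmss). Within each on/off period I also want the minimum, mean, and maximum of `NH4`. For the on periods I call these `on_NH4_min`, `on_NH4_avg`, and `on_NH4_max`; for the off periods, `off_NH4_min`, `off_NH4_avg`, and `off_NH4_max`. So my desired output should look like this: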

summary_data <-data.frame(on_NH4_min=c(2,9,4),
                          on_NH4_avg=c(2.5,9,4.5),
                          on_NH4_max=c(3,9,5),
                          off_NH4_min=c(3,10,7),
                          off_NH4_avg=c(4,11,7.5),
                          off_NH4_max=c(5,12,8),
                          on_time=c(30,20,30),
                          off_time=c(20,20,10))  
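
One detail about the data above: `time` is stored as a plain number in YYYYMMDDhhmmss form, so subtracting two values does not generally give seconds. A minimal base R sketch of the difference, assuming the timestamps can be treated as UTC (the question does not state a time zone):

```r
# Two consecutive samples from the data, 20 seconds apart in real time
t_num <- c(20230301000040, 20230301000100)

# Naive numeric subtraction goes wrong across the minute boundary
naive_diff <- diff(t_num)

# Parse as date-times first; sprintf() avoids scientific notation
t_parsed <- as.POSIXct(sprintf("%.0f", t_num),
                       format = "%Y%m%d%H%M%S", tz = "UTC")
real_diff <- as.numeric(difftime(t_parsed[2], t_parsed[1], units = "secs"))

naive_diff  # 60
real_diff   # 20
```

This is why the expected `off_time` for the first off period is 20 rather than 60.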

ChatGPT gave me an answer, but it does not work for me. Here is the code from ChatGPT:

library(dplyr)
library(tidyr)

# Create a variable that indicates whether the switch is on or off
data <- data %>% 
  mutate(switch_on = ifelse(switch > 0, 1, 0))

# Calculate the time duration of each on/off interval
data <- data %>% 
  mutate(interval = cumsum(switch_on != lag(switch_on, default = 0))) %>% 
  group_by(interval, switch_on) %>% 
  mutate(start_time = first(time),
         end_time = last(time),
         duration = difftime(end_time, start_time, units = "secs")) %>% 
  ungroup() %>% 
  select(-interval)

# Calculate the NH4 statistics for each on/off interval
summary_data <- data %>% 
  group_by(switch_on) %>% 
  summarise(NH4_min = min(NH4),
            NH4_avg = mean(NH4),
            NH4_max = max(NH4),
            time = sum(duration)) %>% 
  pivot_wider(names_from = switch_on,
              values_from = c(NH4_min, NH4_avg, NH4_max, time),
              names_prefix = c("on_", "off_"))

I would like to obtain `summary_data`.

r dataframe datetime group-by mutate
1 Answer
1 vote

Using dplyr and tidyr:

library(dplyr)
library(tidyr)
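data %>%
  # Parse the numeric YYYYMMDDhhmmss stamps so durations come out in real seconds
  # (UTC is an assumption; the question does not state a time zone)
  mutate(time = as.POSIXct(sprintf("%.0f", time), format = "%Y%m%d%H%M%S", tz = "UTC")) %>%
  # One group per run of consecutive on (or off) rows
  group_by(r = consecutive_id(switch > 0)) %>%
  summarize(
    switch = if_else(switch[1] > 0, "on", "off"), 
    time = as.numeric(difftime(max(time), min(time), units = "secs")), 
    across(NH4, list(min = ~ min(.), avg = ~ mean(.), max = ~ max(.)))
  ) %>%
  # Pair each "on" run with the "off" run that follows it
  mutate(grp = cumsum(switch == "on")) %>%
  select(-r) %>%
  pivot_wider(
    id_cols = grp, names_from = switch, 
    values_from = c(time, NH4_min, NH4_avg, NH4_max), 
    names_glue = "{switch}_{.value}") %>%
  select(-grp)
# # A tibble: 3 × 8
#   on_time off_time on_NH4_min off_NH4_min on_NH4_avg off_NH4_avg on_NH4_max off_NH4_max
#     <dbl>    <dbl>      <dbl>       <dbl>      <dbl>       <dbl>      <dbl>       <dbl>
# 1      30       20          2           3        2.5         4            3           5
# 2      20       20          9          10        9          11            9          12
# 3      30       10          4           7        4.5         7.5          5           8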
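
`consecutive_id()` was added in dplyr 1.1.0. It gives every run of identical consecutive values its own id, which is what separates one on/off period from the next. A rough base R equivalent, shown only to illustrate the idea (`sw`, `is_on`, and `run_id` are names chosen here, not part of the answer):

```r
sw  <- c(40, 41, 42, 43, 0, 0, 0, 51, 52, 53, 0, 0, 0, 55, 56, 57, 52, 0, 0)
NH4 <- c(2, 2, 3, 3, 3, 5, 4, 9, 9, 9, 10, 11, 12, 4, 4, 5, 5, 7, 8)

is_on <- sw > 0

# A new run starts at the first element and wherever the on/off state flips
run_id <- cumsum(c(TRUE, is_on[-1] != is_on[-length(is_on)]))
run_id
# 1 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 5 6 6

# Per-run mean of NH4; odd runs are "on" and even runs are "off"
# because this particular series starts in the "on" state
run_avg <- as.numeric(tapply(NH4, run_id, mean))
run_avg
# 2.5 4.0 9.0 11.0 4.5 7.5
```

The per-run means match the `on_NH4_avg` and `off_NH4_avg` values requested in the question.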

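Regarding `across(NH4, ..)`: if your real data contains more than just ammonium (I'm inferring) and you want to summarize those columns as well, you can simply add them, as in `across(c(NH4, CO2), ..)`.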
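
As a small sketch of what summarizing several columns with `across()` produces, assuming dplyr 1.1.0 or later is installed; the `toy` data frame and its `CO2` values are made up for illustration and are not from the question:

```r
library(dplyr)

# Toy data; the CO2 column and its values are hypothetical
toy <- data.frame(
  switch = c(40, 41, 0, 0, 51, 52),
  NH4    = c(2, 3, 3, 5, 9, 9),
  CO2    = c(400, 410, 420, 440, 430, 450)
)

toy_summary <- toy %>%
  group_by(r = consecutive_id(switch > 0)) %>%
  summarize(
    # Each listed column gets one output column per named function
    across(c(NH4, CO2),
           list(min = ~ min(.x), avg = ~ mean(.x), max = ~ max(.x)))
  )

names(toy_summary)
# "r" "NH4_min" "NH4_avg" "NH4_max" "CO2_min" "CO2_avg" "CO2_max"
```

With more columns, the `values_from` argument of `pivot_wider()` would need the extra `CO2_*` names as well.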
