Count consecutive columns that meet a condition in R

Question

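I am working in RStudio with a dataset that contains the daily milk yield of cows raised at different locations. In addition, I have 5 columns holding the temperatures of the 5 days before the milking control day.

I want to count the number of consecutive days on which the temperature exceeded a given threshold (e.g. 30°C), regardless of where those days fall (a run of 3 consecutive days could occur on days 3, 4, and 5 before the milking control, for example). Also, if the 5 days contain one event (e.g. 1 day) and another event of 3 consecutive days, I need to keep the higher number.

Here is a toy dataset that schematically mirrors my data. How can I count the consecutive days that meet my condition in R?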

data <- data.frame(cow=1:5, milk=c(35,36,36,35,34), 
       day1ago=c(27,28,20,24,33), 
       day2ago=c(25,25,32,31,28),
       day3ago=c(22,31,25,31,29),
       day4ago=c(28,33,32,33,28),
       day5ago=c(29,28,33,34,31))

For this toy dataset, I would like to obtain a vector like this:

data$consecutive_days = c(0,2,2,4,1)

Answer 1

One possible approach:

library(tidyverse)
library(runner)

data <- data.frame(
  cow = 1:5,
  milk = c(35, 36, 36, 35, 34),
  day1ago = c(27, 28, 20, 24, 33),
  day2ago = c(25, 25, 32, 31, 28),
  day3ago = c(22, 31, 25, 31, 29),
  day4ago = c(28, 33, 32, 33, 28),
  day5ago = c(29, 28, 33, 34, 31)
)

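data |>
  # One row per cow and day, in column order day1ago ... day5ago
  pivot_longer(starts_with("day")) |>
  mutate(
    # 1 if the day reaches the threshold; use value > 30 to exclude exactly 30
    above_thresh = if_else(value >= 30, 1, 0),
    # Running length of the current run of equal values
    consecutive_days = streak_run(above_thresh),
    # Runs of below-threshold days do not count
    consecutive_days = if_else(above_thresh == 1, consecutive_days, 0),
    .by = cow
  ) |>
  # Sort so each cow's longest hot streak comes last, then keep that row
  arrange(cow, above_thresh, consecutive_days) |>
  slice_tail(n = 1, by = cow) |>
  select(cow, consecutive_days)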
#> # A tibble: 5 × 2
#>     cow consecutive_days
#>   <int>            <dbl>
#> 1     1                0
#> 2     2                2
#> 3     3                2
#> 4     4                4
#> 5     5                1

Created on 2024-03-14 with reprex v2.1.0
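To make the streak step easier to follow, here is a minimal base-R sketch for a single cow (cow 4 from the toy data). It assumes that streak_run() returns the running length of the current run of equal values, as the output above suggests; sequence(rle(...)$lengths) is used here as a stand-in, not as the runner package's implementation.

```r
# Cow 4 from the toy data, ordered day1ago ... day5ago
temps <- c(24, 31, 31, 33, 34)

# Same comparison as in the answer above (>= 30)
above <- temps >= 30

# Position within each run of identical values
# (assumed base-R stand-in for the streak count)
streak <- sequence(rle(above)$lengths)

# Keep the streak count only on days at or above the threshold
hot_streak <- ifelse(above, streak, 0)

# The longest hot streak is the maximum of those counts
longest <- max(hot_streak)
longest
```

For cow 4 the first day is below the threshold and the remaining four are not, so the longest streak is 4, matching the table above.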

Answer 2

You can use rle() on a rowwise data frame:

library(dplyr)

data |>
  rowwise() |>
  mutate(consecutive_days = with(rle(c_across(starts_with("day")) > 30), max(lengths[values])),
         consecutive_days = ifelse(consecutive_days < 0, 0, consecutive_days)) |>
  ungroup()

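Note: max() will produce the following warning:

Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `consecutive_days = with(rle(c_across(starts_with("day")) > 30), max(lengths[values]))`.
ℹ In row 1.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf

This happens because no value in the first row is above the threshold, so max() receives an empty vector. The warning can be ignored here: max() returns -Inf in that case, and the ifelse() call replaces it with 0.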
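The behaviour behind that warning can be reproduced on a single vector. The sketch below uses cow 1's temperatures from the toy data; the 0 fallback passed to max() is a suggested alternative, not part of the answer's code.

```r
# Cow 1 from the toy data: no day is above 30
temps <- c(27, 25, 22, 28, 29)
runs <- rle(temps > 30)

# There are no TRUE runs, so this selects an empty vector
hot_runs <- runs$lengths[runs$values]
length(hot_runs)

# max() on an empty vector warns and returns -Inf
empty_max <- suppressWarnings(max(hot_runs))
empty_max

# Passing 0 as an extra argument gives max() a non-empty input,
# so it returns 0 and raises no warning
safe_max <- max(hot_runs, 0)
safe_max
```

With that fallback, the second mutate() step that replaces negative values would no longer be needed.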

Output of the rle approach:

    cow  milk day1ago day2ago day3ago day4ago day5ago consecutive_days
  <int> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>            <dbl>
1     1    35      27      25      22      28      29                0
2     2    36      28      25      31      33      28                2
3     3    36      20      32      25      32      33                2
4     4    35      24      31      31      33      34                4
5     5    34      33      28      29      28      31                1
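If you would rather avoid extra packages, the same idea can be sketched in base R with apply(). This is only an illustration under the assumption that all day columns are numeric and contain no missing values; the name day_cols is made up for this example.

```r
data <- data.frame(
  cow = 1:5,
  milk = c(35, 36, 36, 35, 34),
  day1ago = c(27, 28, 20, 24, 33),
  day2ago = c(25, 25, 32, 31, 28),
  day3ago = c(22, 31, 25, 31, 29),
  day4ago = c(28, 33, 32, 33, 28),
  day5ago = c(29, 28, 33, 34, 31)
)

# Names of the temperature columns (hypothetical helper variable)
day_cols <- grep("^day", names(data), value = TRUE)

# For each row, find the longest run of days strictly above 30
data$consecutive_days <- apply(data[day_cols], 1, function(temps) {
  runs <- rle(temps > 30)
  # The 0 fallback covers rows with no day above the threshold
  max(runs$lengths[runs$values], 0)
})

data$consecutive_days
```

On the toy data this reproduces the vector requested in the question, c(0, 2, 2, 4, 1).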
