Tracking specific events in a data frame in R

Question (votes: 1, answers: 2)

I have the following table:

Name        Date       Score
John      11-01-02      40
John      11-01-03      47
John      11-01-04      41
John      11-01-05      35
John      11-01-06      52
John      11-01-07      47
John      11-01-08      45
John      11-01-09      43
John      11-01-10      40
Adam      11-01-02      41
Adam      11-01-03      41
Adam      11-01-04      49
Adam      11-01-05      40
Adam      11-01-06      40

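I simply want to track the following events: for each student, record when, and how many times, either 1) the score increases by 5 or more and is then followed by a decrease of 5 or more, or 2) the score decreases by 5 or more and is then followed by an increase of 5 or more.

To help with this task I built the following table of differences between each student's consecutive scores: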

Name        Date      Difference
John      11-01-03       7
John      11-01-04      -6
John      11-01-05      -6
John      11-01-06      17
John      11-01-07      -5
John      11-01-08      -2
John      11-01-09      -2
John      11-01-10      -3
Adam      11-01-04       8
Adam      11-01-05      -9
Adam      11-01-06       0

For example, on 11-01-03 John's score rose to 47 from 40 on 11-01-02, so the difference is 47 - 40 = 7.
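
The difference table can be built directly from the scores. The following is only a minimal base R sketch; the names `scores` and `diff_table` are made up for illustration, and the data is copied from the question.

```r
# Example data copied from the question
scores <- data.frame(
  Name  = c(rep("John", 9), rep("Adam", 5)),
  Date  = c(sprintf("11-01-%02d", 2:10), sprintf("11-01-%02d", 2:6)),
  Score = c(40, 47, 41, 35, 52, 47, 45, 43, 40, 41, 41, 49, 40, 40)
)

# Within each student, subtract the previous score from the current one.
# The first row of every student has no predecessor, so it gets NA.
scores$Difference <- ave(scores$Score, scores$Name,
                         FUN = function(s) c(NA, diff(s)))

# Drop the NA rows to keep one row per day-to-day change
diff_table <- scores[!is.na(scores$Difference),
                     c("Name", "Date", "Difference")]
print(diff_table)
```

Note that this sketch also produces a row for Adam on 11-01-03 (difference 0), which the table above does not list.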

I would like the following table as output, one that tracks each name and the dates of its events:

Name        Dates for Events
John            11-01-03      
John            11-01-05
John            11-01-06
Adam            11-01-04

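On 11-01-03, John had a change of 7 points followed by a change of -6, so he experienced the event I described. The other dates are included for the same reason.

Is there a simple way to do this in R? Any help would be greatly appreciated.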
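
To make the rule concrete, here is a small base R sketch that applies it to a single student's scores. The helper `is_reversal` and its `threshold` argument are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: TRUE where a change of at least `threshold` in one
# direction is immediately followed by a change of at least `threshold`
# in the opposite direction
is_reversal <- function(score, threshold = 5) {
  prev_change <- c(NA, diff(score))   # change leading into each row
  next_change <- c(diff(score), NA)   # change leading out of each row
  hit <- (prev_change >= threshold & next_change <= -threshold) |
         (prev_change <= -threshold & next_change >= threshold)
  # Rows without both neighbours yield NA or FALSE; report them as FALSE
  !is.na(hit) & hit
}

# John's scores from the question, 11-01-02 through 11-01-10
john <- c(40, 47, 41, 35, 52, 47, 45, 43, 40)
event_rows <- which(is_reversal(john))
print(event_rows)  # 2 4 5, i.e. 11-01-03, 11-01-05 and 11-01-06
```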

r dataframe datatable
2 Answers
1 vote

One option using dplyr might be:

library(dplyr)

data %>%
  group_by(Name) %>%
  mutate(diff = lead(Score) - Score,        # change from this row to the next
         score_increase_5 = diff >= 5,
         score_decrease_5 = diff <= -5) %>%
  filter(!is.na(diff)) %>%                  # drop each student's last row
  # an event is a large change in one direction right after one in the other
  mutate(event = (score_decrease_5 & lag(score_increase_5)) |
                 (score_increase_5 & lag(score_decrease_5))) %>%
  filter(event) %>%                         # also drops rows where event is NA
  ungroup() %>%
  select(Name, Date)
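
The question also asks how many times the event occurs for each student. One possible way to get that count with dplyr is sketched below; `prev_change`, `next_change`, `events`, `event_counts` and `n_events` are illustrative names, and the data is copied from the question.

```r
library(dplyr)

# Example data copied from the question
data <- data.frame(
  Name  = c(rep("John", 9), rep("Adam", 5)),
  Date  = c(sprintf("11-01-%02d", 2:10), sprintf("11-01-%02d", 2:6)),
  Score = c(40, 47, 41, 35, 52, 47, 45, 43, 40, 41, 41, 49, 40, 40)
)

events <- data %>%
  group_by(Name) %>%
  mutate(prev_change = Score - lag(Score),     # change into this row
         next_change = lead(Score) - Score) %>% # change out of this row
  # filter() drops rows where the condition is NA (first and last rows)
  filter((prev_change >= 5 & next_change <= -5) |
         (prev_change <= -5 & next_change >= 5)) %>%
  ungroup()

# One row per student with the number of events found
event_counts <- count(events, Name, name = "n_events")
print(event_counts)
```

Students with no events at all would simply be missing from `event_counts` rather than appearing with a count of 0.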

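1 vote

The idea is to create two columns: the difference from the previous row and the difference to the following row. You can then select the rows of the data.frame that satisfy the conditions.

Here is a data.table solution: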

library(data.table)
plouf <- read.table(text = "
Name        Date       Score
John      11-01-02      40
John      11-01-03      47
John      11-01-04      41
John      11-01-05      35
John      11-01-06      52
John      11-01-07      47
John      11-01-08      45
John      11-01-09      43
John      11-01-10      40
Adam      11-01-02      41
Adam      11-01-03      41
Adam      11-01-04      49
Adam      11-01-05      40
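Adam      11-01-06      40", header = TRUE)
setDT(plouf)                         # convert to data.table by reference
plouf[, Score := as.numeric(Score)]
# change from the previous row to this one (NA on each student's first row)
plouf[, diffprev := c(NA, diff(Score)), by = Name]
# change from this row to the next one (NA on each student's last row)
plouf[, difffol := c(diff(Score), NA), by = Name]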

Then do the selection:

plouf[(diffprev >= 5 & difffol <= -5) |(diffprev <= -5 & difffol >= 5),.(Name,Date)]

> plouf[(diffprev >= 5 & difffol <= -5) |(diffprev <= -5 & difffol >= 5)]
   Name     Date Score diffprev difffol
1: John 11-01-03    47        7      -6
2: John 11-01-05    35       -6      17
3: John 11-01-06    52       17      -5
4: Adam 11-01-04    49        8      -9
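
As a possible variation, the two difference columns can also be written with data.table's `shift()`, which returns lagged or leading values within each group. This is only a sketch under the same data; `dt`, `prev_change`, `next_change` and `events` are illustrative names.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(
  Name  = c(rep("John", 9), rep("Adam", 5)),
  Date  = c(sprintf("11-01-%02d", 2:10), sprintf("11-01-%02d", 2:6)),
  Score = c(40, 47, 41, 35, 52, 47, 45, 43, 40, 41, 41, 49, 40, 40)
)

# shift() pads with NA where a student has no previous or next row
dt[, prev_change := Score - shift(Score, type = "lag"), by = Name]
dt[, next_change := shift(Score, type = "lead") - Score, by = Name]

# In the i argument, data.table treats NA conditions as FALSE,
# so first and last rows are excluded automatically
events <- dt[(prev_change >= 5 & next_change <= -5) |
             (prev_change <= -5 & next_change >= 5), .(Name, Date)]
print(events)
```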