Using an apply function by group to mutate a column in a data frame

Question

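I am trying to use an apply function on the rows of a grouped data frame to check, for each row, whether other rows in the same group match certain conditions. I can get this to work for a single group, but not across all groups.

For example, without grouping: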

library(dplyr)

id <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
station <- c(1, 2, 3, 3, 2, 2, 1, 1, 3, 2, 2)
timeslot <- c(13, 14, 20, 21, 24, 23, 8, 9, 10, 15, 16)

df <- data.frame(id, station, timeslot)

s <- 2

df <- 
  df %>% 
  filter(id == 1) %>% 
  arrange(id, timeslot) %>% 
  mutate(match = ifelse(station == s, apply(., 1, function(x) (any(as.numeric(x[3] + 1) == .$timeslot))), FALSE))

  id station timeslot match
1  1       1       13 FALSE
2  1       2       14 FALSE
3  1       3       20 FALSE
4  1       3       21 FALSE
5  1       2       23  TRUE
6  1       2       24 FALSE

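In the code above, for each row at station 2, I check all the other rows to see whether any of them (at any station) has a timeslot exactly 1 greater. This works as expected.

I then went on to apply this to the grouped data frame: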

df <- 
  df %>% 
  group_by(id) %>% 
  arrange(id, timeslot) %>% 
  mutate(match = ifelse(station == s, apply(., 1, function(x) (any(as.numeric(x[3] + 1) == .$timeslot))), FALSE))


      id station timeslot match
   <int>   <int>    <int> <lgl>
 1     1       1       13 FALSE
 2     1       2       14 TRUE 
 3     1       3       20 FALSE
 4     1       3       21 FALSE
 5     1       2       23 TRUE 
 6     1       2       24 FALSE
 7     2       1        8 FALSE
 8     2       1        9 FALSE
 9     2       3       10 FALSE
10     2       2       15 FALSE
11     2       2       16 TRUE

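This gives some unwanted results. The function does not seem to be applied per group, and I do not know how to fix that. How can I apply the function so that it only checks other rows within the same group? In practice my dataset is much larger and the conditions are more complex, so it also does not run very fast.

Thanks in advance.

Edit: I should add that I have also tried solutions using the arrange() and lead() functions, but because some timeslot values are shared by many stations in my larger dataset, I could not get that to work.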

Answer 1

This seems to work:

df %>% 
  group_by(id) %>% 
  arrange(id, timeslot) %>% 
  mutate(match = station == s & ((timeslot + 1) %in% timeslot))
# # A tibble: 11 x 4
# # Groups:   id [2]
#       id station timeslot match
#    <dbl>   <dbl>    <dbl> <lgl>
#  1     1       1       13 FALSE
#  2     1       2       14 FALSE
#  3     1       3       20 FALSE
#  4     1       3       21 FALSE
#  5     1       2       23 TRUE 
#  6     1       2       24 FALSE
#  7     2       1        8 FALSE
#  8     2       1        9 FALSE
#  9     2       3       10 FALSE
# 10     2       2       15 TRUE 
# 11     2       2       16 FALSE
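
A likely reason the grouped attempt in the question misbehaves is that, inside a magrittr pipe, `.` refers to the entire data frame passed in, not to the current group, so `apply(., 1, ...)` and `.$timeslot` still see all 11 rows. Bare column names inside `mutate()`, by contrast, are evaluated per group. The sketch below illustrates the difference on the question's data; the column names `rows_in_dot` and `rows_in_group` are made up for this illustration.

```r
library(dplyr)

# Same example data as in the question
id <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
station <- c(1, 2, 3, 3, 2, 2, 1, 1, 3, 2, 2)
timeslot <- c(13, 14, 20, 21, 24, 23, 8, 9, 10, 15, 16)
df <- data.frame(id, station, timeslot)

check <- df %>%
  group_by(id) %>%
  mutate(
    rows_in_dot   = nrow(.),          # `.` is the whole piped data frame
    rows_in_group = length(timeslot)  # a bare column name is the group's slice
  ) %>%
  ungroup()

# One row per id: rows_in_dot stays at the full row count,
# rows_in_group reflects the size of each group
distinct(check, id, rows_in_dot, rows_in_group)
```

This is why `(timeslot + 1) %in% timeslot` above only compares timeslots within the same `id`.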

Answer 2

I sincerely apologize if I have misunderstood the question. This is what I understood from it:

 df$match = apply(df, 1, function(line) any(df$id == line[1] & 
                                            df$station == line[2] &
                                            df$timeslot == line[3] + 1))

The result is:

   id station timeslot match
1   1       1       13 FALSE
2   1       2       14 FALSE
3   1       3       20  TRUE
4   1       3       21 FALSE
5   1       2       24 FALSE
6   1       2       23  TRUE
7   2       1        8  TRUE
8   2       1        9 FALSE
9   2       3       10 FALSE
10  2       2       15  TRUE
11  2       2       16 FALSE
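
Since the question mentions a much larger dataset, the same check can probably be done without `apply()` by building a key per row and testing membership with `%in%`. The following is only a sketch under the assumption that the intended condition is exactly the one coded above (same `id`, same `station`, `timeslot + 1`); `key_now` and `key_next` are names introduced here for illustration.

```r
# Same example data as in the question
id <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
station <- c(1, 2, 3, 3, 2, 2, 1, 1, 3, 2, 2)
timeslot <- c(13, 14, 20, 21, 24, 23, 8, 9, 10, 15, 16)
df <- data.frame(id, station, timeslot)

# Key describing each existing row
key_now <- paste(df$id, df$station, df$timeslot)

# Key each row is looking for: same id and station, next timeslot
key_next <- paste(df$id, df$station, df$timeslot + 1)

# Vectorized membership test instead of a row-wise apply()
df$match <- key_next %in% key_now

df
```

Note that this condition also requires the station to match. To look for the next timeslot at any station within the same `id`, as the question describes, `station` would be dropped from both keys.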
