R - Filter observations that stay the same over the years

Question

I have prepared a sample data table:

library(data.table)

testTable <- data.table(years = rep(rep(c(2014, 2015, 2016), each = 3), 2),
                        policy = rep(c("A", "B"), each = 9),
                        destination = rep(c("Paris", "London", "Berlin"), 6))

testTable[c(1,5,8), destination := c("Moskaw", "Milano", "Valencia")]

> testTable
    years policy destination
 1:  2014      A      Moskaw
 2:  2014      A      London
 3:  2014      A      Berlin
 4:  2015      A       Paris
 5:  2015      A      Milano
 6:  2015      A      Berlin
 7:  2016      A       Paris
 8:  2016      A    Valencia
 9:  2016      A      Berlin
10:  2014      B       Paris
11:  2014      B      London
12:  2014      B      Berlin
13:  2015      B       Paris
14:  2015      B      London
15:  2015      B      Berlin
16:  2016      B       Paris
17:  2016      B      London
18:  2016      B      Berlin

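Here, within each policy, I want to keep only the observations whose destination appears in every year of the data. In this example the policies I picked each have only 3 years, but the real data may mix histories of 2, 3 and 4 years in a single data.table.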

The desired result is:

> testTable
    years policy destination
 3:  2014      A      Berlin
 6:  2015      A      Berlin
 9:  2016      A      Berlin
10:  2014      B       Paris
11:  2014      B      London
12:  2014      B      Berlin
13:  2015      B       Paris
14:  2015      B      London
15:  2015      B      Berlin
16:  2016      B       Paris
17:  2016      B      London
18:  2016      B      Berlin

Any ideas?

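I tried using dcast() and then wanted to filter the rows that have the same entry in all columns after policy, but I realized that dcast() automatically converts my character variable destination to numeric and aggregates my data using length: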

Aggregate function missing, defaulting to 'length'
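
dcast() emits this message when several rows fall into the same cell of the cast formula, so it has to aggregate and falls back to length. A minimal sketch of a dcast-based route that uses those counts on purpose: cast policy + destination against years with an explicit fun.aggregate = length, then keep the combinations that occur in every year column. The names wide, year_cols, present, keep and result are illustrative only, and this version assumes every policy covers the same set of years, so it would need adjusting for mixed 2- to 4-year histories.

```r
library(data.table)

# Example table from the question
testTable <- data.table(
  years = rep(rep(c(2014, 2015, 2016), each = 3), 2),
  policy = rep(c("A", "B"), each = 9),
  destination = rep(c("Paris", "London", "Berlin"), 6)
)
testTable[c(1, 5, 8), destination := c("Moskaw", "Milano", "Valencia")]

# Count rows per (policy, destination) in each year; absent combinations get 0
wide <- dcast(testTable, policy + destination ~ years,
              value.var = "destination", fun.aggregate = length)

# Year columns are everything except the two id columns
year_cols <- setdiff(names(wide), c("policy", "destination"))

# TRUE where a (policy, destination) pair occurs in that year
present <- as.matrix(wide[, ..year_cols]) > 0

# Keep the pairs that occur in every year column
keep <- wide[rowSums(present) == length(year_cols), .(policy, destination)]

# Join back to recover the original long-format rows
result <- testTable[keep, on = .(policy, destination)]
```

The join returns rows in the order of keep (sorted by policy and destination), not in the original row order.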

Note: my data will contain hundreds of observations.

r data.table character lag dcast
1 Answer

With data.table, you can do:

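library(data.table)

# Per policy, flag the rows whose destination occurs in every year,
# then subset with those flags (assumes rows are contiguous by policy)
testTable[testTable[, destination %in% 
                      Reduce(intersect, split(destination, years)), policy]$V1]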


#    years policy destination
# 1:  2014      A      Berlin
# 2:  2015      A      Berlin
# 3:  2016      A      Berlin
# 4:  2014      B       Paris
# 5:  2014      B      London
# 6:  2014      B      Berlin
# 7:  2015      B       Paris
# 8:  2015      B      London
# 9:  2015      B      Berlin
#10:  2016      B       Paris
#11:  2016      B      London
#12:  2016      B      Berlin
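
To see what the inner expression does, here is a small base R sketch that walks through it for policy A alone. The values are copied from the question's table; the names by_year, common and keep are only for illustration.

```r
# Years and destinations of policy A, taken from the example table
years <- rep(c(2014, 2015, 2016), each = 3)
destination <- c("Moskaw", "London", "Berlin",
                 "Paris", "Milano", "Berlin",
                 "Paris", "Valencia", "Berlin")

# split() yields one character vector of destinations per year
by_year <- split(destination, years)

# Reduce() folds intersect() pairwise over the years:
# intersect(intersect(y2014, y2015), y2016)
common <- Reduce(intersect, by_year)
common
# [1] "Berlin"

# %in% turns the common set into a row-wise logical filter
keep <- destination %in% common
which(keep)
# [1] 3 6 9
```

Because the intersection is taken over whatever years a group actually has, the same expression should work regardless of how many years each policy covers.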

And in dplyr:

library(dplyr)

testTable %>%
  group_by(policy) %>%
  filter(destination %in% Reduce(intersect, split(destination, years)))
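
The question mentions policies with 2, 3 and 4 years of history in one table. Below is a sketch of how the per-policy intersection behaves in that case, using made-up data (the table mixed and the policies C and D are hypothetical) whose rows are deliberately not sorted by policy. It is a variant of the data.table answer, not the original code: it collects row numbers with .I instead of a logical vector, because grouped results come back in group order rather than row order.

```r
library(data.table)

# Hypothetical data: policy C has 2 years of history, policy D has 3,
# and the rows of the two policies are interleaved
mixed <- data.table(
  years = c(2015, 2014, 2015, 2014, 2016, 2015, 2016, 2015, 2016, 2016),
  policy = c("C", "D", "C", "D", "C", "D", "C", "D", "D", "D"),
  destination = c("Rome", "Oslo", "Bern", "Wien", "Rome",
                  "Oslo", "Nice", "Riga", "Oslo", "Wien")
)

# Per policy, collect the row numbers (.I) whose destination occurs
# in every year that this policy has
idx <- mixed[, .I[destination %in% Reduce(intersect, split(destination, years))],
             by = policy]$V1

# Sort the row numbers to keep the original row order
result <- mixed[sort(idx)]
```

Here policy C keeps only Rome (present in 2015 and 2016) and policy D keeps only Oslo (present in 2014, 2015 and 2016).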