I have prepared an example data table:
library(data.table)

testTable <- data.table(years       = rep(rep(2014:2016, each = 3), 2),
                        policy      = rep(c("A", "B"), each = 9),
                        destination = rep(c("Paris", "London", "Berlin"), 6))
# Change three destinations so that policy A no longer repeats them every year
testTable[c(1, 5, 8), destination := c("Moskaw", "Milano", "Valencia")]
> testTable
years policy destination
1: 2014 A Moskaw
2: 2014 A London
3: 2014 A Berlin
4: 2015 A Paris
5: 2015 A Milano
6: 2015 A Berlin
7: 2016 A Paris
8: 2016 A Valencia
9: 2016 A Berlin
10: 2014 B Paris
11: 2014 B London
12: 2014 B Berlin
13: 2015 B Paris
14: 2015 B London
15: 2015 B Berlin
16: 2016 B Paris
17: 2016 B London
18: 2016 B Berlin
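Here, I only want to keep the observations whose destination is the same in all years of the data (within each policy). In this example, the policies I picked cover only 3 years, but the real data may mix 2-, 3- and 4-year histories within a single data.table.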
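The rule "same destination in every year" can be restated as a count check that does not depend on how many years a policy covers: a row is kept when the number of distinct years its destination appears in equals the number of distinct years its policy covers. A minimal sketch of that idea, assuming data.table is installed; the helper columns `n_policy_years` and `n_dest_years` are hypothetical names added only for illustration, and `copy()` keeps `testTable` itself unchanged:

```r
library(data.table)

# Rebuild the example data from the question
testTable <- data.table(
  years       = rep(rep(2014:2016, each = 3), 2),
  policy      = rep(c("A", "B"), each = 9),
  destination = rep(c("Paris", "London", "Berlin"), 6)
)
testTable[c(1, 5, 8), destination := c("Moskaw", "Milano", "Valencia")]

# Work on a copy so the helper columns (hypothetical names) do not touch testTable
counted <- copy(testTable)

# Distinct years covered by each policy (may be 2, 3, 4, ... per policy)
counted[, n_policy_years := uniqueN(years), by = policy]

# Distinct years in which each (policy, destination) pair appears;
# uniqueN() counts years, so a destination listed twice in one year counts once
counted[, n_dest_years := uniqueN(years), by = .(policy, destination)]

# Keep rows whose destination appears in every year of its policy,
# preserving the original row order and columns
result <- counted[n_dest_years == n_policy_years, .(years, policy, destination)]
print(result)
```

Because each policy is compared against its own year count, a policy with a 2-year history and one with a 4-year history can sit in the same table.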
The desired result is:
> testTable
years policy destination
3: 2014 A Berlin
6: 2015 A Berlin
9: 2016 A Berlin
10: 2014 B Paris
11: 2014 B London
12: 2014 B Berlin
13: 2015 B Paris
14: 2015 B London
15: 2015 B Berlin
16: 2016 B Paris
17: 2016 B London
18: 2016 B Berlin
Any ideas?
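I tried using dcast() and then wanted to filter the rows that have identical entries in all columns after policy, but I realized that dcast() automatically turns my character variable destination into numbers, aggregating my character data with length: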
Aggregate function missing, defaulting to 'length'
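That message appears when more than one row falls into the same cell of the cast, so dcast() has to aggregate. Rather than fighting it, the count can be requested explicitly and used as a presence table. A minimal sketch, assuming data.table is installed, a hypothetical helper column `present`, and the formula `policy + destination ~ years` (the question does not show the formula actually used):

```r
library(data.table)

# Rebuild the example data from the question
testTable <- data.table(
  years       = rep(rep(2014:2016, each = 3), 2),
  policy      = rep(c("A", "B"), each = 9),
  destination = rep(c("Paris", "London", "Berlin"), 6)
)
testTable[c(1, 5, 8), destination := c("Moskaw", "Milano", "Valencia")]

# Hypothetical helper column giving dcast an explicit value to count
long <- copy(testTable)[, present := 1L]

# One row per (policy, destination), one column per year; passing
# fun.aggregate explicitly avoids the "defaulting to 'length'" message.
# Empty cells become 0, i.e. "destination absent in that year".
wide <- dcast(long, policy + destination ~ years,
              value.var = "present", fun.aggregate = length)
year_cols <- setdiff(names(wide), c("policy", "destination"))

# Number of years in which each destination occurs
wide[, n_present := rowSums(.SD > 0), .SDcols = year_cols]

# Number of distinct years each policy covers (handles mixed 2/3/4-year histories)
policy_years <- long[, .(n_years = uniqueN(years)), by = policy]
wide <- policy_years[wide, on = "policy"]

# (policy, destination) pairs present in every year of their policy
complete <- wide[n_present == n_years, .(policy, destination)]

# Semi-join back to the original rows; sorting the row numbers keeps the input order
rows <- sort(unique(testTable[complete, on = .(policy, destination), which = TRUE]))
result <- testTable[rows]
print(result)
```

Comparing against a per-policy year count, instead of requiring every year column to be non-zero, matters once policies with different history lengths share one table.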
Note: my data will contain hundreds of observations.
With data.table, you could do:
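library(data.table)
# For each policy, collect the row numbers (.I) whose destination occurs in every
# year, then subset the full table by those row numbers
keep <- testTable[, .I[destination %in% Reduce(intersect, split(destination, years))],
                  by = policy]$V1
testTable[keep]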
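The core of both answers is `Reduce(intersect, split(destination, years))`, evaluated once per policy. To see what it computes, here is the same expression run by hand in base R on the rows of policy A only (the vectors below are copied from the example data; no data.table is needed):

```r
# Destinations of policy A in row order, and the year of each row
destination <- c("Moskaw", "London", "Berlin",
                 "Paris", "Milano", "Berlin",
                 "Paris", "Valencia", "Berlin")
years <- rep(2014:2016, each = 3)

# split() gives one character vector of destinations per year
by_year <- split(destination, years)

# Reduce() folds intersect() over that list:
# intersect(intersect(by_year[["2014"]], by_year[["2015"]]), by_year[["2016"]])
common <- Reduce(intersect, by_year)
print(common)  # [1] "Berlin"

# %in% turns the common set back into a logical row filter for this policy
keep <- destination %in% common
print(keep)
```

Because `Reduce()` folds over however many elements `split()` returns, the same expression works whether a policy has 2, 3 or 4 years of history.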
# years policy destination
# 1: 2014 A Berlin
# 2: 2015 A Berlin
# 3: 2016 A Berlin
# 4: 2014 B Paris
# 5: 2014 B London
# 6: 2014 B Berlin
# 7: 2015 B Paris
# 8: 2015 B London
# 9: 2015 B Berlin
#10: 2016 B Paris
#11: 2016 B London
#12: 2016 B Berlin
And in dplyr:
library(dplyr)
testTable %>%
  group_by(policy) %>%
  # within each policy, keep destinations that occur in every year
  filter(destination %in% Reduce(intersect, split(destination, years))) %>%
  ungroup()