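I have two data frames that I want to merge. My problem is that df1 contains more observations, so the day variable is the clue for matching rows. However, when I merge them, I get observations duplicated multiple times.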
df2
index id1 id2 day
12 1 1 Monday
12 1 2 Sunday
123 1 1 Tuesday
123 1 2 Sunday
123 2 1 Monday
123 2 2 Friday
10 1 1 Wednesday
10 1 2 Saturday
Result
index id1 day
12 1 Monday
123 2 Monday
10 1 Wednesday
Sample data
df1:
index id1 id2 day
12 1 1 Monday
123 2 1 Monday
10 1 1 Wednesday
df2:
structure(list(index = c(11011202, 11011202, 11011202, 11011202,
11011203, 11011203, 11011207, 11011207, 11011207, 11011207, 11011209,
11011209, 11011209, 11011209, 11011210, 11011210, 11011210, 11011210,
11011211, 11011211, 11011211, 11011211, 11011212, 11011212, 11011212,
11011212, 11011212, 11011212, 11011212, 11011212, 11011213, 11011213,
11011213, 11011213, 11011213, 11011213, 11011217, 11011217, 11011219,
11011219, 11011220, 11011220, 11011220, 11011220, 11011220, 11011220,
11020202, 11020202, 11020202, 11020202), id1 = c(1, 1, 4, 4,
1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2,
2, 3, 3, 4, 4, 1, 1, 3, 3, 4, 4, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3,
1, 1, 2, 2), id2 = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2), Day = c(5, 1, 5,
1, 1, 3, 4, 7, 4, 7, 4, 1, 4, 1, 5, 7, 5, 7, 1, 2, 1, 2, 7, 2,
7, 2, 7, 2, 7, 2, 7, 4, 7, 4, 7, 4, 4, 1, 3, 1, 1, 2, 1, 2, 1,
2, 4, 7, 4, 7)), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
structure(list(Day = c(3, 3, 4, 6, 6, 6, 7, 7, 7, 7, 4, 4, 6,
6, 6, 4, 3, 7, 7, 5, 5, 7, 5, 6, 6, 7, 2, 6, 7, 4, 6, 6, 4, 4,
3, 4, 3, 3, 5, 6, 5, 5, 5, 7, 7, 6, 4, 7, 7, 7), index = c(11011209,
11011209, 11011210, 11011212, 11011212, 11011213, 11011213, 11011220,
11011220, 11020208, 11020212, 11020212, 11020301, 11020301, 11020301,
11020305, 11020310, 11020315, 11020315, 11020316, 11020316, 11020320,
11020606, 11020611, 11020611, 11020613, 11020617, 11031116, 11040814,
11050115, 11050508, 11050508, 11050510, 11050510, 11050511, 11050518,
11050519, 11050519, 11050520, 11051001, 11051002, 11051002, 11051002,
11051004, 11051004, 11051006, 11051007, 11051011, 11051011, 11051017
), id1 = c(1, 2, 2, 1, 2, 1, 4, 1, 2, 2, 1, 2, 1, 2, 3, 1, 1,
1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 3, 1, 2, 1, 2, 1, 1, 1, 2,
2, 1, 1, 2, 3, 1, 2, 2, 1, 1, 2, 1)), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))

Here is a way to perform an inner join on df1 and df2 using base R:

merge(df1, df2, by = c("Day", "index"))
#     Day index id1.x id2 id1.y
#1 Monday    12     1   1     1
#2 Monday   123     2   1     2

Another option is inner_join from dplyr.
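
To illustrate where the duplicates in the question probably come from, here is a minimal sketch using the small tables from the question, typed in by hand rather than taken from the dput samples. It assumes that index, id1 and day together identify a row. That is my reading of the question and not something the asker confirmed. The names dup and res are only for illustration.

```r
# Toy data copied from the small tables in the question.
df1 <- data.frame(
  index = c(12, 123, 10),
  id1   = c(1, 2, 1),
  id2   = c(1, 1, 1),
  day   = c("Monday", "Monday", "Wednesday")
)
df2 <- data.frame(
  index = c(12, 12, 123, 123, 123, 123, 10, 10),
  id1   = c(1, 1, 1, 1, 2, 2, 1, 1),
  id2   = c(1, 2, 1, 2, 1, 2, 1, 2),
  day   = c("Monday", "Sunday", "Tuesday", "Sunday",
            "Monday", "Friday", "Wednesday", "Saturday")
)

# Joining on index alone pairs each df1 row with every df2 row
# that shares the index, which multiplies the observations.
dup <- merge(df1, df2, by = "index")
nrow(dup)  # 8 rows instead of 3

# Joining on all assumed key columns keeps one match per df1 row.
res <- merge(df1, df2, by = c("index", "id1", "day"))
nrow(res)  # 3 rows

# Keep only the columns shown in the desired result.
res[, c("index", "id1", "day")]
```

The same key columns could be passed to dplyr as inner_join(df1, df2, by = c("index", "id1", "day")). Note that merge sorts the result by the key columns by default, so the row order may differ from the input.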