Merge 2 data frames using id variables


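I have 2 data frames that I want to merge. My problem is that df1 has more observations, so the day variable is what I need to match on. However, when I do this, I get the same observations duplicated several times.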

df1

    index  id1  day
       12    1  Monday
      123    2  Monday
       10    1  Wednesday

df2

    index  id1  id2  day
       12    1    1  Monday
       12    1    2  Sunday
      123    1    1  Tuesday
      123    1    2  Sunday
      123    2    1  Monday
      123    2    2  Friday
       10    1    1  Wednesday
       10    1    2  Saturday

Desired result

    index  id1  id2  day
       12    1    1  Monday
      123    2    1  Monday
       10    1    1  Wednesday
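To show where the duplicated rows come from, here is a minimal base R sketch. It rebuilds the two small tables above, assuming day is stored as text (the sample data below codes Day as 1 to 7 instead). It then compares merging on index alone with merging on every column the tables share. The object names by_index and by_key are illustrative only.

```r
# Small df1/df2 tables from the question; day kept as text (assumption).
df1 <- data.frame(
  index = c(12, 123, 10),
  id1   = c(1, 2, 1),
  day   = c("Monday", "Monday", "Wednesday"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  index = c(12, 12, 123, 123, 123, 123, 10, 10),
  id1   = c(1, 1, 1, 1, 2, 2, 1, 1),
  id2   = c(1, 2, 1, 2, 1, 2, 1, 2),
  day   = c("Monday", "Sunday", "Tuesday", "Sunday",
            "Monday", "Friday", "Wednesday", "Saturday"),
  stringsAsFactors = FALSE
)

# index alone is not a unique key in df2, so each df1 row is paired with
# every df2 row sharing its index: 2 + 4 + 2 = 8 rows instead of 3.
anyDuplicated(df2["index"]) > 0   # TRUE
by_index <- merge(df1, df2, by = "index")
nrow(by_index)                    # 8

# index, id1 and day together are unique in df2, so each df1 row matches
# at most one df2 row. Using all shared columns as the key also avoids
# the id1.x / id1.y and day.x / day.y suffixes.
anyDuplicated(df2[c("index", "id1", "day")]) == 0   # TRUE
by_key <- merge(df1, df2, by = c("index", "id1", "day"))
by_key                            # 3 rows, columns index, id1, day, id2
```

Whether id1 belongs in the key depends on what it means in the real data. It is included here because, in the desired result, every matched row has the same id1 in both tables.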

Sample data

df1:

    structure(list(Day = c(3, 3, 4, 6, 6, 6, 7, 7, 7, 7, 4, 4, 6, 6, 6, 4, 3,
    7, 7, 5, 5, 7, 5, 6, 6, 7, 2, 6, 7, 4, 6, 6, 4, 4, 3, 4, 3, 3, 5, 6, 5, 5,
    5, 7, 7, 6, 4, 7, 7, 7), index = c(11011209, 11011209, 11011210, 11011212,
    11011212, 11011213, 11011213, 11011220, 11011220, 11020208, 11020212,
    11020212, 11020301, 11020301, 11020301, 11020305, 11020310, 11020315,
    11020315, 11020316, 11020316, 11020320, 11020606, 11020611, 11020611,
    11020613, 11020617, 11031116, 11040814, 11050115, 11050508, 11050508,
    11050510, 11050510, 11050511, 11050518, 11050519, 11050519, 11050520,
    11051001, 11051002, 11051002, 11051002, 11051004, 11051004, 11051006,
    11051007, 11051011, 11051011, 11051017), id1 = c(1, 2, 2, 1, 2, 1, 4, 1,
    2, 2, 1, 2, 1, 2, 3, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 3, 1, 2, 1,
    2, 1, 1, 1, 2, 2, 1, 1, 2, 3, 1, 2, 2, 1, 1, 2, 1)), row.names = c(NA, -50L),
    class = c("tbl_df", "tbl", "data.frame"))

df2:

    structure(list(index = c(11011202, 11011202, 11011202, 11011202,
    11011203, 11011203, 11011207, 11011207, 11011207, 11011207, 11011209,
    11011209, 11011209, 11011209, 11011210, 11011210, 11011210, 11011210,
    11011211, 11011211, 11011211, 11011211, 11011212, 11011212, 11011212,
    11011212, 11011212, 11011212, 11011212, 11011212, 11011213, 11011213,
    11011213, 11011213, 11011213, 11011213, 11011217, 11011217, 11011219,
    11011219, 11011220, 11011220, 11011220, 11011220, 11011220, 11011220,
    11020202, 11020202, 11020202, 11020202), id1 = c(1, 1, 4, 4,
    1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2,
    2, 3, 3, 4, 4, 1, 1, 3, 3, 4, 4, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3,
    1, 1, 2, 2), id2 = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
    1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
    2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2), Day = c(5, 1, 5,
    1, 1, 3, 4, 7, 4, 7, 4, 1, 4, 1, 5, 7, 5, 7, 1, 2, 1, 2, 7, 2,
    7, 2, 7, 2, 7, 2, 7, 4, 7, 4, 7, 4, 4, 1, 3, 1, 1, 2, 1, 2, 1,
    2, 4, 7, 4, 7)), row.names = c(NA, -50L), class = c("tbl_df",
    "tbl", "data.frame"))

2 Answers

Here is a way to perform an inner join of df1 and df2 on Day and index using base R:

    merge(df1, df2, by = c("Day", "index"))
    #      Day index id1.x id2 id1.y
    #1  Monday    12     1   1     1
    #2  Monday   123     2   1     2

Another option is inner_join from dplyr.
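Here is a minimal sketch of the dplyr option. It assumes dplyr is installed and reuses the small tables from the question with day stored as text. The key columns were chosen to reproduce the desired result. They are not taken from the original answer, which does not show its code.

```r
library(dplyr)

# Small df1/df2 tables from the question; day kept as text (assumption).
df1 <- data.frame(
  index = c(12, 123, 10),
  id1   = c(1, 2, 1),
  day   = c("Monday", "Monday", "Wednesday"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  index = c(12, 12, 123, 123, 123, 123, 10, 10),
  id1   = c(1, 1, 1, 1, 2, 2, 1, 1),
  id2   = c(1, 2, 1, 2, 1, 2, 1, 2),
  day   = c("Monday", "Sunday", "Tuesday", "Sunday",
            "Monday", "Friday", "Wednesday", "Saturday"),
  stringsAsFactors = FALSE
)

# inner_join keeps only the rows whose index, id1 and day appear in both
# tables. Key columns come first, then df2's remaining column (id2).
result <- inner_join(df1, df2, by = c("index", "id1", "day"))
result
```

With dplyr 1.1.0 or later, adding relationship = "many-to-one" to the inner_join() call should make the join stop with an error if any df1 row matches more than one df2 row, rather than silently duplicating it.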