I have two datasets (one per population: sellers vs buyers). They are built the same way.
FOR BUYERS (TYPE 2)
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1 1 0 2 48 404 2 7 8 NA 4
1 3 1 2 48 404 2 7 8 NA 4
...
FOR SELLERS (TYPE 1)
period subject genders gp matchgp treatment type p1 p2 suminte partner
1 4 1 2 48 404 1 7 8 2 NA
...
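However, the sellers data has fewer observations, because one seller can be matched with several buyers in a period (here, the seller interacts with 2 buyers). In the buyers data, partner holds the subject id of the seller (the seller's subject column), while in the sellers data, suminte is the number of buyers the seller interacts with.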
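This relationship between the two tables can be checked directly: counting the buyer rows that name a seller as their partner should reproduce that seller's suminte. Below is a minimal base-R sketch on hypothetical toy data (only the needed columns are kept), assuming a seller's subject id is unique within a period.

```r
# Hypothetical toy data in the same layout as the question (columns trimmed)
buyers <- data.frame(period = c(1, 1), subject = c(1, 3), genderb = c(0, 1),
                     gp = 2, matchgp = 48, partner = c(4, 4))
sellers <- data.frame(period = 1, subject = 4, genders = 1,
                      gp = 2, matchgp = 48, suminte = 2)

# Count buyer rows per (period, partner); partner in buyers = subject in sellers
counts <- aggregate(subject ~ period + partner, data = buyers, FUN = length)
names(counts) <- c("period", "subject", "n_buyers")

# Attach the counts to the sellers and compare them with suminte
# (a seller with no buyers gets n_buyers = NA)
check <- merge(sellers, counts, by = c("period", "subject"), all.x = TRUE)
check$consistent <- check$n_buyers == check$suminte
print(check[, c("period", "subject", "suminte", "n_buyers", "consistent")])
```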
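What I want to do: in the buyers dataset, add a genders column (the seller's gender) to every row, matched to the right seller: the right period, the right matching group, the right prices...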
The result I want looks like this:
FOR BUYERS (TYPE 2)
period subject genderb genders gp matchgp treatment type p1 p2 suminte partner
1 1 0 1 2 48 404 2 7 8 NA 4
1 3 1 1 2 48 404 2 7 8 NA 4
...
Let me know if I'm not clear enough.
# example data
df1 = read.table(text = "
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1 1 0 2 48 404 2 7 8 NA 4
1 3 1 2 48 404 2 7 8 NA 4
", header=TRUE, stringsAsFactors=FALSE)
df2 = read.table(text = "
period subject genders gp matchgp treatment type p1 p2 suminte partner
1 4 1 2 48 404 1 7 8 2 NA
", header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
# drop columns that also exist in df1 but are not join keys (otherwise they come back with .x/.y suffixes)
df2 = df2 %>% select(-treatment, -type, -suminte, -partner)
# left join on the shared keys; the buyer's partner id matches the seller's subject id
left_join(df1, df2, by=c("period","gp","matchgp","p1","p2", "partner"="subject"))
# period subject genderb gp matchgp treatment type p1 p2 suminte partner genders
# 1 1 1 0 2 48 404 2 7 8 NA 4 1
# 2 1 3 1 2 48 404 2 7 8 NA 4 1
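For comparison, here is a base-R sketch of the same left join with merge(), needing no packages. It uses the same keys as the dplyr call above, adds a guard that each seller key is unique (a duplicate would silently duplicate buyer rows), and lists buyers that found no matching seller. The names keys, sellers, result and unmatched are illustrative choices, not part of the original answer.

```r
df1 <- read.table(text = "
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1 1 0 2 48 404 2 7 8 NA 4
1 3 1 2 48 404 2 7 8 NA 4
", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "
period subject genders gp matchgp treatment type p1 p2 suminte partner
1 4 1 2 48 404 1 7 8 2 NA
", header = TRUE, stringsAsFactors = FALSE)

keys <- c("period", "gp", "matchgp", "p1", "p2")

# Keep only the seller key columns plus the column to bring over
sellers <- df2[, c(keys, "subject", "genders")]

# Guard: a duplicated seller key would duplicate buyer rows in the join
stopifnot(anyDuplicated(sellers[, c(keys, "subject")]) == 0)

# Left join: the buyer's partner matches the seller's subject
result <- merge(df1, sellers,
                by.x = c(keys, "partner"), by.y = c(keys, "subject"),
                all.x = TRUE, sort = FALSE)

# Buyers without a matching seller get genders = NA; list them
unmatched <- result[is.na(result$genders), c("period", "subject", "partner")]
print(result)
print(unmatched)
```

Unlike left_join(), merge() puts the key columns first and does not guarantee the original row order, so reorder the columns or rows afterwards if that matters.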