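I have two large data frames (df has 7,038 rows, df2 has 14,076 rows). I want to compare them and copy a value from df2 into df wherever certain fields match.
I tried a nested for loop with if statements, but it takes several hours to finish.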
df:
   Date       HomeTeam        AwayTeam      FTR   GoalScoreHome GoalScoreAway
   <date>     <chr>           <chr>         <chr> <chr>         <chr>
 1 1995-08-18 For Sittard     PSV Eindhoven A     NA            NA
 2 1995-08-19 Go Ahead Eagles Groningen     D     NA            NA
 3 1995-08-19 Roda JC         Heerenveen    D     NA            NA
 4 1995-08-19 Willem II       Sparta        H     NA            NA
 5 1995-08-20 Ajax            Utrecht       H     NA            NA
 6 1995-08-20 Feyenoord       Vitesse       H     NA            NA
 7 1995-08-20 Graafschap      Nijmegen      A     NA            NA
 8 1995-08-20 Volendam        Twente        A     NA            NA
 9 1995-08-20 Waalwijk        NAC Breda     D     NA            NA
10 1995-08-23 Groningen       For Sittard   H     NA            NA
df2:
  Round Date       Team GDPerGame PointsPerGame GoalScore5.2
1     1 1995-08-20 Ajax         4             3           NA
2     2 1995-08-25 Ajax         6             3           NA
3     3 1995-09-10 Ajax         4             3           NA
4     4 1995-09-17 Ajax         4             3           NA
5     5 1995-09-20 Ajax         4             3           NA
6     6 1995-09-24 Ajax         1             3           22
I am using the following loop:
for (i in 1:nrow(df)) {
  for (j in 1:nrow(df2)) {
    if (df$HomeTeam[i] == df2$Team[j] && df$Date[i] == df2$Date[j]) {
      # df2 row j is the home team's record for this match date
      df$GoalScoreHome[i] = df2$GoalScore5.2[j]
    } else if (df$AwayTeam[i] == df2$Team[j] && df$Date[i] == df2$Date[j]) {
      # df2 row j is the away team's record for this match date
      df$GoalScoreAway[i] = df2$GoalScore5.2[j]
    }
  }
}
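This works as expected, but as I said, it is far too slow.
I found some alternatives to nested loops, but none that involve if statements. Does anyone know a good, faster option?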
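One possible direction, sketched here under assumptions rather than offered as a definitive answer: the nested loop performs roughly 7,038 × 14,076 comparisons, whereas base R's match() can look up every (team, date) pair in one vectorized pass. The small data frames below are stand-ins built from the printouts above (the third df row is hypothetical, added so that one lookup finds a non-NA value). The sketch assumes each (Team, Date) pair occurs at most once in df2, and it overwrites unmatched rows with NA, which differs from the loop only if those columns already hold values.

```r
# Stand-in data; the real df and df2 would be used instead.
df <- data.frame(
  Date          = as.Date(c("1995-08-20", "1995-08-20", "1995-09-24")),
  HomeTeam      = c("Ajax", "Feyenoord", "Utrecht"),   # row 3 is hypothetical
  AwayTeam      = c("Utrecht", "Vitesse", "Ajax"),
  GoalScoreHome = NA_real_,
  GoalScoreAway = NA_real_,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Date         = as.Date(c("1995-08-20", "1995-09-24")),
  Team         = c("Ajax", "Ajax"),
  GoalScore5.2 = c(NA, 22),
  stringsAsFactors = FALSE
)

# One text key per (team, date) pair in df2, e.g. "Ajax 1995-08-20".
key2 <- paste(df2$Team, df2$Date)

# For every df row, find the position of the matching df2 row (NA if none).
home_idx <- match(paste(df$HomeTeam, df$Date), key2)
away_idx <- match(paste(df$AwayTeam, df$Date), key2)

# Indexing with NA yields NA, so unmatched rows simply stay missing.
df$GoalScoreHome <- df2$GoalScore5.2[home_idx]
df$GoalScoreAway <- df2$GoalScore5.2[away_idx]
```

The same lookup could also be written as two joins (for example with merge()), once on HomeTeam and once on AwayTeam; match() is used here only because it keeps df's row order and needs no extra packages.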