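I am working with a fairly large dataset (100,000 rows) and want to replicate Excel's INDEX/MATCH functionality in RStudio.
I am looking for a way to create a new column ("1994_Number") that pulls a value from an existing column ("1995_Number") whenever the three values in the three columns for one year match the three values in the three columns for the other year, regardless of which row they are in.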
Take this data frame as an example:

dat <- data.frame(`1994_Address` = c("1234 Road", "123 Road", "321 Road"),
                  `1994_ZipCode` = c(99999, 99999, 11111),
                  `1994_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Address` = c("123 Road", "1234 Road", "321 Road"),
                  `1995_ZipCode` = c(99999, 99999, 11111),
                  `1995_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Number` = c(1, 2, 3), check.names = FALSE, stringsAsFactors = FALSE)

The newly created column 1994_Number should contain (2, 1, 3).

You can use the match function from base R. Combined with dplyr, the following does the job:

library(dplyr)
dat <- data.frame(`1994_Address` = c("1234 Road", "123 Road", "321 Road"),
                  `1994_ZipCode` = c(99999, 99999, 11111),
                  `1994_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Address` = c("123 Road", "1234 Road", "321 Road"),
                  `1995_ZipCode` = c(99999, 99999, 11111),
                  `1995_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Number` = c(1, 2, 3), check.names = FALSE, stringsAsFactors = FALSE)

# Caveat: each %in% tests its column on its own, and the lookup itself keys on the address only
dat %>%
  mutate(`1994_Number` = ifelse(`1994_Address` %in% `1995_Address` &
                                  `1994_ZipCode` %in% `1995_ZipCode` &
                                  `1994_Bank Name` %in% `1995_Bank Name`,
                                `1995_Number`[match(`1994_Address`, `1995_Address`)], NA))
#   1994_Address 1994_ZipCode 1994_Bank Name 1995_Address 1995_ZipCode 1995_Bank Name 1995_Number 1994_Number
# 1    1234 Road        99999            JPM     123 Road        99999            JPM           1           2
# 2     123 Road        99999            JPM    1234 Road        99999            JPM           2           1
# 3     321 Road        11111             WF     321 Road        11111             WF           3           3
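
The code above looks the number up by address alone, so two rows that share an address but differ in zip code or bank name could be paired incorrectly. One possible way to require all three values to match on the same row is to combine them into a single key per year and call match on those keys. This is only a sketch using the column names from the question: the names key_1994 and key_1995 are made up for illustration, and the "|" separator is an assumption that only holds if that character never appears in the data.

```r
# Sample data from the question
dat <- data.frame(
  `1994_Address`   = c("1234 Road", "123 Road", "321 Road"),
  `1994_ZipCode`   = c(99999, 99999, 11111),
  `1994_Bank Name` = c("JPM", "JPM", "WF"),
  `1995_Address`   = c("123 Road", "1234 Road", "321 Road"),
  `1995_ZipCode`   = c(99999, 99999, 11111),
  `1995_Bank Name` = c("JPM", "JPM", "WF"),
  `1995_Number`    = c(1, 2, 3),
  check.names = FALSE, stringsAsFactors = FALSE
)

# One composite key per row and year (hypothetical helper vectors).
# Assumption: "|" does not occur in any address, zip code, or bank name.
key_1994 <- paste(dat$`1994_Address`, dat$`1994_ZipCode`, dat$`1994_Bank Name`, sep = "|")
key_1995 <- paste(dat$`1995_Address`, dat$`1995_ZipCode`, dat$`1995_Bank Name`, sep = "|")

# For each 1994 key, match() gives the row of the first identical 1995 key, or NA if there is none
dat$`1994_Number` <- dat$`1995_Number`[match(key_1994, key_1995)]

dat$`1994_Number`
# [1] 2 1 3
```

Rows without a matching 1995 key get NA, so no separate ifelse is needed. Because match is vectorized, this should also scale to the 100,000 rows mentioned in the question, although that has not been measured here.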