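I am trying to create new variables from a column using mutate. I am using strsplit to do some string manipulation to build the contents of the new variables, but it is not working as expected.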
library(dplyr)

myDF = data.frame(filename = c("T9719178_Mazda_20230415",
                               "T9719179_Mazda_20230415",
                               "T9719180_Mazda_20230415",
                               "T9719001_Tesla_20230415",
                               "T9719002_Tesla_20230415",
                               "T9719003_Tesla_20230415"))
myDF %>%
  mutate(LotCode = as.factor(strsplit(filename, "_", fixed = TRUE)[[1]][1])) %>%
  mutate(ModelName = as.factor(strsplit(filename, "_", fixed = TRUE)[[1]][2]))
Current output
filename LotCode ModelName
T9719178_Mazda_20230415 T9719178 Mazda
T9719179_Mazda_20230415 T9719178 Mazda
T9719180_Mazda_20230415 T9719178 Mazda
T9719001_Tesla_20230415 T9719178 Mazda
T9719002_Tesla_20230415 T9719178 Mazda
T9719003_Tesla_20230415 T9719178 Mazda
Expected output
filename LotCode ModelName
T9719178_Mazda_20230415 T9719178 Mazda
T9719179_Mazda_20230415 T9719179 Mazda
T9719180_Mazda_20230415 T9719180 Mazda
T9719001_Tesla_20230415 T9719001 Tesla
T9719002_Tesla_20230415 T9719002 Tesla
T9719003_Tesla_20230415 T9719003 Tesla
I thought the above would be straightforward, but I am not sure what is going wrong. Any suggestions?
You can do this:
library(tidyverse)
myDF %>%
  # extra = "drop" silently discards the trailing date piece instead of warning
  separate(filename, into = c('LotCode', 'ModelName'), sep = '_',
           extra = "drop", remove = FALSE)
This gives:
filename LotCode ModelName
1 T9719178_Mazda_20230415 T9719178 Mazda
2 T9719179_Mazda_20230415 T9719179 Mazda
3 T9719180_Mazda_20230415 T9719180 Mazda
4 T9719001_Tesla_20230415 T9719001 Tesla
5 T9719002_Tesla_20230415 T9719002 Tesla
6 T9719003_Tesla_20230415 T9719003 Tesla
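The problem is that strsplit() returns a list with one element per row, and [[1]] always picks the same (i.e. the first) element of that list. The resulting single value is then recycled to every row in the dataset.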
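To make the recycling visible, here is a minimal base R sketch on two of the filenames from the question. The object names `parts`, `first_only`, and `per_row` are only illustrative and not part of the original code.

```r
filename <- c("T9719178_Mazda_20230415", "T9719001_Tesla_20230415")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(filename, "_", fixed = TRUE)
length(parts)

# [[1]] selects only the pieces of the FIRST filename, so this is a single
# string; inside mutate() it would be recycled to every row
first_only <- parts[[1]][1]
first_only

# Taking piece 1 from EVERY list element yields one value per filename
per_row <- vapply(parts, function(p) p[1], character(1))
per_row
```

The key difference is that `per_row` has the same length as `filename`, while `first_only` always has length 1.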
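To fix your code, you could use map() (or sapply()) to extract the first and second pieces from each element of the list. Note that map() returns a list, so LotCode and ModelName become list-columns here: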
library(tidyverse)
myDF %>%
mutate(LotCode = map(strsplit(filename, "_", fixed = TRUE), 1)) %>%
mutate(ModelName = map(strsplit(filename, "_", fixed = TRUE), 2))
#> filename LotCode ModelName
#> 1 T9719178_Mazda_20230415 T9719178 Mazda
#> 2 T9719179_Mazda_20230415 T9719179 Mazda
#> 3 T9719180_Mazda_20230415 T9719180 Mazda
#> 4 T9719001_Tesla_20230415 T9719001 Tesla
#> 5 T9719002_Tesla_20230415 T9719002 Tesla
#> 6 T9719003_Tesla_20230415 T9719003 Tesla
Created on 2023-04-15 with reprex v2.0.2
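If factor columns are wanted, as in the question, one possible variation is to use purrr's map_chr(), which returns a character vector rather than a list. This is a sketch under the assumption that every filename contains at least two "_"-separated pieces; the temporary `parts` column and the `res` object are names introduced here for illustration.

```r
library(dplyr)
library(purrr)

myDF <- data.frame(filename = c("T9719178_Mazda_20230415",
                                "T9719001_Tesla_20230415"))

res <- myDF %>%
  mutate(
    # split once and keep the pieces in a temporary list-column
    parts = strsplit(filename, "_", fixed = TRUE),
    # map_chr(x, 1) pulls element 1 out of each list entry as a string
    LotCode = factor(map_chr(parts, 1)),
    ModelName = factor(map_chr(parts, 2)),
    # drop the temporary column again
    parts = NULL
  )

str(res)
```

Splitting once and reusing `parts` avoids calling strsplit() twice; map_chr() will raise an error if a piece is missing, which can be useful for catching malformed filenames early.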
Alternatively, you can use separate_wider_delim() instead, which has the advantage of returning vectors rather than lists:
library(tidyverse)
myDF |>
  separate_wider_delim(
    filename,
    delim = "_",
    # NA drops the third piece (the date); the original column is kept below
    names = c("LotCode", "ModelName", NA),
    cols_remove = FALSE
  ) |>
  # move the original column to the front to match the expected output
  relocate(filename)
#> # A tibble: 6 × 3
#> filename LotCode ModelName
#> <chr> <chr> <chr>
#> 1 T9719178_Mazda_20230415 T9719178 Mazda
#> 2 T9719179_Mazda_20230415 T9719179 Mazda
#> 3 T9719180_Mazda_20230415 T9719180 Mazda
#> 4 T9719001_Tesla_20230415 T9719001 Tesla
#> 5 T9719002_Tesla_20230415 T9719002 Tesla
#> 6 T9719003_Tesla_20230415 T9719003 Tesla
Another solution is extract from tidyr:
myDF %>%
extract(filename,
into = c("LotCode", "ModelName"),
regex = "(.*)_(.*)_.*", remove = FALSE)
filename LotCode ModelName
1 T9719178_Mazda_20230415 T9719178 Mazda
2 T9719179_Mazda_20230415 T9719179 Mazda
3 T9719180_Mazda_20230415 T9719180 Mazda
4 T9719001_Tesla_20230415 T9719001 Tesla
5 T9719002_Tesla_20230415 T9719002 Tesla
6 T9719003_Tesla_20230415 T9719003 Tesla
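The pattern "(.*)_(.*)_.*" relies on the filenames having exactly two underscores. As a hedged variation, assuming the lot code and model name themselves never contain "_", a stricter pattern can be sketched with negated character classes. The object name `res2` is only illustrative.

```r
library(tidyr)

myDF <- data.frame(filename = c("T9719178_Mazda_20230415",
                                "T9719001_Tesla_20230415"))

# [^_]+ matches a run of non-underscore characters, so each capture group
# is limited to exactly one "_"-separated piece
res2 <- tidyr::extract(
  myDF, filename,
  into = c("LotCode", "ModelName"),
  regex = "^([^_]+)_([^_]+)_",
  remove = FALSE
)

res2
```

Calling the function as tidyr::extract() avoids any clash with magrittr::extract() if that package happens to be attached.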