Problem creating new variables with strsplit and mutate in R

Question (votes: 0, answers: 3)

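I am trying to create new variables from an existing column using mutate. I use strsplit to do some string manipulation that builds the content of the new variables, but it does not work as expected.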

library(dplyr)

myDF = data.frame(filename = c("T9719178_Mazda_20230415",
                               "T9719179_Mazda_20230415",
                               "T9719180_Mazda_20230415",
                               "T9719001_Tesla_20230415",
                               "T9719002_Tesla_20230415",
                               "T9719003_Tesla_20230415"))

myDF %>%
  mutate(LotCode = as.factor(strsplit(filename, "_", fixed = TRUE)[[1]][1])) %>%
  mutate(ModelName = as.factor(strsplit(filename, "_", fixed = TRUE)[[1]][2]))

Current output

                 filename  LotCode ModelName
  T9719178_Mazda_20230415 T9719178     Mazda
  T9719179_Mazda_20230415 T9719178     Mazda
  T9719180_Mazda_20230415 T9719178     Mazda
  T9719001_Tesla_20230415 T9719178     Mazda
  T9719002_Tesla_20230415 T9719178     Mazda
  T9719003_Tesla_20230415 T9719178     Mazda

Expected output

                 filename  LotCode ModelName
  T9719178_Mazda_20230415 T9719178     Mazda
  T9719179_Mazda_20230415 T9719179     Mazda
  T9719180_Mazda_20230415 T9719180     Mazda
  T9719001_Tesla_20230415 T9719001     Tesla
  T9719002_Tesla_20230415 T9719002     Tesla
  T9719003_Tesla_20230415 T9719003     Tesla

I thought this would be straightforward, but I am not sure what went wrong. Any suggestions?

r dplyr strsplit mutate
3 Answers
1 vote

You can do it like this:

library(tidyverse)
myDF %>%
  # extra = "drop" discards the third piece (the date) without a warning
  separate(filename, into = c('LotCode', 'ModelName'), sep = '_',
           remove = FALSE, extra = 'drop')

This gives:

                filename  LotCode ModelName
1 T9719178_Mazda_20230415 T9719178     Mazda
2 T9719179_Mazda_20230415 T9719179     Mazda
3 T9719180_Mazda_20230415 T9719180     Mazda
4 T9719001_Tesla_20230415 T9719001     Tesla
5 T9719002_Tesla_20230415 T9719002     Tesla
6 T9719003_Tesla_20230415 T9719003     Tesla

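1 vote

The problem is that, for every row in the dataset, you are extracting the same (i.e., the first) element of the list returned by strsplit(). The [[1]] picks the pieces of the first filename only, and that single value is then recycled to all rows.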
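To make the cause concrete, here is a minimal base R sketch. It uses a hypothetical two-row miniature of the question's data, and rep() only imitates the recycling that happens to a length-1 value inside mutate().

```r
# Hypothetical two-row miniature of the question's filename column.
filename <- c("T9719178_Mazda_20230415", "T9719001_Tesla_20230415")

# strsplit() is vectorised: it returns a list with one character
# vector of pieces per input string.
parts <- strsplit(filename, "_", fixed = TRUE)
length(parts)

# [[1]] selects the pieces of the FIRST string only, so [1] is its lot code.
first_row_only <- parts[[1]][1]
first_row_only

# A length-1 value is recycled to every row inside mutate();
# rep() imitates that recycling here.
recycled <- rep(first_row_only, length(filename))
recycled
```

Every row therefore ends up with the lot code of the first filename, which matches the "Current output" in the question.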

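To fix your code, you can use map() (or a base R apply-family function such as sapply()) to extract the first and second piece from each element of the list: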

library(tidyverse)

myDF %>%
  mutate(LotCode = map(strsplit(filename, "_", fixed = TRUE), 1)) %>%
  mutate(ModelName = map(strsplit(filename, "_", fixed = TRUE), 2))
#>                  filename  LotCode ModelName
#> 1 T9719178_Mazda_20230415 T9719178     Mazda
#> 2 T9719179_Mazda_20230415 T9719179     Mazda
#> 3 T9719180_Mazda_20230415 T9719180     Mazda
#> 4 T9719001_Tesla_20230415 T9719001     Tesla
#> 5 T9719002_Tesla_20230415 T9719002     Tesla
#> 6 T9719003_Tesla_20230415 T9719003     Tesla

Created on 2023-04-15 with reprex v2.0.2
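Note that map() returns a list, so LotCode and ModelName become list-columns. If plain character vectors (or factors, as in the question) are preferred, one possible variant is sketched below in base R with vapply(). The two-row data and the helper names lot_code and model_name are assumptions for illustration; purrr's map_chr() plays the same role in the tidyverse.

```r
# Hypothetical two-row miniature of the question's filename column.
filename <- c("T9719178_Mazda_20230415", "T9719001_Tesla_20230415")

# One character vector of pieces per filename.
parts <- strsplit(filename, "_", fixed = TRUE)

# vapply() takes the n-th piece of every element and checks that each
# result is a single string, returning a plain character vector.
lot_code   <- vapply(parts, function(p) p[1], character(1))
model_name <- vapply(parts, function(p) p[2], character(1))

# Convert to factors, as the question's code intended.
result <- data.frame(filename,
                     LotCode   = factor(lot_code),
                     ModelName = factor(model_name))
str(result)
```

The same vapply() calls can be placed inside mutate(), since they return one value per row.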

Alternatively, you can use separate_wider_delim() instead, which has the advantage of returning vectors rather than lists:

library(tidyverse)

myDF |> 
  separate_wider_delim(
    filename,
    delim = "_",
    # NA drops the third piece (the date); the first two pieces become columns
    names = c("LotCode", "ModelName", NA),
    cols_remove = FALSE
  )
# The result keeps the original filename column and adds LotCode
# (T9719178, T9719179, ...) and ModelName (Mazda, Tesla) as character columns.

Created on 2023-04-15 with reprex v2.0.2


0 votes

Another solution is extract() from tidyr:

library(tidyr)

myDF %>%
   extract(filename,
           into = c("LotCode", "ModelName"),
           regex = "(.*)_(.*)_.*", remove = FALSE)

                 filename  LotCode ModelName
1 T9719178_Mazda_20230415 T9719178     Mazda
2 T9719179_Mazda_20230415 T9719179     Mazda
3 T9719180_Mazda_20230415 T9719180     Mazda
4 T9719001_Tesla_20230415 T9719001     Tesla
5 T9719002_Tesla_20230415 T9719002     Tesla
6 T9719003_Tesla_20230415 T9719003     Tesla
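
The capture-group idea can also be tried in base R with sub(), which may help when checking the regular expression on its own. This is only a sketch: the two-row data and the added ^ and $ anchors are assumptions, and the pattern relies on each filename containing exactly two underscores.

```r
# Hypothetical two-row miniature of the question's filename column.
filename <- c("T9719178_Mazda_20230415", "T9719001_Tesla_20230415")

# Same capture groups as above, anchored to the whole string.
pattern <- "^(.*)_(.*)_.*$"

# "\\1" and "\\2" refer to the first and second capture group.
lot_code   <- sub(pattern, "\\1", filename)
model_name <- sub(pattern, "\\2", filename)

data.frame(filename, LotCode = lot_code, ModelName = model_name)
```

Because .* is greedy, a filename with more than two underscores would be split differently, so a stricter pattern such as [^_]* for each group may be safer for less regular data.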