NA error when using matchmaker::match_df in R

I have been using match_df(), the dictionary-based cleaning function from the matchmaker package.

The code is as follows:

library(rio)         # assumption: import() comes from the rio package
library(matchmaker)

dat <- import("coded-data.csv")
dict <- import("dict.csv")

df <- match_df(dat,
               dictionary = dict,
               from = "options",
               to = "values",
               by = "grp")

However, I recently changed computers, and now when I run the same code that worked before, I get the following error on every variable:

"NA Each element of `...` must be a named string."

I'm not sure what this means or how to correct it.

All of my variables are character type in both the data frame and the cleaning dictionary.

Tags: r, string, data-cleaning

1 Answer

This question was answered at https://stackoverflow.com/a/78228141/2752888 and the behavior is documented here: https://cran.r-project.org/web/packages/matchmaker/vignettes/intro.html#values-to-column

For more context and a reproducible example: this can happen if the values column of your dictionary contains blank cells:

df <- data.frame(var1 = c("aaa", "miss", "bbb"))
print(df)
#>   var1
#> 1  aaa
#> 2 miss
#> 3  bbb

# Dictionary has a blank cell in the second row of the second column ------
dict <- data.frame(
  from = c("aaa", "miss", "bbb"),
  to = c("AAA", "", "bbb"), 
  col = rep("var1", 3)
)
print(dict)
#>   from  to  col
#> 1  aaa AAA var1
#> 2 miss     var1
#> 3  bbb bbb var1
matchmaker::match_df(df, dict, 
  from = "from", 
  to = "to", 
  by = "col", 
  warn = TRUE
)
#> 
#> ── Errors were found in the following columns ──
#> 
#> • var1
#>   1. NA Each element of `...` must be a named string.
#>   var1
#> 1  aaa
#> 2 miss
#> 3  bbb

# Replacing the blank cell with the ".na" keyword fixes this. --------------
dict$to[2] <- ".na"
print(dict)
#>   from  to  col
#> 1  aaa AAA var1
#> 2 miss .na var1
#> 3  bbb bbb var1
matchmaker::match_df(df, dict, 
  from = "from", 
  to = "to", 
  by = "col", 
  warn = TRUE
)
#>   var1
#> 1  AAA
#> 2 <NA>
#> 3  bbb

Created on 2024-04-01 with reprex v2.0.2
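
With a larger dictionary, fixing blank cells one at a time is tedious. Below is a minimal base R sketch of how the same replacement could be applied to the whole values column before calling match_df(). The helper name fill_blank_values is hypothetical (it is not part of matchmaker), and the toy dictionary borrows the column names from the question ("options", "values", "grp"). Treating NA cells the same way as empty strings is an assumption; the example above only demonstrates the empty-string case.

```r
# Hypothetical helper (not part of matchmaker): replace blank or missing
# cells in a dictionary's values column with the ".na" keyword.
fill_blank_values <- function(dict, to = "values", keyword = ".na") {
  vals <- as.character(dict[[to]])
  # A cell counts as blank if it is NA or contains only whitespace
  blank <- is.na(vals) | trimws(vals) == ""
  vals[blank] <- keyword
  dict[[to]] <- vals
  dict
}

# Toy dictionary using the column names from the question
dict <- data.frame(
  options = c("aaa", "miss", "unk", "bbb"),
  values  = c("AAA", "", NA, "bbb"),
  grp     = rep("var1", 4),
  stringsAsFactors = FALSE
)

# Row numbers of the dictionary entries that have no usable value
blank_rows <- which(is.na(dict$values) | trimws(dict$values) == "")
print(blank_rows)

dict_clean <- fill_blank_values(dict)
print(dict_clean)

# dict_clean can then be passed to matchmaker::match_df() as in the question:
# match_df(dat, dictionary = dict_clean,
#          from = "options", to = "values", by = "grp")
```

Printing blank_rows first makes it easy to check whether the blank cells are intentional (values that should become NA) or a sign that the dictionary file was read in differently on the new machine.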
