Import unquoted strings as factors using read_csv from the readr package in R

Question

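I have a .csv data file with many columns. Unfortunately, string values are not quoted (i.e. apples instead of "apples"). When I use read_csv from the readr package, the string values are imported as characters: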

library(readr)

mydat = data.frame(first = letters, numbers = 1:26, second = sample(letters, 26))
write.csv(mydat, "mydat.csv", quote = FALSE, row.names = FALSE)

read_csv("mydat.csv")

Result:

Parsed with column specification:
cols(
  first = col_character(),
  numbers = col_integer(),
  second = col_character()
)
# A tibble: 26 x 3
   first numbers second
   <chr>   <int>  <chr>
1      a       1      r
2      b       2      n
3      c       3      m
4      d       4      z
5      e       5      p
6      f       6      j
7      g       7      u
8      h       8      l
9      i       9      e
10     j      10      h
# ... with 16 more rows

Is there a way to force read_csv to import string values as factors instead of characters?

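Importantly, my data file has so many columns (both string and numeric variables) that, as far as I know, it is not practical to make this work by supplying a column specification through the col_types argument.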

Alternative solutions (e.g. using read.csv to import the data, or dplyr code that changes all character variables in a data frame to factors) are also welcome.

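Update: I learned that whether or not the values in the csv file are quoted makes no difference to read.csv or read_csv. read.csv imports these values as factors (its default before R 4.0.0; later versions need stringsAsFactors = TRUE); read_csv imports them as characters. I prefer read_csv because it is much faster than read.csv.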

Answer 1

This function uses dplyr to convert all character columns in a tbl_df or data frame to factors:

char.to.factors <- function(df) {
  # Takes a tbl_df or data frame and returns it with every character
  # column converted to a factor.
  # mutate_if() is used here in place of mutate_each_() and funs(),
  # which are deprecated in current dplyr releases.
  dplyr::mutate_if(df, is.character, as.factor)
}
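For readers who cannot rely on dplyr, the same conversion can be sketched in base R. This is only an illustration under stated assumptions: the helper name chars_to_factors_base and the small example data frame are hypothetical, and the data frame merely stands in for what read_csv would return.

```r
# Hypothetical helper name; base R only, no dplyr required.
chars_to_factors_base <- function(df) {
  # Flag the character columns.
  is_chr <- vapply(df, is.character, logical(1))
  # Convert only those columns, leaving every other column untouched.
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Stand-in for data read with read_csv(): strings are kept as character.
example_df <- data.frame(
  first = c("a", "b", "c"),
  numbers = 1:3,
  second = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

converted <- chars_to_factors_base(example_df)
str(converted)
```

Numeric columns pass through unchanged, so the helper can be applied to a wide file without listing any column names.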

Answer 2

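I like alistaire's mutate_if() solution in the comments above, but for completeness there is another solution worth mentioning. You can use unclass(), which forces a reparse. You will see this in a lot of code that uses readr. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE has to be passed explicitly.

df <- data.frame(unclass(df), stringsAsFactors = TRUE)

or

df <- df %>% unclass %>% data.frame(stringsAsFactors = TRUE)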
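The unclass() round trip can be checked on a small example. The sketch below assumes a plain data frame with character columns as a stand-in for a read_csv result; the names chr_df and fct_df are made up for illustration.

```r
# Stand-in for a read_csv() result: strings are stored as character.
chr_df <- data.frame(
  first = c("a", "b", "c"),
  numbers = 1:3,
  stringsAsFactors = FALSE
)

# unclass() drops the data frame class, leaving a plain list of columns.
# data.frame() then rebuilds the object and, because stringsAsFactors = TRUE
# is passed, converts the character columns to factors.
fct_df <- data.frame(unclass(chr_df), stringsAsFactors = TRUE)

# Show the class of each column after the round trip.
sapply(fct_df, class)
```

The explicit stringsAsFactors = TRUE matters on R 4.0.0 and later, where data.frame() would otherwise leave the strings as character.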

Answer 3

Unfortunately, read_csv has no equivalent of the stringsAsFactors argument, and I think col_types= requires naming specific columns unless you resort to more hackery.
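To make the col_types limitation concrete, here is a minimal sketch of what naming specific columns looks like. It assumes a readr version in which col_factor() infers the levels when none are given; the temporary file and its contents are made-up example data, not the asker's file.

```r
library(readr)

# Write a small example file (hypothetical data standing in for the real CSV).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("first,numbers,second", "a,1,r", "b,2,n", "c,3,m"), csv_path)

# Only the string columns are named; any column not listed is still guessed.
dat <- read_csv(
  csv_path,
  col_types = cols(first = col_factor(), second = col_factor())
)

# Show the class of each imported column.
sapply(dat, class)
```

This works well for a handful of columns, but every string column has to be listed by name, which is exactly what becomes impractical for a file with many columns.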

A simple solution is to use `cross … to convert strings to factors.
