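CSV files that use European number formatting (1234.56 -> 1.234,56) are supposed to be handled by the readr functions or by fread(). Although read_csv2() is supposedly designed for exactly this task, it largely ignores the locale specification and simply guesses the number format. That becomes a problem when numbers with more than four digits only appear late in the file, i.e. after guess_max rows (default 1000) have already been read.
How can I programmatically enforce the correct format?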
library(readr)
data <- data.frame(var1 = c("", 4, 5, "124.392,45"),
                   var2 = c(1, 2, "4.783.194,43", 7))
write_csv2(data, "data.csv")
read_csv2("data.csv", guess_max = 2,
          locale = locale(decimal_mark = ",", grouping_mark = "."))
# # A tibble: 4 x 2
# var1 var2
# <dbl> <dbl>
# 1 NA 1
# 2 4 2
# 3 5 NA
# 4 NA 7
read_csv2("data.csv", guess_max = 3,
          locale = locale(decimal_mark = ",", grouping_mark = "."))
# # A tibble: 4 x 2
# var1 var2
# <dbl> <dbl>
# 1 NA 1
# 2 4 2
# 3 5 4783194.
# 4 NA 7
read_delim("data.csv", delim = ";", guess_max = 3,
           locale = locale(decimal_mark = ",", grouping_mark = "."))
# # A tibble: 4 x 2
# var1 var2
# <dbl> <dbl>
# 1 NA 1
# 2 4 2
# 3 5 4783194.
# 4 NA 7
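Setting col_types up front seems to help. In this case both columns are declared as numbers ("n"), so nothing is left to guessing.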
result <- read_csv2("data.csv",
                    guess_max = 2,
                    col_types = "nn",
                    locale = locale(decimal_mark = ",", grouping_mark = "."))
result
# A tibble: 4 x 2
var1 var2
<dbl> <dbl>
1 NA 1
2 4 2
3 5 4783194.
4 124392. 7
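If the number of columns is not known in advance, the same idea can probably be generalized. The sketch below is only an illustration, not a tested recipe for every file: it assumes every column is numeric, writes a hypothetical temporary file with the same values as the example above, and uses cols(.default = col_number()) so that no column is ever guessed. As a fallback it also shows reading everything as character and converting afterwards with parse_number().

```r
library(readr)

# Hypothetical sample file with the same values as the example above
path <- tempfile(fileext = ".csv")
writeLines(c("var1;var2",
             ";1",
             "4;2",
             "5;4.783.194,43",
             "124.392,45;7"), path)

# European number format: comma as decimal mark, dot as grouping mark
eu <- locale(decimal_mark = ",", grouping_mark = ".")

# Option 1: declare every column as a number, so guess_max no longer matters
result <- read_delim(path, delim = ";",
                     col_types = cols(.default = col_number()),
                     locale = eu)

# Option 2: read everything as text, then convert explicitly
raw <- read_delim(path, delim = ";",
                  col_types = cols(.default = col_character()),
                  locale = eu)
converted <- as.data.frame(lapply(raw, parse_number, locale = eu))
```

Option 1 is the shorter route when all columns really are numeric. Option 2 leaves room to convert only selected columns and keep the rest as text.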