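When using write.table or write.csv in R, double quotes are added around all non-numeric fields by default, whether or not the quotes are actually needed for the CSV file to be parsed correctly.
Take this Python script as an example: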
import csv

# newline="" lets the csv module control line endings itself
with open("pytest.csv", "w", newline="") as f_out:
    wri = csv.writer(f_out, delimiter=',')
    wri.writerow(['c_numeric', 'c_str', 'c_str_spec'])
    wri.writerow([11, "r1c2", "r1c3 nothing special"])
    wri.writerow([21, "r2c2", "r2c3,with delim"])
    wri.writerow([31, "r3c2", "r3c3\nwith carriage return"])
    wri.writerow([41, "r4c2", "r3c3\"with double quote"])
This writes the following to pytest.csv:
c_numeric,c_str,c_str_spec
11,r1c2,r1c3 nothing special
21,r2c2,"r2c3,with delim"
31,r3c2,"r3c3
with carriage return"
41,r4c2,"r3c3""with double quote"
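This is what I expect, and it matches what Excel would output as well.
Now let's process this file with R and write it back out, once with quoting and once without: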
df <- read.csv("pytest.csv")
write.csv(df, 'Rtest.csv', row.names=FALSE)
write.csv(df, 'Rtest_NQ.csv', row.names=FALSE, quote=FALSE)
Here is Rtest.csv:
"c_numeric","c_str","c_str_spec"
11,"r1c2","r1c3 nothing special"
21,"r2c2","r2c3,with delim"
31,"r3c2","r3c3
with carriage return"
41,"r4c2","r3c3""with double quote"
Note the quotes around all non-numeric fields.
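For comparison, Python's csv module can produce this same "quote everything non-numeric" style when asked to, via `csv.QUOTE_NONNUMERIC`. The sketch below is only an illustration of that quoting mode, not part of the original scripts; it writes to an in-memory buffer, and the `buf` and `out` names are made up for the example.

```python
import csv
import io

# Write the header and first two data rows to an in-memory buffer,
# quoting every field that is not a number.
buf = io.StringIO()
wri = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
wri.writerow(["c_numeric", "c_str", "c_str_spec"])
wri.writerow([11, "r1c2", "r1c3 nothing special"])
wri.writerow([21, "r2c2", "r2c3,with delim"])

out = buf.getvalue()
print(out)
```

The default mode, `csv.QUOTE_MINIMAL`, is the one that produced pytest.csv above.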
Here is Rtest_NQ.csv:
c_numeric,c_str,c_str_spec
11,r1c2,r1c3 nothing special
21,r2c2,r2c3,with delim
31,r3c2,r3c3
with carriage return
41,r4c2,r3c3"with double quote
This file is technically broken, because no CSV reader can read it back correctly, so it is not a good option.
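To make the breakage concrete, here is a small sketch that feeds the unquoted content shown above to Python's `csv.reader` and counts the fields in each record. The string is copied from the Rtest_NQ.csv listing; the variable names are only illustrative.

```python
import csv
import io

# Contents of Rtest_NQ.csv as listed above (written with quote=FALSE)
unquoted = (
    "c_numeric,c_str,c_str_spec\n"
    "11,r1c2,r1c3 nothing special\n"
    "21,r2c2,r2c3,with delim\n"
    "31,r3c2,r3c3\n"
    "with carriage return\n"
    '41,r4c2,r3c3"with double quote\n'
)

# Parse the text and record how many fields each record ends up with
rows = list(csv.reader(io.StringIO(unquoted, newline="")))
field_counts = [len(row) for row in rows]
print(len(rows), field_counts)
```

A well-formed file would give five records of three fields each. Here the embedded comma adds a fourth field to one record, and the embedded line break splits another record in two.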
My question: is there an RFC 4180-compatible writer in R that writes the way Excel, the Python csv library, and most other RFC 4180-compatible tools do?
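You can write a simple function that builds the CSV by converting the data frame to a character matrix, escaping any double quotes, and then quoting any string that contains a double quote, comma, or newline. You then add the column names and write the result with writeLines: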
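write_unquoted <- function(df, path)
{
  # Convert to a character matrix so every cell can be handled as a string
  x <- as.matrix(df)
  # Cells that need quoting: any containing a double quote, comma or newline
  needs_quote <- grepl("[\",\n]", x)
  # Double embedded quotes, then wrap each such cell in quotes exactly once
  x[needs_quote] <- paste0("\"", gsub("\"", "\"\"", x[needs_quote]), "\"")
  # Prepend the header line and join each row with commas
  x <- c(paste0(colnames(x), collapse = ","), apply(x, 1, paste0, collapse = ","))
  writeLines(x, path)
}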
So, if we start with your example:
df
#> c_numeric c_str c_str_spec
#> 1 11 r1c2 r1c3 nothing special
#> 2 21 r2c2 r2c3,with delim
#> 3 31 r3c2 r3c3\nwith carriage return
#> 4 41 r4c2 r3c3"with double quote
and run
write_unquoted(df, "my.csv")
we can see that it stores the data frame faithfully:
identical(read.csv("my.csv"), df)
#> [1] TRUE
And if we look at the resulting CSV, it looks like this:
c_numeric,c_str,c_str_spec
11,r1c2,r1c3 nothing special
21,r2c2,"r2c3,with delim"
31,r3c2,"r3c3
with carriage return"
41,r4c2,"r3c3""with double quote"
That is, fields are quoted only when needed.
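The "quote only when needed" rule can also be sketched in Python and checked against the csv module's default minimal quoting. This is a rough illustration under the assumption that a comma, a double quote, or a line break are the only triggers for quoting; `quote_if_needed` and `to_csv_line` are hypothetical helper names, not library functions.

```python
import csv
import io

def quote_if_needed(value):
    """Quote a field only if it contains a comma, double quote or line break."""
    text = str(value)
    if any(ch in text for ch in ',"\r\n'):
        # Double embedded quotes, then wrap the whole field in quotes
        return '"' + text.replace('"', '""') + '"'
    return text

def to_csv_line(row):
    """Join the (possibly quoted) fields of one row with commas."""
    return ",".join(quote_if_needed(v) for v in row)

rows = [
    ["c_numeric", "c_str", "c_str_spec"],
    [11, "r1c2", "r1c3 nothing special"],
    [21, "r2c2", "r2c3,with delim"],
    [31, "r3c2", "r3c3\nwith carriage return"],
    [41, "r4c2", 'r3c3"with double quote'],
]
lines = [to_csv_line(r) for r in rows]

# Cross-check against csv.writer with its default QUOTE_MINIMAL mode
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
matches = buf.getvalue() == "\n".join(lines) + "\n"
print(matches)
```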