How do I properly escape a double quote in CSV?

Question (0 votes, 6 answers)

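I have a line like this in my CSV:

    "Samsung U600 24"","10000003409","1","10000003427"

The quote next to 24 denotes inches, and the quote right after it closes the field. I'm reading the line with fgetcsv, but the parser gets it wrong and reads the value as:

    Samsung U600 24",10000003409"

I tried putting a backslash before the inch quote, but then I just got a backslash in the name:

    Samsung U600 24\"

Is there a way to properly escape this in the CSV so that the value comes out as Samsung U600 24" (with the inch quote at the end), or do I have to fix it with a regex in the processor?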

php csv escaping fgetcsv
6 Answers

494 votes

Use two quotes:

    "Samsung U600 24"""

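To illustrate why the doubled quote reads back correctly, here is a minimal sketch in Java of how a parser can treat a pair of quotes inside a quoted field. CsvLineParser and parse are hypothetical names used only for this illustration, not part of PHP or any library, and the sketch assumes a single-line record (it does not handle quoted fields that span several lines).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration: splits one CSV record into fields,
// treating "" inside a quoted field as one literal double quote.
final class CsvLineParser {

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // lone quote closes the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // the last field has no trailing comma
        return fields;
    }
}
```

With the corrected line from the question, the first of the four parsed fields is Samsung U600 24 followed by a single inch quote.
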
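9 votes

Double quotes are not the only characters that need handling: you also have to deal with single quotes ('), double quotes ("), backslashes (\) and NUL (the NULL byte).

Use fputcsv() to write and fgetcsv() to read, and all of that is taken care of.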


4 votes

I wrote it in Java.

    public class CSVUtil {

        // Doubles embedded quotes, then wraps the value in quotes if it contains
        // a comma, newline, single quote, backslash, or double quote.
        public static String addQuote(String pValue) {
            if (pValue == null) {
                return null;
            }
            if (pValue.contains("\"")) {
                pValue = pValue.replace("\"", "\"\"");
            }
            if (pValue.contains(",")
                    || pValue.contains("\n")
                    || pValue.contains("'")
                    || pValue.contains("\\")
                    || pValue.contains("\"")) {
                return "\"" + pValue + "\"";
            }
            return pValue;
        }

        // Prints each sample value next to its CSV-ready form.
        public static void main(String[] args) {
            System.out.println("ab\nc" + "|||" + CSVUtil.addQuote("ab\nc"));
            System.out.println("a,bc" + "|||" + CSVUtil.addQuote("a,bc"));
            System.out.println("a,\"bc" + "|||" + CSVUtil.addQuote("a,\"bc"));
            System.out.println("a,\"\"bc" + "|||" + CSVUtil.addQuote("a,\"\"bc"));
            System.out.println("\"a,\"\"bc\"" + "|||" + CSVUtil.addQuote("\"a,\"\"bc\""));
            System.out.println("\"a,\"\"bc" + "|||" + CSVUtil.addQuote("\"a,\"\"bc"));
            System.out.println("a,\"\"bc\"" + "|||" + CSVUtil.addQuote("a,\"\"bc\""));
        }
    }

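1 vote

CSV is, in theory, a simple format (tabular data delimited by commas), but regrettably there is no formal specification, so there are many subtly different implementations. This calls for some care when importing and exporting. I will quote RFC 4180 for the common implementation: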

    2. Definition of the CSV Format

    While there are various specifications and implementations for the CSV format (for ex. [4], [5], [6] and [7]), there is no formal specification in existence, which allows for a wide variety of interpretations of CSV files. This section documents the format that seems to be followed by most implementations:

    1. Each record is located on a separate line, delimited by a line break (CRLF). For example:

           aaa,bbb,ccc CRLF
           zzz,yyy,xxx CRLF

    2. The last record in the file may or may not have an ending line break. For example:

           aaa,bbb,ccc CRLF
           zzz,yyy,xxx

    3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file (the presence or absence of the header line should be indicated via the optional "header" parameter of this MIME type). For example:

           field_name,field_name,field_name CRLF
           aaa,bbb,ccc CRLF
           zzz,yyy,xxx CRLF

    4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma. For example:

           aaa,bbb,ccc

    5. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:

           "aaa","bbb","ccc" CRLF
           zzz,yyy,xxx

    6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

           "aaa","b CRLF
           bb","ccc" CRLF
           zzz,yyy,xxx

    7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

           "aaa","b""bb","ccc"
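So, usually:

  • A field may or may not be enclosed in double quotes. (The RFC from 2005 says Excel does not use double quotes, but I tested with Excel 2016 and it does.)
  • Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double quotes. (In particular, a CSV file may contain several lines, as they appear in a text editor, that correspond to a single row of data.)
  • If double quotes are used to enclose a field, then a double quote appearing inside the field must be escaped by preceding it with another double quote.
    • So "" in a raw CSV field represents an empty string, and """" in a raw CSV field represents a single double-quote character, ".

(Usually not a problem: CRLF (Windows-style) versus LF (Unix-style) line breaks, and whether or not the last line ends with a line break.)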
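
The two raw-field cases above (two quotes for an empty string, four quotes for one literal quote) can be checked with a small Java sketch. CsvFieldDecoder and decode are hypothetical names introduced only for this illustration; the sketch assumes the raw field has already been split out of the record and is well formed, so it does no validation.

```java
// Hypothetical helper for illustration: decodes one raw CSV field as it
// appears between delimiters, following rules 5 to 7 of the RFC text.
final class CsvFieldDecoder {

    static String decode(String raw) {
        boolean quoted = raw.length() >= 2
                && raw.startsWith("\"")
                && raw.endsWith("\"");
        if (!quoted) {
            return raw; // unquoted fields are taken literally
        }
        // Strip the enclosing quotes, then collapse each "" pair to one ".
        String inner = raw.substring(1, raw.length() - 1);
        return inner.replace("\"\"", "\"");
    }
}
```

Under these assumptions, a raw field of two quotes decodes to the empty string, a raw field of four quotes decodes to one double-quote character, and the RFC's "b""bb" example decodes to b"bb.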

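However, you may encounter data that escapes quotes or other characters (the delimiter, line breaks, the escape character itself) with an escape character such as \. For example, in readr's read_csv() this is controlled by escape_double and escape_backslash. Some unusual data uses a comment character such as # (the default in R's read.table, but not in read.csv).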


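-1 votes

Since no one has mentioned the way I usually do it, I'll write it down. When I have a tricky string, I don't even bother escaping it.

All I do is base64_encode and base64_decode: encode the value as Base64 before writing the CSV line, and decode it when I want to read it back.

For your example, assuming it's PHP:

    $csvLine = [base64_encode('Samsung U600 24"'), "10000003409", "1", "10000003427"];

When I want to get the value back, I do the opposite:

    $value = base64_decode($csvLine[0]);

I just don't like going through the pain.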
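
The same idea can be sketched in Java with the standard java.util.Base64 class. Base64Csv, encodeField and decodeField are hypothetical names for this illustration, and UTF-8 is an assumed encoding. The basic Base64 alphabet contains no commas, quotes, or line breaks, so an encoded value needs no CSV escaping, at the cost of making the file unreadable to people and to other CSV tools.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration: stores a field as Base64 so it
// never contains a delimiter, quote, or line break.
final class Base64Csv {

    // Encodes the value's UTF-8 bytes (an assumed encoding) as Base64 text.
    static String encodeField(String value) {
        return Base64.getEncoder()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encodeField, returning the original string.
    static String decodeField(String encoded) {
        return new String(Base64.getDecoder().decode(encoded),
                StandardCharsets.UTF_8);
    }
}
```

A record could then be assembled with String.join(",", Base64Csv.encodeField(name), "10000003409", "1", "10000003427"), where name is the product name from the question, and read back by calling decodeField on the first field.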


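-2 votes

I know this is an old post, but here is how I solved it in C# with an extension method (which also converts null values to empty strings).

Create a static class with something like the following:

    // The class name is illustrative; any static class will do.
    public static class CsvExtensions
    {
        /// <summary>
        /// Wraps value in quotes if necessary and converts nulls to empty string
        /// </summary>
        /// <param name="value"></param>
        /// <returns>String ready for use in CSV output</returns>
        public static string Q(this string value)
        {
            if (value == null)
            {
                return string.Empty;
            }

            if (value.Contains(",") || value.Contains("\"") || value.Contains("'") || value.Contains("\\"))
            {
                // Double any embedded quotes so the wrapped field stays valid CSV.
                return "\"" + value.Replace("\"", "\"\"") + "\"";
            }

            return value;
        }
    }

Then, for each string written to the CSV, instead of:

    stringBuilder.Append( WhateverVariable );

you just do:

    stringBuilder.Append( WhateverVariable.Q() );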