I have a line like this in my CSV:
"Samsung U600 24"","10000003409","1","10000003427"
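The quote next to 24 marks inches, and the quote right after it closes the field. I'm reading the line with fgetcsv, but the parser gets it wrong and reads the value as: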
Samsung U600 24",10000003409"
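Is there a way to escape the quote, for example as
Samsung U600 24\"
so that the value is read as
Samsung U600 24"
or do I have to fix it with a regex in my processor?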
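The standard CSV convention is to escape a quote inside a quoted field by doubling it, so the field should be written as:
"Samsung U600 24"""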
A Java helper that doubles embedded quotes and wraps the field in quotes when needed:
public class CSVUtil {
    /**
     * Escapes a value for use as a CSV field: embedded double quotes are
     * doubled, and the value is wrapped in quotes if it contains a comma,
     * line break, single quote, backslash or double quote.
     */
    public static String addQuote(String pValue) {
        if (pValue == null) {
            return null;
        }
        // Escape embedded double quotes by doubling them.
        if (pValue.contains("\"")) {
            pValue = pValue.replace("\"", "\"\"");
        }
        // Quote the field if it contains any character that could confuse a parser.
        if (pValue.contains(",")
                || pValue.contains("\n")
                || pValue.contains("\r")
                || pValue.contains("'")
                || pValue.contains("\\")
                || pValue.contains("\"")) {
            return "\"" + pValue + "\"";
        }
        return pValue;
    }

    public static void main(String[] args) {
        System.out.println("ab\nc" + "|||" + CSVUtil.addQuote("ab\nc"));
        System.out.println("a,bc" + "|||" + CSVUtil.addQuote("a,bc"));
        System.out.println("a,\"bc" + "|||" + CSVUtil.addQuote("a,\"bc"));
        System.out.println("a,\"\"bc" + "|||" + CSVUtil.addQuote("a,\"\"bc"));
        System.out.println("\"a,\"\"bc\"" + "|||" + CSVUtil.addQuote("\"a,\"\"bc\""));
        System.out.println("\"a,\"\"bc" + "|||" + CSVUtil.addQuote("\"a,\"\"bc"));
        System.out.println("a,\"\"bc\"" + "|||" + CSVUtil.addQuote("a,\"\"bc\""));
    }
}
The behavior of common implementations, as documented in RFC 4180:
2. Definition of the CSV Format
While there are various specifications and implementations for the
CSV format (for ex. [4], [5], [6] and [7]), there is no formal
specification in existence, which allows for a wide variety of
interpretations of CSV files. This section documents the format that
seems to be followed by most implementations:
1. Each record is located on a separate line, delimited by a line
break (CRLF). For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
2. The last record in the file may or may not have an ending line
break. For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx
3. There maybe an optional header line appearing as the first line
of the file with the same format as normal record lines. This
header will contain names corresponding to the fields in the file
and should contain the same number of fields as the records in
the rest of the file (the presence or absence of the header line
should be indicated via the optional "header" parameter of this
MIME type). For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record, there may be one or more
fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part
of a field and should not be ignored. The last field in the
record must not be followed by a comma. For example:
aaa,bbb,ccc
5. Each field may or may not be enclosed in double quotes (however
some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields. For example:
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
6. Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
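Rules 4, 6 and 7 are enough to write a small record parser. Below is a minimal Java sketch of one. The class Rfc4180Parser and its parseRecord method are hypothetical names. The sketch assumes the whole record, including any line breaks inside quoted fields, is already in one string. It is an illustration, not a replacement for a tested CSV library. A quote opens a quoted field only at the start of a field. Anywhere else it is kept as a literal character.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal RFC 4180 record parser; a sketch, not a full CSV library. */
final class Rfc4180Parser {
    private Rfc4180Parser() {}

    /** Splits one record into fields; a line break outside quotes ends the record. */
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < record.length()) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"'); // rule 7: "" inside quotes is one literal quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote
                } else {
                    field.append(c); // commas and line breaks are literal here (rule 6)
                }
            } else if (c == '"' && field.length() == 0) {
                inQuotes = true; // opening quote at the start of a field
            } else if (c == ',') {
                fields.add(field.toString()); // rule 4: comma separates fields
                field.setLength(0);
            } else if (c == '\r' || c == '\n') {
                break; // end of record outside quotes
            } else {
                field.append(c);
            }
            i++;
        }
        fields.add(field.toString());
        return fields;
    }
}
```

If you feed this sketch the question's original line, its first field comes out as Samsung U600 24",10000003409", the same misread value shown in the question. The "" after 24 is taken as an escaped quote, so the field does not close where the author intended. With the doubled-quote form "Samsung U600 24""", the first field is Samsung U600 24" as intended.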
So, in general, "" represents an empty string, and """" in the raw CSV represents a single double-quote character (").
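You can check both cases with a helper that decodes a single, already-separated field. It strips the surrounding quotes, then collapses each doubled quote into one. The CsvField.unquote name is hypothetical. The sketch assumes the field has already been split out of its record.

```java
/** Hypothetical helper that decodes one already-isolated CSV field. */
final class CsvField {
    private CsvField() {}

    // If the raw field is wrapped in double quotes, strip them and turn each
    // doubled quote inside into a single quote character; otherwise return as-is.
    static String unquote(String raw) {
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            String inner = raw.substring(1, raw.length() - 1);
            return inner.replace("\"\"", "\"");
        }
        return raw;
    }
}
```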
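However, you may come across data that uses an escape character such as \ to escape quotes or other characters (the delimiter, line breaks, or the escape character itself). In readr's read_csv(), for example, this is controlled by escape_double and escape_backslash. Some unusual data also uses a comment character such as # (the default in R's read.table, but not in read.csv).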
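The two escaping styles can be compared with a single-line splitter that has two switches. The flags escapeDouble and escapeBackslash are named after readr's options only to show the idea. This is a hypothetical Java sketch, not readr's behavior or API, and it does not handle line breaks inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter supporting doubled-quote and/or backslash escapes. */
final class EscapingSplitter {
    private EscapingSplitter() {}

    static List<String> split(String line, boolean escapeDouble, boolean escapeBackslash) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escapeBackslash && c == '\\' && i + 1 < line.length()) {
                field.append(line.charAt(++i)); // \x means a literal x (quote, comma, backslash)
            } else if (c == '"') {
                if (inQuotes && escapeDouble && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // "" inside quotes is one literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

With escapeBackslash enabled, "Samsung U600 24\"" decodes to Samsung U600 24". With escapeDouble enabled, "Samsung U600 24""" decodes to the same value. If the setting does not match how the file was written, fields are misread.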
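What I do is just use base64_encode and base64_decode: I encode the value as Base64 before writing the CSV line, and decode it when I want to read it back. For your example, assuming it's PHP: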
$csvLine = [base64_encode('Samsung U600 24"'),"10000003409","1","10000003427"];
When I want to get the value back, I do the reverse:
$value = base64_decode($csvLine[0]);
I just don't like going through the pain of escaping.
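For comparison, here is the same round trip in Java. It uses the standard java.util.Base64 class in place of PHP's base64_encode/base64_decode. The Base64Field class name is hypothetical, and the sketch assumes the values are UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper showing the Base64 round trip for one CSV field. */
final class Base64Field {
    private Base64Field() {}

    // The basic Base64 encoder emits only A-Z, a-z, 0-9, '+', '/' and '=',
    // so an encoded field never contains a comma, quote or line break.
    static String encode(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String field) {
        return new String(Base64.getDecoder().decode(field), StandardCharsets.UTF_8);
    }
}
```

The trade-off is that the encoded column is no longer human-readable. Every consumer of the file also has to know which columns to decode.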
Create a static class with an extension method like this:
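public static class CsvExtensions
{
    /// <summary>
    /// Wraps value in quotes if necessary (doubling any embedded quotes)
    /// and converts nulls to empty string
    /// </summary>
    /// <param name="value">The raw field value</param>
    /// <returns>String ready for use in CSV output</returns>
    public static string Q(this string value)
    {
        if (value == null)
        {
            return string.Empty;
        }
        // Quote fields containing a delimiter, quote, line break, apostrophe or backslash
        if (value.Contains(",") || value.Contains("\"") || value.Contains("\r") || value.Contains("\n")
            || value.Contains("'") || value.Contains("\\"))
        {
            // Embedded quotes must be doubled, otherwise the quote after 24 ends the field early
            return "\"" + value.Replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}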
Then, for every string you write to the CSV, instead of:
stringBuilder.Append( WhateverVariable );
you just do:
stringBuilder.Append( WhateverVariable.Q() );
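The same idea in Java, as a hedged sketch: quote each value, doubling embedded quotes and turning null into an empty field, then join the results with commas. CsvRowWriter, quote and toLine are hypothetical names. Unlike the C# helper, this version quotes only on a comma, double quote or line break, which are the characters RFC 4180 requires.

```java
import java.util.List;

/** Hypothetical Java counterpart of the Q() approach: quote each field, then join. */
final class CsvRowWriter {
    private CsvRowWriter() {}

    // Null becomes an empty field; values with a comma, quote or line break are
    // wrapped in quotes with embedded quotes doubled.
    static String quote(String value) {
        if (value == null) {
            return "";
        }
        if (value.contains(",") || value.contains("\"") || value.contains("\r") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Builds one CSV line, appending a comma between fields but not after the last.
    static String toLine(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(values.get(i)));
        }
        return sb.toString();
    }
}
```

For the row in the question, toLine returns the line "Samsung U600 24""",10000003409,1,10000003427. An RFC 4180 reader splits that back into the original four values.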