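I'm trying to parse a CSV file with OpenCSV. One of the columns stores data in YAML-serialized form and is quoted, because it can contain commas. It also contains quotes inside, which are escaped by doubling them (""). I can parse this file easily in Ruby, but OpenCSV doesn't parse it completely. The file is UTF-8 encoded.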
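The quoting scheme described above (wrap the field in double quotes, double every quote inside it) is the RFC 4180 convention. A minimal sketch of the writing side is below. `Rfc4180Quote` and `quoteField` are hypothetical names used only for illustration; they are not part of OpenCSV, FastCSV or Ruby's CSV library.

```java
public class Rfc4180Quote {

    // Quote a single field following RFC 4180: if the value contains a comma,
    // a double quote, or a line break, wrap it in double quotes and double
    // every embedded double quote. Otherwise return it unchanged.
    static String quoteField(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String details = ":details: \"[Fair Trade Certified]\"";
        // prints: ":details: ""[Fair Trade Certified]"""
        System.out.println(quoteField(details));
        // prints: Product::Source
        System.out.println(quoteField("Product::Source"));
    }
}
```

A reader has to undo exactly this: inside a quoted field, `""` means one literal `"`, and commas and line breaks are ordinary data.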
Here is the Java snippet I'm using to read the file:
CSVReader reader = new CSVReader(new InputStreamReader(new FileInputStream(csvFilePath), "UTF-8"), ',', '\"', '\\');
Here are two lines from the file. The first line is not parsed correctly, I guess because of the escaped double quotes: it gets split at ""[Fair Trade Certified]"".
1061658767,update,1196916,Product,28613099,Product::Source,"---
product_attributes:
-
- :name: Ornaments
:brand_id: 49120
:size: each
:alcoholic: false
:details: ""[Fair Trade Certified]""
:gluten_free: false
:kosher: false
:low_fat: false
:organic: false
:sugar_free: false
:fat_free: false
:vegan: false
:vegetarian: false
",,2015-11-01 00:06:19.796944,,,,,,
1061658768,create,,,28613100,Product::Source,"---
product_id:
retailer_id:
store_id:
source_id: 333790
locale: en_us
source_type: Product::PrehistoricProductDatum
priority: 1
is_definition:
product_attributes:
",,2015-11-01 00:06:19.927948,,,,,,
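The solution was to use an RFC 4180-compliant CSV parser, as Paul suggested. I had been using OpenCSV's CSVReader, but it didn't work, or at least I couldn't get it to work.
I switched to FastCSV, an RFC 4180 CSV parser, and it works seamlessly.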
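import de.siegmar.fastcsv.reader.CsvContainer;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;
import java.io.File;
import java.nio.charset.StandardCharsets;
File file = new File(csvFilePath);
CsvReader csvReader = new CsvReader();
// read() loads all rows of the file into a CsvContainer
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
    System.out.println(row.getFieldCount());
}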
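The RFC 4180 rules an RFC 4180 parser applies to this input can be sketched with the JDK alone. `Rfc4180Sketch` and `parse` are hypothetical names, not part of FastCSV or openCSV. The sketch assumes the whole input fits in a String and does no error reporting for malformed input such as an unterminated quote.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Sketch {

    // Split CSV text into records following RFC 4180 quoting:
    // a field wrapped in double quotes may contain commas and line breaks,
    // and a doubled quote ("") inside it stands for one literal quote.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true while the current record has unflushed input
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            pending = true;
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // "" -> one literal quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote of the field
                } else {
                    field.append(c); // commas and newlines are data here
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n' || c == '\r') {
                // end of record; treat \r\n as a single line break
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                pending = false;
            } else {
                field.append(c);
            }
            i++;
        }
        // flush the last record if the text does not end with a line break
        if (pending) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1061658767,update,\"---\nproduct_attributes:\n"
                + "- :details: \"\"[Fair Trade Certified]\"\"\n\",,2015-11-01\n";
        List<List<String>> rows = parse(csv);
        System.out.println(rows.size());        // 1
        System.out.println(rows.get(0).size()); // 5
        System.out.println(rows.get(0).get(2)); // the YAML block with single quotes
    }
}
```

The key point is that the parser only splits on commas and line breaks while outside quotes, so the multi-line YAML column with its doubled quotes stays one field.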
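First, I'm glad FastCSV works for you, but I took the line containing the suspect substring and ran it through openCSV 3.9, and it parses correctly with both CSVParser and RFC4180Parser. Could you give some details about what it failed to parse, and/or try it with openCSV 3.9 to see whether you hit the same problem, then try the configurations below?
Here are the tests I used: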
CSVParser:
@Test
public void parseBigStringFromStackOverflowWithMultipleQuotesInLine() throws IOException {
String bigline = "28613099,Product::Source,\"---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"\"[Fair Trade Certified]\"\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" +
"\",,2015-11-01 00:06:19.796944";
String suspectString = "---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"[Fair Trade Certified]\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" ;
StringReader stringReader = new StringReader(bigline);
CSVReaderBuilder builder = new CSVReaderBuilder(stringReader);
CSVReader csvReader = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build();
String item[] = csvReader.readNext();
assertEquals(5, item.length);
assertEquals("28613099", item[0]);
assertEquals("Product::Source", item[1]);
assertEquals(suspectString, item[2]);
}
RFC4180Parser (Spock test):
def 'parse big line from stackoverflow with complex string'() {
given:
RFC4180ParserBuilder builder = new RFC4180ParserBuilder()
RFC4180Parser parser = builder.build()
String bigline = "28613099,Product::Source,\"---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"\"[Fair Trade Certified]\"\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" +
"\",,2015-11-01 00:06:19.796944"
String suspectString = "---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"[Fair Trade Certified]\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n"
when:
String[] values = parser.parseLine(bigline)
then:
values.length == 5
values[0] == "28613099"
values[1] == "Product::Source"
values[2] == suspectString
}