HEADER_A|HEADER_B|HEADER_C
A|B|C
A D|B| => records without a comma generate output like "A D|B|"
A,D|B| => records with a comma generate output like " A,D|B| "
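To show what output one would expect with "|" as the delimiter, here is a plain-Java sketch (no Spark) of a delimiter-aware quoting rule. This is an assumed rule, not Spark's actual writer code, and the `PipeQuoting`, `formatField` and `formatRecord` names are hypothetical. Under this rule a field is quoted only when it contains the delimiter, a double quote or a line break, so a comma alone produces no quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT Spark's implementation: formats one record with a
// delimiter-aware quoting rule.
class PipeQuoting {
    // Quote a field only if it contains the delimiter, a double quote, or a
    // line break. A comma is ordinary content when the delimiter is '|'.
    static String formatField(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Escape embedded quotes by doubling them, then wrap the field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the formatted fields with the delimiter.
    static String formatRecord(String[] fields, char delimiter) {
        List<String> out = new ArrayList<>();
        for (String f : fields) {
            out.add(formatField(f, delimiter));
        }
        return String.join(String.valueOf(delimiter), out);
    }

    public static void main(String[] args) {
        char pipe = '|';
        System.out.println(formatRecord(new String[] {"A D", "B", ""}, pipe)); // A D|B|
        System.out.println(formatRecord(new String[] {"A,D", "B", ""}, pipe)); // A,D|B|
        System.out.println(formatRecord(new String[] {"A|D", "B", ""}, pipe)); // "A|D"|B|
    }
}
```

Under this assumed rule, the first two sample records are written the same way, and only a value that contains "|" itself would need quotes.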
Spark is configured as follows:
Dataset<Row> df = sparkSession.read()
        .option("header", "true")
        .option("delimiter", "|")
        .schema(schema) // assume this is valid and represents the correct schema
        .csv(fileName)
        .cache();
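As a sanity check on what `header` = "true" plus a "|" delimiter mean for the sample input, here is a plain-Java sketch (no Spark) that takes column names from the first line and splits every later line on "|" only. It is a simplification under stated assumptions: quoted fields are not handled, and `PipeSplit`/`splitPipe` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Spark): the first line supplies the column names and
// every later line is split on '|' only. Quoted fields are NOT handled.
class PipeSplit {
    static String[] splitPipe(String line) {
        // '|' is a regex metacharacter, so it must be escaped. A limit of -1
        // keeps trailing empty fields, such as the empty HEADER_C in "A D|B|".
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "HEADER_A|HEADER_B|HEADER_C",
                "A|B|C",
                "A D|B|",
                "A,D|B|");
        String[] header = splitPipe(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = splitPipe(line);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < header.length; i++) {
                // Pad missing trailing columns with an empty value.
                String value = i < fields.length ? fields[i] : "";
                sb.append(header[i]).append("=[").append(value).append("] ");
            }
            // Prints e.g. HEADER_A=[A,D] HEADER_B=[B] HEADER_C=[]
            System.out.println(sb.toString().trim());
        }
    }
}
```

In this split, "A,D" stays a single value because the comma is not the delimiter.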
I tried using the "sep" option, but it didn't help. If my delimiter is "|", why does Spark handle records containing a comma differently?
My input is a "|" (pipe) delimited file, and I can't change the input file.