I have a CSV file in S3 that looks like this:
id,name,secondary_id,created_at,last_modified_at,tags,report
2a-4c-4d-b0,foo1,103776194,2021-10-23 13:28:02.837511,2021-10-23 13:34:55.781556,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",health
2a-4c-4d-b0,bar1,103776194,2021-10-23 13:28:02.837511,2021-10-23 13:34:55.781556,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",risk
fc-ab-4a-8b,foo2,103101839,2021-10-23 12:54:25.662775,2021-10-23 12:56:54.53149,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",health
a9-2e-4e-b3,bar2,103776194,2021-10-23 13:23:35.286249,2021-10-23 13:35:22.340411,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",risk
I tried creating the table with this query:
CREATE EXTERNAL TABLE IF NOT EXISTS `test_table`
(
id STRING,
name STRING,
secondary_id STRING,
created_at TIMESTAMP,
last_modified_at TIMESTAMP,
tags STRING,
report STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://location/'
TBLPROPERTIES (
'skip.header.line.count' = '1'
);
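But because the tags column contains commas (,), the table is not populated correctly: the tags value is split at each comma and treated as several different columns.
Does anyone know how to solve this? Thank you.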
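I was able to figure it out. I used the OpenCSVSerde SerDe with WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"'). separatorChar separates the columns, while quoteChar marks the start and end of a column value, so commas inside a quoted value are no longer treated as separators.
So even though none of my other columns start with ", the file is still interpreted correctly. Hope this helps.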
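For reference, here is a sketch of how the full statement might look with that SerDe, reusing the table name, columns, and location from the question. It assumes the table is created in Athena or Hive and that the SerDe class is org.apache.hadoop.hive.serde2.OpenCSVSerde. Declaring the two timestamp columns as STRING is also an assumption on my part: as far as I know, OpenCSVSerde in Athena does not parse timestamps written like 2021-10-23 13:28:02.837511, so they may need to be converted at query time.

```sql
-- Sketch only: same table as in the question, but parsed with OpenCSVSerde
-- so that commas inside the quoted tags value are not treated as separators.
CREATE EXTERNAL TABLE IF NOT EXISTS `test_table`
(
  id STRING,
  name STRING,
  secondary_id STRING,
  created_at STRING,        -- assumption: kept as STRING, convert when querying
  last_modified_at STRING,  -- assumption: kept as STRING, convert when querying
  tags STRING,              -- quoted JSON text containing commas
  report STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',  -- splits one column from the next
  'quoteChar' = '"'       -- marks the start and end of a quoted value
)
LOCATION 's3://location/'
TBLPROPERTIES (
  'skip.header.line.count' = '1'
);
```

The ROW FORMAT DELIMITED, FIELDS TERMINATED BY, and LINES TERMINATED BY clauses from the original query are dropped here, since the SerDe handles the splitting itself.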