How to create an external table in Athena from a CSV file that contains a JSON column


I have a CSV file in S3 that looks like this:

id,name,secondary_id,created_at,last_modified_at,tags,report
2a-4c-4d-b0,foo1,103776194,2021-10-23 13:28:02.837511,2021-10-23 13:34:55.781556,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",health
2a-4c-4d-b0,bar1,103776194,2021-10-23 13:28:02.837511,2021-10-23 13:34:55.781556,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",risk
fc-ab-4a-8b,foo2,103101839,2021-10-23 12:54:25.662775,2021-10-23 12:56:54.53149,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",health
a9-2e-4e-b3,bar2,103776194,2021-10-23 13:23:35.286249,2021-10-23 13:35:22.340411,"{""reports"": {""risk"": {""status"": ""ACTIVE""}, ""analysis"": {""status"": ""ACTIVE""}}}",risk

I tried to create the table with this query:

CREATE EXTERNAL TABLE IF NOT EXISTS `test_table`
(
  id STRING,
  name STRING,
  secondary_id STRING,
  created_at TIMESTAMP,
  last_modified_at TIMESTAMP,
  tags STRING,
  report STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
LOCATION 's3://location/'
TBLPROPERTIES (
  'skip.header.line.count' = '1'
);

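But because the tags column contains commas (,), the table is not populated correctly: the JSON value is split and treated as several different columns.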

Does anyone know how to solve this? Thank you.

amazon-web-services amazon-athena
1 Answer

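I was able to figure it out. I used the OpenCSVSerde SerDe library together with WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"'). Here separatorChar is the character that separates the columns, while quoteChar marks the start and end of a quoted column value, so commas inside the quotes are no longer treated as column separators.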

So even though none of my other columns start with ", they are still interpreted correctly. Hope this helps.
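
For reference, here is a minimal sketch of what the complete statement might look like with this SerDe, reusing the table name, column names and location from the question. Two things in it are assumptions rather than part of the answer above: the two timestamp columns are declared as STRING, because as far as I know the Athena documentation says OpenCSVSerde only accepts TIMESTAMP values in UNIX numeric form, and the SELECT at the end is only a hypothetical illustration of reading a value out of the JSON in tags.

```sql
-- Sketch only: same table as in the question, switched from
-- ROW FORMAT DELIMITED to OpenCSVSerde so that quoted fields are honored.
-- Assumption: created_at and last_modified_at are kept as STRING here
-- and converted at query time if a real timestamp is needed.
CREATE EXTERNAL TABLE IF NOT EXISTS `test_table`
(
  id STRING,
  name STRING,
  secondary_id STRING,
  created_at STRING,
  last_modified_at STRING,
  tags STRING,
  report STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"'
)
LOCATION 's3://location/'
TBLPROPERTIES (
  'skip.header.line.count' = '1'
);

-- Hypothetical follow-up query, run as a separate statement:
-- pull one field out of the JSON text stored in the tags column.
SELECT
  id,
  name,
  json_extract_scalar(tags, '$.reports.risk.status') AS risk_status
FROM test_table;
```

With the sample rows from the question, tags should arrive as a single string containing the whole JSON document, which is what lets json_extract_scalar work on it.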
