I wrote the following query:
CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`(
`price` double,
`genre` string,
`all_genres` string,
`languages` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/user/abc/data'
TBLPROPERTIES (
'transient_lastDdlTime'='1588006839');
The last two columns hold lists of values that look like this: ['val1','val2','val3'].
The CREATE statement itself runs without errors, but when I add 'SELECT * FROM temp_data'
I get an error: Failed to fetch next batch for the Resultset
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.FileFormatException: Malformed ORC file /user/abc/data/data.csv. Invalid postscript.
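Does anyone know how to fix this?
AFAIK, because the last two columns hold arrays rather than plain strings, you need to declare them with the array data type: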
arrays: ARRAY<data_type>
Something like this:
CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`( `price` double, `genre` string, `all_genres` array<string>, `languages` array<string>) ... remaining as it is.
Otherwise, I don't think it can recognize these array columns.
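One more thing that may be worth checking: the error message names /user/abc/data/data.csv, so the file in that location may be plain CSV text rather than ORC, and the ORC reader would reject it no matter how the columns are typed. If that is the case, a rough sketch of an alternative is below. It assumes the file is comma-separated, has no header row, and double-quotes the list values (since they contain commas). The table name temp_data_csv is made up for illustration. OpenCSVSerde treats every column as a string, so the cast and the split into array<string> happen at query time.

```sql
-- Sketch only, under the assumptions stated above; adjust separator and
-- quote characters to match the actual file.
CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data_csv`(
  `price` string,
  `genre` string,
  `all_genres` string,
  `languages` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"')
STORED AS TEXTFILE
LOCATION
  '/user/abc/data';

-- Strip the brackets and single quotes from "['val1','val2','val3']",
-- then split on commas to get an array<string>.
SELECT
  CAST(`price` AS double) AS price,
  `genre`,
  split(regexp_replace(`all_genres`, "[\\[\\]']", ''), ', *') AS all_genres,
  split(regexp_replace(`languages`, "[\\[\\]']", ''), ', *') AS languages
FROM `temp_data_csv`;
```

If the data really is ORC and only the file name is misleading, this does not apply and the array<string> declaration above is the thing to try.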