Unable to load a CSV file into a temporary table in Hive


I wrote the following query:

CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`(
  `price` double, 
  `genre` string, 
  `all_genres` string, 
  `languages` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  '/user/abc/data'
TBLPROPERTIES (
  'transient_lastDdlTime'='1588006839');

The last two columns contain lists in the following format: ['val1','val2','val3']. The CREATE statement runs without any errors.

Everything looks fine after running it, but when I add 'SELECT * FROM temp_data' I get this error:

Failed to fetch next batch for the Resultset
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.FileFormatException: Malformed ORC file /user/abc/data/data.csv. Invalid postscript.

Does anyone know how to fix it?

Tags: database, hadoop, hive, hiveql
1 Answer

AFAIK, since the last two columns hold lists, you need to declare them with Hive's array data type:

arrays: ARRAY<data_type>

Like this:

CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`(
  `price` double, 
  `genre` string, 
  `all_genres` array<string>, 
  `languages` array<string>)
-- ... remaining clauses as they are

Otherwise, I don't think Hive can interpret these columns as arrays.
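The error message also points to a separate problem: the table is declared with the ORC SerDe and the ORC input/output formats, but /user/abc/data/data.csv is a plain-text CSV file. Hive therefore tries to parse it as ORC and fails with "Malformed ORC file ... Invalid postscript", so changing the column types alone will probably not make the SELECT work. Below is a minimal sketch of a text-based alternative. It assumes the file is comma-separated, that each list field is wrapped in double quotes (so its inner commas are not treated as field separators), and that there is no header row; the table name temp_data_csv is hypothetical. Because OpenCSVSerde reads every column as a string, the lists are converted to array<string> in the query instead of in the DDL.

```sql
-- Sketch only. Assumptions (not confirmed by the question):
--   * fields are separated by ',' and list fields are enclosed in '"'
--   * the file has no header row
--   * temp_data_csv is a hypothetical table name
CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data_csv`(
  `price` string,        -- OpenCSVSerde exposes every column as string
  `genre` string,
  `all_genres` string,   -- raw text such as ['val1','val2','val3']
  `languages` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\')
STORED AS TEXTFILE       -- read the files as text, not ORC
LOCATION '/user/abc/data';

-- Convert the raw strings into typed values at query time.
SELECT
  CAST(`price` AS DOUBLE) AS price,
  `genre`,
  -- remove the leading '[', the trailing ']' and the single quotes,
  -- then split on commas (optionally surrounded by spaces)
  split(regexp_replace(`all_genres`, "^\\[|\\]$|'", ""), "\\s*,\\s*") AS all_genres,
  split(regexp_replace(`languages`,  "^\\[|\\]$|'", ""), "\\s*,\\s*") AS languages
FROM temp_data_csv;
```

If the file does have a header row, adding TBLPROPERTIES ('skip.header.line.count'='1') to the CREATE statement skips it. If the data needs to end up in ORC, one option is to create a separate ORC table with array<string> columns and fill it with INSERT INTO ... SELECT from this text table.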

See also: Working With Hive Complex Data Types
