How to create an external Hive table from ORC files

Question

I am trying to create an external Hive table on top of ORC files.

The query used to create the table:

create external table fact_scanv_dly_stg (
store_nbr int,
geo_region_cd char(2),
scan_id int,
scan_type char(2),
debt_nbr string,
mds_fam_id string,
upc_nbr string,
sales_unit_qty string,
sales_amt string,
cost_amt string,
visit_dt string,
rtn_amt string,
crncy_cd string,
op_cmpny_cd string)
STORED AS ORC
location 'hdfs:///my/location/scanv_data/'; 

Schema details of the ORC files (obtained from the Spark SQL DataFrame):

 |-- _col0: integer (nullable = true)
 |-- _col1: string (nullable = true)
 |-- _col2: integer (nullable = true)
 |-- _col3: byte (nullable = true)
 |-- _col4: short (nullable = true)
 |-- _col5: integer (nullable = true)
 |-- _col6: decimal(18,0) (nullable = true)
 |-- _col7: decimal(9,2) (nullable = true)
 |-- _col8: decimal(9,2) (nullable = true)
 |-- _col9: decimal(9,2) (nullable = true)
 |-- _col10: date (nullable = true)
 |-- _col11: decimal(9,2) (nullable = true)
 |-- _col12: string (nullable = true)
 |-- _col13: string (nullable = true)

But when I run a select on the created table, I get the following error:

select * from fact_scanv_dly_stg limit 5;


OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable

Any suggestions?

apache-spark hive orc serde
1 Answer
The error occurs because column _col3 in the Spark schema is a byte (tinyint in Hive terms), while the fourth column in your Hive DDL, scan_type, is declared as char(2). Hive matches the table's columns to the ORC file's columns by position, so every declared type must match the type actually stored in the file. The fix is either to cast all columns to the desired data types in Spark before writing the ORC files, or to declare in the Hive schema the exact data types that appear in the Spark schema.
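
For the second option, here is a minimal sketch of a DDL whose column types mirror the Spark schema one-for-one. It assumes the columns in the ORC files appear in the order shown in the schema dump above (_col0 through _col13) and keeps the business column names from your original query:

create external table fact_scanv_dly_stg (
store_nbr int,                  -- _col0: integer
geo_region_cd string,           -- _col1: string (was char(2))
scan_id int,                    -- _col2: integer
scan_type tinyint,              -- _col3: byte (was char(2), the cause of the ClassCastException)
debt_nbr smallint,              -- _col4: short
mds_fam_id int,                 -- _col5: integer
upc_nbr decimal(18,0),          -- _col6: decimal(18,0)
sales_unit_qty decimal(9,2),    -- _col7: decimal(9,2)
sales_amt decimal(9,2),         -- _col8: decimal(9,2)
cost_amt decimal(9,2),          -- _col9: decimal(9,2)
visit_dt date,                  -- _col10: date
rtn_amt decimal(9,2),           -- _col11: decimal(9,2)
crncy_cd string,                -- _col12: string
op_cmpny_cd string)             -- _col13: string
STORED AS ORC
location 'hdfs:///my/location/scanv_data/';

Since the table is external, you can safely DROP the existing definition and recreate it with these types without losing the underlying ORC files. If downstream queries really need scan_type or geo_region_cd as char(2), cast them in a view on top of this table, or cast the columns in Spark before writing the ORC files in the first place.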