Error creating a Hive table from an Avro schema


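I am trying to create a Hive table by extracting the schema from Avro data stored in S3. The data is written to S3 by the Kafka Connect S3 connector. I am publishing a simple POJO to the producer.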

Code to extract the schema from the Avro data:

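import os

# temp_folder_path: directory containing the Avro files (defined elsewhere)
for filename in os.listdir(temp_folder_path):
    filename = os.path.join(temp_folder_path, filename)
    if filename.endswith('.avro'):
        # Write the schema embedded in the Avro file to a sibling .avsc file
        schema_path = filename[:-len('.avro')] + '.avsc'
        os.system(
            'java -jar /path/to/avro-jar/avro-tools-1.8.2.jar getschema {0} > {1}'.format(
                filename, schema_path))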

The extracted schema is then saved in an S3 bucket.

Create table query:

CREATE EXTERNAL TABLE IF NOT EXISTS `db_name_service.table_name_change_log`
PARTITIONED BY (`year` bigint, `month` bigint, `day` bigint, `hour` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://bucket/topics/topic_name'
TBLPROPERTIES ('avro.schema.url'='s3://bucket/schemas/topic_name.avsc')

Error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.avro.AvroSerdeException Schema for table must be of type RECORD. Received type: BYTES)

Schema:

{
  "type": "record",
  "name": "Employee",
  "doc": "Represents an Employee at a company",
  "fields": [
    {"name": "firstName", "type": "string", "doc": "The persons given name"},
    {"name": "nickName", "type": ["null", "string"], "default": null},
    {"name": "lastName", "type": "string"},
    {"name": "age", "type": "int", "default": -1},
    {"name": "phoneNumber", "type": "string"}
  ]
}

I can view the data in the topic with this command:

./confluent-4.1.1/bin/kafka-avro-console-consumer --topic test2_singular --bootstrap-server localhost:9092 --from-beginning

{"firstName":"A:0","nickName":{"string":"C"},"lastName":"C","age":0,"phoneNumber":"123"}

{"firstName":"A:1","nickName":{"string":"C"},"lastName":"C","age":1,"phoneNumber":"123"}
amazon-s3 hive avro apache-kafka-connect confluent
1 Answer

"Schema for table must be of type RECORD. Received type: BYTES"

The only way this could happen is if you are not using the AvroConverter in your Connect sink configuration.
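For illustration, here is a minimal sketch of a sink configuration that does use the AvroConverter, built as a Python dict so it can be serialised to JSON for the Connect REST API. The property keys and class names are the standard ones for the Confluent S3 sink. The connector name, topic, bucket, region, flush size and Schema Registry URL are placeholders I made up, not values from the question. If `value.converter` is set to something else (a byte-array converter, for example), the record values probably reach the sink as raw bytes, which would explain a file schema of type BYTES.

```python
import json


def build_s3_sink_config(topic, bucket, region, schema_registry_url):
    """Return an S3 sink connector config that deserialises values with AvroConverter."""
    return {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "topics": topic,
        "s3.bucket.name": bucket,
        "s3.region": region,
        "flush.size": "1000",  # placeholder value
        # Write Avro container files to S3
        "format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
        # Decode record values as Avro so the files carry the record schema
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": schema_registry_url,
    }


if __name__ == "__main__":
    # All argument values below are hypothetical placeholders
    payload = {
        "name": "s3-sink-example",
        "config": build_s3_sink_config(
            "test2_singular", "bucket", "us-east-1", "http://localhost:8081"),
    }
    print(json.dumps(payload, indent=2))
```

The converter can also be set at the worker level instead of per connector; what matters is that the effective `value.converter` for this sink is the AvroConverter.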

You would also need to extract the schema from the S3 files.
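As a rough sketch of that step, and as an alternative to shelling out to avro-tools, the schema can be read straight from the header of an Avro object container file: the file starts with the magic bytes `Obj\x01`, followed by a metadata map whose `avro.schema` entry holds the schema JSON. The function names below are my own, and the code only reads the header, not the data blocks. Checking the top-level type before creating the table lets you catch the BYTES case early.

```python
import json

AVRO_MAGIC = b"Obj\x01"


def _read_long(stream):
    """Decode one Avro long (zigzag, variable-length) from a binary stream."""
    shift = 0
    result = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of Avro header")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def _read_bytes(stream):
    """Read a length-prefixed byte string."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_schema(stream):
    """Return the parsed schema stored in an Avro container file header."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = _read_long(stream)
        if count == 0:
            break
        if count < 0:
            count = -count
            _read_long(stream)  # block size in bytes, not needed here
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            meta[key] = _read_bytes(stream)
    if "avro.schema" not in meta:
        raise ValueError("header has no avro.schema entry")
    return json.loads(meta["avro.schema"].decode("utf-8"))


def top_level_type(schema):
    """Return the top-level Avro type name of a parsed schema."""
    if isinstance(schema, dict):
        return schema["type"]
    if isinstance(schema, list):
        return "union"
    return schema  # primitives such as "bytes" are plain JSON strings
```

Open a downloaded file in binary mode (`open(path, "rb")`), pass it to `read_avro_schema`, and only write the `.avsc` for Hive if `top_level_type` returns `"record"`.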

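Tip: Using a Lambda function to watch the bucket for Avro file creation lets you get the schema without scanning the whole bucket or picking random files, and it can also be used to notify Hive / AWS Glue of table schema updates.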
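A minimal sketch of such a Lambda handler, assuming the bucket is configured to send S3 "ObjectCreated" notifications to the function. It only picks the new `.avro` objects out of the event. Downloading the file, extracting the schema and updating Hive / Glue are left as a placeholder, and the function names are my own.

```python
from urllib.parse import unquote_plus


def new_avro_objects(event):
    """Return (bucket, key) pairs for newly created .avro objects in an S3 event."""
    found = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".avro"):
            found.append((bucket, key))
    return found


def lambda_handler(event, context):
    targets = new_avro_objects(event)
    for bucket, key in targets:
        # Placeholder: fetch the object, extract its schema, then update
        # the .avsc file or the Hive / Glue table definition.
        print("new Avro file: s3://{0}/{1}".format(bucket, key))
    return {"avro_files": ["s3://{0}/{1}".format(b, k) for b, k in targets]}
```

Decoding the key matters with partitioned output paths, since characters such as `=` in the key are percent-encoded in the event.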
