I am trying to insert data from Parquet files using the following code:
INSERT INTO general_marts.gm_order
SELECT *
FROM s3(
'http://host:9000/general-marts/general_mart_order/year=2024/month=02/day=12/*.snappy.parquet',
'aws_access_key_id',
'aws_secret_access_key','Parquet');
The schema of general_marts.gm_order:
CREATE TABLE general_marts.gm_order
(
order_id Int64,
user_id Int64,
vendor_id Int64,
total_price Int64,
total_price_after_discount Int64,
payable_price Int64,
customer_name String,
vendor_name String,
preparation_time Int32,
delivery_type Nullable(String),
accepted_at Nullable(DateTime64(3)),
nfc_reason Nullable(String),
rejected_at Nullable(DateTime64(3)),
paid_at Nullable(DateTime64(3)),
created_at DateTime64(3),
list_order_status Array(String),
list_order_status_datetime Array(DateTime64(3)),
list_payment_status Array(String),
list_payment_status_datetime Array(DateTime64(3)),
list_delivery_status Array(String),
list_delivery_status_datetime Array(DateTime64(3)),
voucher_id Nullable(String),
voucher_name Nullable(String),
voucher_code Nullable(String),
voucher_value Nullable(Int64),
ofood_share_delivery_price Int64,
customer_share_delivery_price Int64,
vendor_share_delivery_price Int64,
delivery_order_id Nullable(Int64),
packing_price Int64,
vendor_tax Int64,
refund_at Nullable(DateTime64(3)),
refund_amount Nullable(Int64),
city_name Nullable(String),
list_product_id Array(String),
list_product_name Array(String),
list_product_variation_id Array(String),
list_product_variation_name Array(String),
list_option_ids Array(Array(String)),
list_options Array(Array(String)),
list_option_label Array(Nullable(String)),
list_has_option Array(String),
list_price Array(String),
list_price_after_discount Array(String),
list_quantity Array(String),
list_stock Array(String),
list_capacity Array(String))
ENGINE = MergeTree
PRIMARY KEY order_id;
But I get this error:
Code: 349. DB::Exception: Cannot convert NULL value to non-Nullable type: while converting column accepted_at from type Nullable(UInt32) to type UInt32: While executing ParquetBlockInputFormat: While executing S3. (CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN) (version 22.2.2.1)
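I wrote the Parquet files to Minio with Spark and then tried to read them with the S3 engine in ClickHouse, which is where the error occurs.
The problem appears to be caused by columns in the Parquet file that contain None values. Notably, other columns in the same file that hold default values, such as an empty string (''), 0, or a specific date ('2022-01-01'), are processed without any problem.
I do not want to replace the missing values in these columns (a "fill NA" approach). Instead, I want ClickHouse to expose these columns with their null values intact.
The issue turned out to be specific to the ClickHouse version: after I upgraded ClickHouse to version 22.6.1.1985, the same insert worked.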
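For anyone hitting the same error, here is a minimal sketch of how one might check the two things involved: the server version, and the column types ClickHouse reads from the Parquet files. The URL and credential placeholders are copied from the query above. Whether DESCRIBE TABLE on the s3 table function can infer the schema depends on the server release, so treat that part as an assumption to verify.

```sql
-- Check which server version is running (the error above was reported on 22.2.2.1).
SELECT version();

-- Show the column names and types ClickHouse infers from the Parquet files.
-- Assumption: schema inference for s3() is available on the server in use.
DESCRIBE TABLE s3(
'http://host:9000/general-marts/general_mart_order/year=2024/month=02/day=12/*.snappy.parquet',
'aws_access_key_id',
'aws_secret_access_key',
'Parquet');
```

If the inferred types for columns such as accepted_at come back as Nullable, the file itself carries the nulls, and the conversion error is more likely down to how that server version handles them than to the data.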