Creating an Impala table from an HDFS directory that has subdirectories


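I have a directory, for example /user/name/folder

Inside this folder I have more folders named dt=2020-06-01, dt=2020-06-02, dt=2020-06-03, and so on.

These folders contain parquet files. They all have the same schema.

Is it possible to create an Impala table using /user/name/folder?

Every time I try, I get a table with 0 records. Is there a way to tell Impala to pick up the parquet files from all the subdirectories?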

hdfs impala
1 Answer

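One approach is to load the data through static partitioning, where you define the individual partitions by hand. With static partitioning, you create each partition manually with an ALTER TABLE … ADD PARTITION statement and then load the data into that partition.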

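````mysql
-- Partitioned table; the dt partition column matches the dt=... directory names.
CREATE TABLE customers_by_date
        (cust_id STRING, name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- Register an existing directory as the partition for dt='2020-06-01'.
ALTER TABLE customers_by_date ADD PARTITION (dt='2020-06-01') LOCATION '/user/name/folder/dt=2020-06-01';
````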

If no location is specified, Impala creates the partition directory under the table's own location:
````mysql
ALTER TABLE customers_by_date
ADD PARTITION (dt='2020-06-01');
````
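You can also copy the data into a partition directory with HDFS commands (run `REFRESH customers_by_date` afterwards so that Impala notices the new files):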
````
$ hdfs dfs -cp /user/name/folder/dt=2020-06-01 /user/directory_impala/table/partition
````
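Since the question involves many `dt=...` subdirectories, writing one `ALTER TABLE` statement per folder by hand gets tedious. The following is a minimal sketch, not a definitive implementation, that prints one `ADD PARTITION` statement per directory name. It assumes the table name and base path used in this answer; `gen_add_partitions` is a hypothetical helper, not part of Impala or Hadoop, and the directory names are hard-coded for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print one ALTER TABLE ... ADD PARTITION statement per dt=... directory.
# TABLE and BASE_DIR mirror the names used in this answer (assumptions).
TABLE="customers_by_date"
BASE_DIR="/user/name/folder"

# gen_add_partitions is a hypothetical helper, not an Impala or Hadoop command.
# Arguments: directory names or full paths; anything not named dt=... is skipped.
gen_add_partitions() {
  local dir value
  for dir in "$@"; do
    dir="${dir%/}"      # drop a trailing slash, if any
    dir="${dir##*/}"    # keep only the last path component
    case "$dir" in
      dt=*) ;;          # looks like a partition directory
      *) continue ;;    # ignore everything else
    esac
    value="${dir#dt=}"  # the partition value, e.g. 2020-06-01
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (dt='%s') LOCATION '%s/dt=%s';\n" \
      "$TABLE" "$value" "$BASE_DIR" "$value"
  done
}

# Directory names hard-coded here for illustration only.
gen_add_partitions dt=2020-06-01 dt=2020-06-02 dt=2020-06-03
```

In practice the directory names would probably come from the last column of `hdfs dfs -ls /user/name/folder`, and the printed statements could be saved to a file and run with `impala-shell -f`. If the table is instead created as an external table whose `LOCATION` is the parent directory, `ALTER TABLE ... RECOVER PARTITIONS` may be able to discover the `dt=...` directories without generating statements at all; check the documentation for your Impala version.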
You could follow these links to the Cloudera documentation for further details:
[Impala Create table statement][1]

[Impala Alter table statement][2]


  [1]: https://docs.cloudera.com/documentation/enterprise/5-9-x/topics/impala_create_table.html#create_table
  [2]: https://docs.cloudera.com/documentation/enterprise/5-9-x/topics/impala_alter_table.html#alter_table