Creating an external table from a CSV on HDFS: all values come with quotes


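I have a CSV file on HDFS and I am trying to create an Impala table from it. The table is created, but every value in it is still wrapped in double quotes (" ").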

CREATE EXTERNAL TABLE abc.def
(
  name  STRING,
  title STRING,
  last  STRING,
  pno   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:pathlocation'
TBLPROPERTIES ("skip.header.line.count"="1");

The output is:

name   title  last   pno
"abc"  "mr"   "xyz"  "1234"
"rew"  "ms"   "pre"  "654"

I just want to create a table from the CSV file without the quotes. Please point out where I went wrong. Regards, R

sql impala
1 Answer

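One approach is to create a staging table, load the file into it with the quotes still in place, and then build the final table with CTAS (CREATE TABLE AS SELECT), cleaning each field with the replace function.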

CREATE TABLE quote_stage(
 id STRING,
 name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
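
The loading step is not shown above. As a minimal sketch, assuming the CSV already sits somewhere on HDFS, it could look like the statement below. The path is a hypothetical placeholder, not the asker's real location.

```sql
-- Hypothetical path: replace it with the actual HDFS location of the CSV.
-- Note that LOAD DATA moves the file into the table's directory
-- rather than copying it.
LOAD DATA INPATH '/user/example/quoted.csv' INTO TABLE quote_stage;
```

If the file has to stay where it is, an external table pointing at its directory (as in the question) can serve as the staging table instead.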

After loading the file, the staging table contains:

+-----+----------+
| id  | name     |
+-----+----------+
| "1" | "pepe"   |
| "2" | "ana"    |
| "3" | "maria"  |
| "4" | "ramon"  |
| "5" | "lucia"  |
| "6" | "carmen" |
| "7" | "alicia" |
| "8" | "pedro"  |
+-----+----------+
CREATE TABLE t_quote 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT replace(id,'"','') AS id, replace(name,'"','') AS name FROM quote_stage;

The resulting table t_quote contains:

+----+--------+
| id | name   |
+----+--------+
| 1  | pepe   |
| 2  | ana    |
| 3  | maria  |
| 4  | ramon  |
| 5  | lucia  |
| 6  | carmen |
| 7  | alicia |
| 8  | pedro  |
+----+--------+
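
Applied to the table from the question, the same idea might look like the sketch below. Here the existing external table abc.def plays the role of the staging table, and abc.def_clean is a hypothetical name chosen only for illustration.

```sql
-- Sketch only: abc.def_clean is a hypothetical name for the cleaned table.
-- The external table abc.def from the question is used as the staging table.
CREATE TABLE abc.def_clean
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS
SELECT
  replace(name,   '"', '') AS name,
  replace(title,  '"', '') AS title,
  -- Backticks are a precaution in case LAST is treated as a
  -- reserved word by the Impala version in use.
  replace(`last`, '"', '') AS `last`,
  replace(pno,    '"', '') AS pno
FROM abc.def;
```

replace removes every double quote in a field. If the values can legitimately contain quotes, something like regexp_replace(name, '^"|"$', '') would strip only the enclosing pair.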

Hope this helps.
