Getting extra NULL values when loading data into a Hive table using a regex delimiter


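I have the following 5 lines of data in a file on HDFS, and I want to load them into a table. I have a regular expression that does this, but it loads an extra row of NULL values for each line of data. Does anyone know why this happens?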

19/Mar/2018 3:00:06 INFO activity Submitted to Splunk
19/Mar/2018 3:00:20 INFO activity response received statuscode=200 bytesreceived=11548264
19/Mar/2018 3:00:21 INFO activity done writing K:\Data\031818\activity_031818.csv lineswritten=296110
19/Mar/2018 3:00:21 INFO hardware Submitted to Splunk 

I use this to create the table:

create table Splunk_BCO_MSR 
(
ts string, 
status string, 
area string, 
text string
) 
partitioned by (partition_dt date)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ("input.regex" = "([^ ]+[ ][^ ]*) ([^ ]*) ([^ ]*) (.*)?");
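
To see what the four capture groups in `input.regex` pick out, here is a minimal Python sketch. It assumes that Python's `re.fullmatch` behaves like the whole-line match RegexSerDe performs with Java regular expressions, which should hold for this particular pattern on ordinary single-line input. The helper name `split_line` is made up for illustration and is not part of Hive.

```python
import re

# Same pattern as the input.regex property in the CREATE TABLE statement.
INPUT_REGEX = r"([^ ]+[ ][^ ]*) ([^ ]*) ([^ ]*) (.*)?"

def split_line(line):
    """Return the four captured groups, or None if the whole line does not match."""
    match = re.fullmatch(INPUT_REGEX, line)
    return match.groups() if match else None

# One of the log lines from the question.
sample = ("19/Mar/2018 3:00:20 INFO activity "
          "response received statuscode=200 bytesreceived=11548264")

for name, value in zip(("ts", "status", "area", "text"), split_line(sample)):
    print(f"{name}: {value!r}")
```

Group 1 spans the date and the time (two space-separated tokens), groups 2 and 3 take one token each, and group 4 takes the rest of the line. A line that does not match at all, such as an empty one, yields `None` here.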

This almost works, but when I run select * from the table, I get 8 rows instead of 4. It looks as if an extra NULL row has been added after each data row.

| 19/Mar/2018 3:00:06 | INFO | activity | Submitted to Splunk | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
| 19/Mar/2018 3:00:20 | INFO | activity | response received statuscode=200 bytesreceived=11548264 | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
| 19/Mar/2018 3:00:21 | INFO | activity | done writing K:\Data\031818\activity_031818.csv lineswritten=296110 | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
| 19/Mar/2018 3:00:21 | INFO | hardware | Submitted to Splunk | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
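
One way to narrow this down, offered as a sketch rather than a confirmed diagnosis: as far as I know, RegexSerDe produces NULL for every column when a line does not match `input.regex`, so an all-NULL row after each record would be consistent with the file containing a non-matching line (for example an empty one) after each record. The Python snippet below checks a chunk of text for such lines. The sample content and the function name `unmatched_lines` are hypothetical, and Python's `re` is only standing in for the Java regex engine Hive uses.

```python
import re

# Same pattern as the input.regex property in the table definition.
INPUT_REGEX = re.compile(r"([^ ]+[ ][^ ]*) ([^ ]*) ([^ ]*) (.*)?")

def unmatched_lines(text):
    """Return (line_number, repr(line)) for every line the pattern does not fully match."""
    bad = []
    for number, line in enumerate(text.split("\n"), start=1):
        if INPUT_REGEX.fullmatch(line) is None:
            # repr() exposes invisible characters such as '\r' or tabs.
            bad.append((number, repr(line)))
    return bad

# Hypothetical file content: two records separated by an empty line.
hypothetical = (
    "19/Mar/2018 3:00:06 INFO activity Submitted to Splunk\n"
    "\n"
    "19/Mar/2018 3:00:21 INFO hardware Submitted to Splunk "
)

print(unmatched_lines(hypothetical))
```

With this made-up input, only the empty second line is reported. Running the same check on the raw bytes of the real file (for instance after copying it out of HDFS) would show whether blank or otherwise non-matching lines sit between the records.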
hive null hdfs hive-serde
1 Answer

Was this problem ever solved? I am running into a similar issue.
