I downloaded a weather .txt file from NOAA, which looks like this:
WBAN,Date,Time,StationType,SkyCondition,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit,DryBulbFarenheitFlag,DryBulbCelsius,DryBulbCelsiusFlag,WetBulbFarenheit,WetBulbFarenheitFlag,WetBulbCelsius,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag
00102,20150101,0001,0,OVC043, ,10.00, , , ,27, ,-2.8, ,26, ,-3.1, ,25, ,-3.9, , 92, , 0, ,000, , , ,30.05, , , , , ,30.36, ,AA, , , ,30.23,
00102,20150101,0101,0,OVC045, ,10.00, , , ,27, ,-2.8, ,26, ,-3.1, ,25, ,-3.9, , 92, , 6, ,080, , , ,30.07, , , , , ,30.37, ,AA, , , ,30.25,
00102,20150101,0201,0,OVC047, ,10.00, , , ,26, ,-3.3, ,25, ,-3.7, ,24, ,-4.4, , 92, , 6, ,090, , , ,30.08, , , , , ,30.39, ,AA, , , ,30.26,
00102,20150101,0301,0,OVC049, ,10.00, , , ,26, ,-3.3, ,25, ,-3.7, ,24, ,-4.4, , 92, , 7, ,100, , , ,30.09, , , , , ,30.40, ,AA, , , ,30.27,
Then I created the following table:
CREATE EXTERNAL TABLE weather(WBAN STRING, `Date` STRING, Time STRING, StationType INT, SkyCondition STRING, SkyConditionFlag STRING, Visibility INT, VisibilityFlag STRING, WeatherType STRING, WeatherTypeFlag STRING, DryBulbFarenheit INT, DryBulbFarenheitFlag STRING, DryBulbCelsius DECIMAL, DryBulbCelsiusFlag INT, WetBulbFarenheit INT, WetBulbFarenheitFlag INT, WetBulbCelsius DECIMAL, WetBulbCelsiusFlag INT, DewPointFarenheit INT, DewPointFarenheitFlag INT, DewPointCelsius DECIMAL, DewPointCelsiusFlag INT, RelativeHumidity INT, RelativeHumidityFlag INT, WindSpeed INT, WindSpeedFlag INT, WindDirection INT, WindDirectionFlag INT, ValueForWindCharacter INT, ValueForWindCharacterFlag INT, StationPressure DECIMAL, StationPressureFlag INT, PressureTendency INT, PressureTendencyFlag INT, PressureChange INT, PressureChangeFlag INT, SeaLevelPressure DECIMAL, SeaLevelPressureFlag INT, RecordType STRING, RecordTypeFlag STRING, HourlyPrecip DECIMAL, HourlyPrecipFlag INT, Altimeter DECIMAL, AltimeterFlag INT)
COMMENT 'Our weather table in HIVE!'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/data/Weather';
Now, if I try a simple query such as:
hive> select * from weather limit 10;
I get the following result, with some of the column names replaced by NULL!
WBAN Date Time NULL SkyCondition SkyConditionFlag NULL VisibilityFlag WeatherType WeatherTypeFlag NULL DryBulbFarenheitFlag NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL RecordType RecordTypeFlag NULL NULL NULL NULL
00102 20150101 0001 0 OVC043 10 27 -3 NULL 26 NULL -3 NULL25 NULL -4 NULL NULL NULL NULL NULL 0 NULL NULL NULL 30 NULL NULL NULL NULL NULL 30 NULL AA NULL NULL 30 NULL
00102 20150101 0101 0 OVC045 10 27 -3 NULL 26 NULL -3 NULL25 NULL -4 NULL NULL NULL NULL NULL 80 NULL NULL NULL 30 NULL NULL NULL NULL NULL 30 NULL AA NULL NULL 30 NULL
00102 20150101 0201 0 OVC047 10
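As you may notice, the fourth and seventh columns (and many others after them) come out as NULL when they should be StationType and Visibility, respectively, and so on!
Even if I try: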
hive> select Visibility from weather limit 10;
I get the correct values, but with a NULL column header/name!
Why are the column names/headers NULL?!
Interesting question. It took me a minute to realize what was going on, but with the right knowledge of Hive it is actually obvious!
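Two things stand out: (1) the NULLs appear only in columns declared with a numeric type such as INT or DECIMAL, and (2) your file, and therefore your table, contains the header in its first row.
So, putting 1 and 2 together: you are not missing any column names. Hive simply treats the first row of the file as data, and in that row the column names cannot be parsed as numbers, so they come back as NULL. To verify this, check that the column names are really there, as can be seen in a query like:
DESCRIBE weather;
Suggestion:
Try to get rid of the first row, preferably before you create the external table.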
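One way to act on this suggestion, sketched in shell: drop line 1 with `tail` and upload the cleaned copy. The file names `weather.txt` and `weather_noheader.txt` are assumptions, not taken from the question, and the script writes a tiny stand-in sample (first five columns only) when no input file is present, so that it runs on its own.

```shell
#!/bin/sh
# Hypothetical file names; adjust them to your own download.
SRC=weather.txt              # assumed: raw NOAA file with the header on line 1
DST=weather_noheader.txt     # assumed: name for the header-free copy

# Stand-in sample (first five columns only) so the sketch is self-contained.
if [ ! -f "$SRC" ]; then
  printf '%s\n' \
    'WBAN,Date,Time,StationType,SkyCondition' \
    '00102,20150101,0001,0,OVC043' \
    '00102,20150101,0101,0,OVC045' > "$SRC"
fi

# tail -n +2 prints everything from line 2 onward, which drops the header.
tail -n +2 "$SRC" > "$DST"

# Upload only where the hdfs client is installed; -f overwrites an old copy.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -put -f "$DST" /data/Weather/
fi
```

Because an external table reads every file under its LOCATION, the original file with the header also has to be removed from /data/Weather, or the header row will still show up in query results.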
To add to Dennis's comment above: if you are using the CSV SerDe, you can keep the first line from being loaded into the table:
CREATE EXTERNAL TABLE cases (
id INT,
case_number STRING,
name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/hdfs/path'
tblproperties("skip.header.line.count"="1");
The operative line is:
tblproperties("skip.header.line.count"="1")
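The same property can probably be applied to the existing weather table without recreating it. The sketch below assumes Hive 0.13 or later (where, as far as I know, skip.header.line.count was introduced) and a `hive` CLI on the PATH. The script name `skip_header.hql` is made up for illustration.

```shell
#!/bin/sh
# Write the HiveQL to a script file (hypothetical name: skip_header.hql).
cat > skip_header.hql <<'EOF'
ALTER TABLE weather SET TBLPROPERTIES ("skip.header.line.count"="1");
SELECT * FROM weather LIMIT 3;
EOF

# Run the script only where the hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f skip_header.hql
fi
```

This keeps the table's original delimited row format and column types. OpenCSVSerde, by contrast, reads every column as STRING as far as I know, so switching SerDe would mean casting the numeric columns at query time.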