Unable to perform a SUM operation in Pig


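I am trying to run a SUM over my Pig data, but it does not accept an explicit type cast. I have also tried replacing (int) with (double) when doing the sum.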

drivers = LOAD '/sachin/drivers.csv' USING PigStorage(',');
time = LOAD '/sachin/timesheet.csv' USING PigStorage(',');
drivdata = FILTER drivers BY $0>1;
timedata = filter time by $0>0;
drivgrp = group timedata by $0;
drivinfo = foreach drivgrp generate group as id , SUM(timedata.$2) as totalhr , SUM(timedata.$3) as totmillogged;
drivfinal = foreach drivdata generate $0 as id , $1 as name;
result = join drivfinal by id , drivinfo by id;
finalres = foreach result generate $0 as id, $1 as name, $3 as hrslogged, $4 as mileslogged;
summile = foreach finalres generate (int)SUM(mileslogged);
DUMP summile;

Error message:

grunt> exec /home/sachin/sec.pig
2017-12-13 21:57:58,812 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 1 time(s).
2017-12-13 21:57:58,854 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:58,996 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,036 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,080 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,121 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,192 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,246 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: <line 10, column 41> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
Details at logfile: /home/sachin/pig_1513175202309.log
grunt> 

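What I am actually trying to do is, for each of the top 5 drivers, find the miles logged and what percentage that is of the total miles logged by all drivers, and then store the result in HDFS.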

Dataset link: https://raw.githubusercontent.com/hortonworks/data-tutorials/master/tutorials/hdp/how-to-process-data-with-apache-pig/assets/driver_data.zip

Can anyone help me fix this, or help me understand what is going wrong here?

hadoop apache-pig
2 Answers

0 votes

You have to cast the miles-logged field first, and then call the SUM function:

finalres = foreach result generate $0 as id, $1 as name, $3 as hrslogged, (int)$4 as mileslogged; 
summile = foreach finalres generate SUM(mileslogged);

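Also, I noticed that you did not specify data types in your LOAD statements. The default data type is bytearray, and unless you explicitly cast the fields in later steps, I doubt you will get correct results.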
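Following that suggestion, a minimal sketch of declaring types directly in LOAD might look like this. The column names and types are assumptions based on the Hortonworks driver_data files (drivers.csv: driverId, name, ssn, location, certified, wage plan; timesheet.csv: driverId, week, hours logged, miles logged), so check them against your copy of the CSVs.

```pig
-- Sketch only: column names/types below are assumed, not taken from the question.
drivers = LOAD '/sachin/drivers.csv' USING PigStorage(',')
          AS (driverId:int, name:chararray, ssn:chararray,
              location:chararray, certified:chararray, wageplan:chararray);
time    = LOAD '/sachin/timesheet.csv' USING PigStorage(',')
          AS (driverId:int, week:int, hourslogged:int, mileslogged:int);

-- The CSV header row cannot be cast to int, so its driverId becomes null.
-- A comparison with null is never true, so these filters also drop the header.
drivdata = FILTER drivers BY driverId > 1;
timedata = FILTER time BY driverId > 0;
```

With a schema in place, later statements can refer to fields by name (for example `timedata.mileslogged`), and SUM receives typed values rather than bytearray.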


0 votes

From http://pig.apache.org/docs/r0.17.0/func.html#sum, SUM is defined as:

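Computes the sum of the numeric values in a single-column bag. SUM requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.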
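To make the two cases in that definition concrete, here is a sketch that reuses the question's own positional fields on timedata ($0 = driver id, $3 = miles logged, exactly as the question's GROUP/SUM on drivgrp uses them). The new aliases are hypothetical. GROUP BY produces one bag per driver, while GROUP ALL produces a single bag holding every row.

```pig
-- Sketch only: timedata comes from the question; all other aliases are hypothetical.

-- Group sum: GROUP BY yields one tuple per driver, (group, timedata),
-- where timedata is a bag containing that driver's rows.
bydriver = GROUP timedata BY $0;
permiles = FOREACH bydriver GENERATE group AS id, SUM(timedata.$3) AS miles;

-- Global sum: GROUP ALL puts every row into one bag, so SUM sees all rows.
everyrow = GROUP timedata ALL;
allmiles = FOREACH everyrow GENERATE SUM(timedata.$3) AS miles;

-- Print both schemas; in each case SUM was applied to a bag, not a scalar.
DESCRIBE permiles;
DESCRIBE allmiles;
```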

Your code passes a single double value, whereas SUM expects a bag of doubles. No type cast is needed, but you do need to group before calling SUM:

allres = group finalres ALL;
summile = foreach allres generate SUM(finalres.mileslogged);
DUMP summile;
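For the goal stated in the question (the top 5 drivers, their miles logged, and each one's share of the total, stored in HDFS), one possible extension of the same GROUP ALL idea is sketched below. It starts from the question's finalres relation; the aliases and the output path /sachin/top5_miles_pct are hypothetical. Because the global total is a one-row relation, CROSS is used to attach it to every top-5 row.

```pig
-- Sketch only: builds on finalres (id, name, hrslogged, mileslogged) from the question.
-- Aliases and the output path are hypothetical.

-- Global total of miles across all drivers (a single-row relation).
allgrp = GROUP finalres ALL;
total  = FOREACH allgrp GENERATE SUM(finalres.mileslogged) AS totalmiles;

-- Rank drivers by miles logged and keep the first five.
ranked = ORDER finalres BY mileslogged DESC;
top5   = LIMIT ranked 5;

-- total has exactly one row, so CROSS pairs it with each top-5 row.
withtot = CROSS top5, total;
pct     = FOREACH withtot GENERATE
              top5::id          AS id,
              top5::name        AS name,
              top5::mileslogged AS mileslogged,
              (double)top5::mileslogged * 100.0 / total::totalmiles AS pctoftotal;

-- CROSS does not preserve order, so sort again before storing.
sorted = ORDER pct BY pctoftotal DESC;
STORE sorted INTO '/sachin/top5_miles_pct' USING PigStorage(',');
```

STORE fails if the output directory already exists, so remove it or choose a new path between runs.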