Apache Pig Query - Join Datasets ERROR 1031


I have the following four tasks to complete, but I'm confused about how to join the two datasets so that any of the tasks works...

A) Find the customer(s) with the fewest transactions, and output each such customer's name and number of transactions.

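B) Join Customers and Transactions using a broadcast (replicated) join. Report: CustomerID, Name, Salary, NumOfTransactions, TotalSum, MinItems (where NumOfTransactions is the total number of transactions made by the customer, TotalSum is the sum of the field "TransTotal" for that customer, and MinItems is the minimum number of items in the transactions made by that customer).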

C) Report the country codes for which the number of customers is greater than 5,000 or less than 2,000.

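D) Suppose we want to design an analysis task on the data as follows: the Age attribute is divided into six groups, namely [10, 20), [20, 30), [30, 40), [40, 50), [50, 60), and [60, 70]. Within each of these age ranges, the data is further divided by "Gender", i.e., each of the 6 age groups is split into two groups. For each group, report: Age Range, Gender, MinTransTotal, MaxTransTotal, AvgTransTotal. Note: the bracket "[" means the lower bound of the range is included, while ")" means the upper bound is excluded.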

This is what I started with:

hadoop fs -mkdir /piginput
sudo hadoop fs -put customer.txt /piginput
sudo hadoop fs -put transaction.txt /piginput
sudo hadoop fs -put transaction_small.txt /piginput

pig 

customers = LOAD '/piginput/customer.txt' USING PigStorage(',') AS (id:int,name:chararray,age:int,gender:chararray,CountryCode:int,salary:float);

transactions = LOAD '/piginput/transaction.txt' USING PigStorage(',') as (trans_id:int, id:int, age:int, total:float, num_items:int, description:chararray);

alldata = JOIN customers BY id, transactions BY id;

by_clusters_terms_count = FOREACH alldata COUNT(id);

This produces the error:

Pig Stack Trace

ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"

Failed to parse: Pig script failed to parse:
<line 4, column 26> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1684)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1657)
at org.apache.pig.PigServer.registerQuery(PigServer.java:600)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1069)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:542)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: 
<line 4, column 26> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"
at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1041)
at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15870)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 15 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"
at org.apache.pig.newplan.logical.relational.LogicalSchema.merge(LogicalSchema.java:760)
at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:158)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:123)
at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:245)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1039)
... 21 more

Any ideas? Am I joining the datasets incorrectly, and is that what causes the problem?

1 Answer
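-- Load customers (comma-delimited) with an explicit schema
customers = LOAD 'hdfs://hadoop-VirtualBox:8020/piginput/customer.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, gender:chararray, CountryCode:int, salary:float);
-- Keep only the join key and the name
A = FOREACH customers GENERATE id, name;
-- Transaction schema without the age field used in the question; cust_id is the customer key
transactions = LOAD 'hdfs://hadoop-VirtualBox:8020/piginput/transaction_small.txt' USING PigStorage(',') AS (trans_id:int, cust_id:int, total:float, num_items:int, description:chararray);
B = FOREACH transactions GENERATE cust_id, num_items;
-- The key fields have different names, so nothing is ambiguous after the join
alldata = JOIN A BY id, B BY cust_id;
-- Group on the first field (A::id): one bag of joined rows per customer
C = GROUP alldata BY $0;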

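This finally solved the problem. Compared with the question's script, the transaction schema no longer contains an age field, the customer key in the transactions file is named cust_id so it does not clash with the customers' id after the join, each input is projected down to the fields that are needed, and the joined data is grouped with GROUP instead of the invalid FOREACH ... COUNT statement.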
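For task A, here is a minimal sketch under a few assumptions: the inputs are the comma-delimited files uploaded to /piginput in the question, and the transaction schema is the one from the answer (no age column, customer key cust_id). As posted, the question's `FOREACH alldata COUNT(id);` has no GENERATE clause, COUNT expects a bag rather than a single field, and `id` is ambiguous after the join because both inputs define it. Counting therefore has to happen inside FOREACH ... GENERATE, applied to the bag produced by GROUP. Customers with no transactions at all are not covered here, because they never appear in the transactions relation.

```pig
-- Assumed inputs: the files uploaded to /piginput in the question,
-- read with the answer's transaction schema (no age field).
customers = LOAD '/piginput/customer.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, gender:chararray, CountryCode:int, salary:float);
transactions = LOAD '/piginput/transaction.txt' USING PigStorage(',')
    AS (trans_id:int, cust_id:int, total:float, num_items:int, description:chararray);

-- One bag of transactions per customer; COUNT is applied to that bag
by_cust = GROUP transactions BY cust_id;
trans_counts = FOREACH by_cust GENERATE group AS cust_id,
    COUNT(transactions) AS num_transactions;

-- GROUP ALL puts every row into a single bag, so MIN sees all counts at once
all_counts = GROUP trans_counts ALL;
min_count = FOREACH all_counts GENERATE MIN(trans_counts.num_transactions) AS min_num;

-- Keep every customer whose count equals the minimum, so ties are not lost.
-- min_count holds a single row, so it is listed last and replicated in memory.
fewest = JOIN trans_counts BY num_transactions, min_count BY min_num USING 'replicated';
fewest_ids = FOREACH fewest GENERATE trans_counts::cust_id AS cust_id,
    trans_counts::num_transactions AS num_transactions;

-- Attach the customer name
named = JOIN fewest_ids BY cust_id, customers BY id;
result = FOREACH named GENERATE customers::name AS name,
    fewest_ids::num_transactions AS num_transactions;

DUMP result;
```

Sorting by the count and taking LIMIT 1 would be shorter, but it returns a single arbitrary customer when several share the minimum; joining against the minimum keeps all of them.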
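For task B, here is a sketch of the broadcast (fragment-replicate) join, with the same assumed paths and schemas. In Pig, `USING 'replicated'` holds every relation after the first in memory, so the large transactions relation is listed first and customers second; this assumes the customer file is small enough to fit in memory. The task's TransTotal field corresponds to `total` in this schema.

```pig
-- Assumed inputs: the files uploaded to /piginput in the question,
-- read with the answer's schemas.
customers = LOAD '/piginput/customer.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, gender:chararray, CountryCode:int, salary:float);
transactions = LOAD '/piginput/transaction.txt' USING PigStorage(',')
    AS (trans_id:int, cust_id:int, total:float, num_items:int, description:chararray);

-- Fragment-replicate join: transactions (large) is streamed,
-- customers (assumed small) is replicated to every task
joined = JOIN transactions BY cust_id, customers BY id USING 'replicated';

-- Project to short, unambiguous names before grouping
slim = FOREACH joined GENERATE customers::id AS id, customers::name AS name,
    customers::salary AS salary, transactions::total AS total,
    transactions::num_items AS num_items;

-- One group per customer
by_cust = GROUP slim BY (id, name, salary);

report = FOREACH by_cust GENERATE
    FLATTEN(group) AS (CustomerID, Name, Salary),
    COUNT(slim) AS NumOfTransactions,
    SUM(slim.total) AS TotalSum,
    MIN(slim.num_items) AS MinItems;

DUMP report;
```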
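Task C only needs the customers file, so no join is involved. A sketch, assuming the same customer schema and path as above:

```pig
-- Assumed input: the customer file uploaded to /piginput in the question
customers = LOAD '/piginput/customer.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, gender:chararray, CountryCode:int, salary:float);

-- Count customers per country code
by_country = GROUP customers BY CountryCode;
country_counts = FOREACH by_country GENERATE group AS CountryCode,
    COUNT(customers) AS num_customers;

-- Keep only countries with more than 5,000 or fewer than 2,000 customers
selected = FILTER country_counts BY num_customers > 5000L OR num_customers < 2000L;
codes = FOREACH selected GENERATE CountryCode;

DUMP codes;
```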
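For task D, one way to build the twelve groups is to label each joined row with its age range and then group by (age range, gender). The sketch below assumes ages lie between 10 and 70, as the task's ranges imply, and uses a nested bincond (`? :`) to map each age to its range label; TransTotal is again `total`. Paths and schemas are the same assumptions as in the previous sketches.

```pig
-- Assumed inputs: the files uploaded to /piginput in the question,
-- read with the answer's schemas.
customers = LOAD '/piginput/customer.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, gender:chararray, CountryCode:int, salary:float);
transactions = LOAD '/piginput/transaction.txt' USING PigStorage(',')
    AS (trans_id:int, cust_id:int, total:float, num_items:int, description:chararray);

joined = JOIN transactions BY cust_id, customers BY id;

-- Lower bounds are inclusive and upper bounds exclusive, except the last
-- range, which includes 70 (ages assumed to be within 10..70)
labeled = FOREACH joined GENERATE
    (customers::age < 20 ? '[10,20)' :
     (customers::age < 30 ? '[20,30)' :
      (customers::age < 40 ? '[30,40)' :
       (customers::age < 50 ? '[40,50)' :
        (customers::age < 60 ? '[50,60)' : '[60,70]'))))) AS age_range,
    customers::gender AS gender,
    transactions::total AS total;

-- Two groups (one per gender) inside each of the six age ranges
by_range_gender = GROUP labeled BY (age_range, gender);

report = FOREACH by_range_gender GENERATE
    FLATTEN(group) AS (age_range, gender),
    MIN(labeled.total) AS MinTransTotal,
    MAX(labeled.total) AS MaxTransTotal,
    AVG(labeled.total) AS AvgTransTotal;

DUMP report;
```

Labeling with a string keeps the report readable; if a numeric sort order is needed, the lower bound of each range could be emitted as an int instead.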
