Error when using a filter in Pig: dumping the result fails


The Pig code is:

studentsR = LOAD 'hdfs://quickstart.cloudera:8020/students/students' using PigStorage() as (name:chararray,rollno:int);
resultR = LOAD 'hdfs://quickstart.cloudera:8020/students/results' using PigStorage() as (rollno:int,result:chararray);
joniR = JOIN studentsR BY rollno,resultR BY rollno;
filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result) ;
filterRPass = FILTER filterR BY resultR.result == 'pass';
dump filterRPass;

The error is:

ERROR 0: Scalar has more than one row in the output. 1st : (1,fail), 2nd :(2,fail)
hadoop apache-pig
1 Answer

Try running dump and describe on each of your result sets to see the output of every alias you use.

Reference: scalar-has-more-than-one-row-in-the-output

studentsR = LOAD '/home/user/students' using PigStorage(' ') as (name:chararray,rollno:int);
dump studentsR;
resultR = LOAD '/home/user/results' using PigStorage(' ') as (rollno:int,result:chararray);
dump resultR;
joniR = JOIN studentsR BY rollno,resultR BY rollno;
dump joniR;
filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
dump filterR;
filterRPass = FILTER filterR BY resultR::result == 'pass';
dump filterRPass;

Changes:

My input files use a space as the delimiter, so I used PigStorage(' ').

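In filterR, I removed the opening and closing parentheses around studentsR::name,studentsR::rollno,resultR::result. With the parentheses, the three fields are wrapped into a single tuple, which shows up as an extra pair of parentheses in the dump output: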

grunt> filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
grunt> describe  filterR;
filterR: {org.apache.pig.builtin.totuple_studentsR::name_100: (studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray)}
grunt> filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
grunt> describe  filterR;
filterR: {studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray}

In filterRPass, use resultR::result instead of resultR.result.
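
For context, Pig reads resultR.result as a scalar projection: it pulls a single value out of a relation, which only works when that relation has exactly one row. resultR has several rows, hence the "Scalar has more than one row in the output" error. A minimal sketch of a legitimate scalar use, assuming the same studentsR and resultR as above (the aliases allR, totalR and withTotalR are made up for illustration):

```pig
-- Collapse resultR into a single row so it can be used as a scalar.
allR = GROUP resultR ALL;
totalR = FOREACH allR GENERATE COUNT(resultR) AS n;

-- totalR has exactly one row, so totalR.n is a valid scalar reference.
withTotalR = FOREACH studentsR GENERATE name, rollno, totalR.n AS total;
dump withTotalR;
```

The :: form, by contrast, just names a field of the current relation that came out of the join, so it is evaluated per row.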

I used a set of local files and ran Pig in local mode for testing.

cat students
a 1
b 2
c 3

cat results
3 pass
2 fail
5 pass

Dump results:

dump studentsR
(a,1)
(b,2)
(c,3)

dump resultR
(3,pass)
(2,fail)
(5,pass)

dump joniR
(b,2,2,fail)
(c,3,3,pass)

dump filterR --filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
((b,2,fail))
((c,3,pass))

dump filterR --filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
(b,2,fail)
(c,3,pass)

dump filterRPass; --filterRPass = FILTER filterR BY resultR::result == 'pass';  --or-- filterRPass = FILTER filterR BY $2 == 'pass';
(c,3,pass)
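
If the :: prefixes get in the way, one option is to rename the fields with AS while projecting, so later statements can use plain names. A sketch under the same assumptions as above (the aliases namedR and namedPass are hypothetical):

```pig
-- Give the joined fields short names so no prefix is needed afterwards.
namedR = FOREACH joniR GENERATE studentsR::name AS name,
                                studentsR::rollno AS rollno,
                                resultR::result AS result;

-- 'result' now refers to a field of namedR, not to the resultR relation.
namedPass = FILTER namedR BY result == 'pass';
dump namedPass;
```

With the sample files above, this should keep the same single row as the filterRPass dump.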