When I run my script with the command shown below, with the police_force parameter set to "Surrey Police", it gives me an error:
"ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. File not found: Police""
If I pass the value as "Surrey_Police" instead, the script runs without errors but returns nothing.
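The "File not found: Police"" error is consistent with the exec argument being split on whitespace without honoring the double quotes, so that the stray token Police" is then read as the script path. The Python sketch below only illustrates that difference. It assumes grunt's exec tokenizing behaves like a plain whitespace split, which is an assumption and not grunt's actual parser.

```python
import shlex

# The -param argument from the grunt exec command in the question.
arg = '-param police_force="Surrey Police"'

# Plain whitespace split: the quoted value is broken in two,
# leaving a stray token 'Police"'.
naive = arg.split()

# Shell-style (POSIX) split: the double quotes keep the value together
# and are removed from the resulting token.
shell_like = shlex.split(arg)

print(naive)       # ['-param', 'police_force="Surrey', 'Police"']
print(shell_like)  # ['-param', 'police_force=Surrey Police']
```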
-- knownvalues: path of the input dataset
-- date1: first month to compare
-- date2: second month to compare
-- police_force: police force to match against the fallswithin column
-- crime_type
-- Usage: exec -param knownvalues='/user/cw/input/all.txt' -param date1='2017-05' -param date2='2017-06' -param police_force="Surrey Police" /home/xiaorui/CW/compare_crime.pig
knownvalues = LOAD '$knownvalues' USING PigStorage(',') AS (crimeid:chararray,month:chararray,reportedby:chararray,fallswithin:chararray,longitude:float,latitude:float,location:chararray,lsoacode:chararray,lsoaname:chararray,crimetype:chararray,lastoutcome:chararray,context:chararray);
knownvalues = SAMPLE knownvalues 0.00001;
location = FILTER knownvalues BY (fallswithin MATCHES $police_force);
first_date = FILTER location BY (month MATCHES '$date1');
second_date = FILTER location BY (month MATCHES '$date2');
DUMP first_date;
If I use the following line instead, the code works correctly:
location = FILTER knownvalues BY (fallswithin MATCHES 'Surrey Police');
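Pig's parameter substitution is a textual preprocessing step, so the value is pasted into the script exactly as given. The sketch below uses Python's string.Template as a stand-in for that preprocessor (an illustrative assumption, not Pig's implementation) to compare the unquoted FILTER line from the script with the working, quoted one. Without quotes, the substituted text is no longer a string literal after MATCHES.

```python
from string import Template

value = "Surrey Police"

# FILTER line from the script: the parameter is not quoted.
unquoted = Template("location = FILTER knownvalues BY (fallswithin MATCHES $police_force);")
# Working form: the parameter sits inside a Pig string literal.
quoted = Template("location = FILTER knownvalues BY (fallswithin MATCHES '$police_force');")

print(unquoted.substitute(police_force=value))
# location = FILTER knownvalues BY (fallswithin MATCHES Surrey Police);
print(quoted.substitute(police_force=value))
# location = FILTER knownvalues BY (fallswithin MATCHES 'Surrey Police');
```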
I got it working with the following steps:
a.) First, $police_force in the FILTER statement should be wrapped in single quotes, as shown below:
location = FILTER knownvalues BY (fallswithin MATCHES '$police_force');
b.) Second, in the pig command, escape the space with the escape character (\) and also wrap the value in single or double quotes:
pig -x local -param knownvalues='/home/ec2-user/data' -param police_force="Surrey\ Police" /home/ec2-user/test.pig
or
pig -x local -param knownvalues='/home/ec2-user/data' -param police_force='Surrey\ Police' /home/ec2-user/test.pig
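To see what pig receives from the shell in each of the two commands above, the arguments can be tokenized with Python's shlex module in POSIX mode, assuming it approximates the shell closely enough for these commands. In both variants the backslash survives, so pig gets the same argument either way. How Pig then treats that backslash is not shown here.

```python
import shlex

# The two variants of the pig command from step b (paths shortened).
double_quoted = r'pig -x local -param police_force="Surrey\ Police" test.pig'
single_quoted = r"pig -x local -param police_force='Surrey\ Police' test.pig"

# shlex in POSIX mode keeps a backslash that is not followed by the quote
# in use (or another backslash), so "\ " stays literal inside both kinds
# of quotes and both commands yield the same -param value.
for cmd in (double_quoted, single_quoted):
    print(shlex.split(cmd)[4])  # police_force=Surrey\ Police
```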
Here are my test data, code, and output.
Pig input data file (cat data):
mary,19
john,18
joe,18
Surrey Police,20
Pig sample code (cat test.pig):
knownvalues = LOAD '$knownvalues' USING PigStorage(',') AS (name:chararray,age:int);
dump knownvalues;
describe knownvalues;
location = FILTER knownvalues BY (name MATCHES '$police_force');
dump location;
describe location;
Output:
After LOAD:
(mary,19)
(john,18)
(joe,18)
(Surrey Police,20)
knownvalues: {name: chararray,age: int}
After FILTER:
(Surrey Police,20)
location: {name: chararray,age: int}
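Pig's MATCHES operator takes a Java regular expression and, like Java's String.matches, only succeeds when the pattern matches the whole field. The sketch below reproduces the FILTER step on the four test rows with Python's re.fullmatch standing in for MATCHES. This is an illustrative assumption: Python and Java regex syntax differ in some details, although not for these patterns. It also shows that a value such as Surrey_Police, or a partial value such as Surrey, matches none of the rows.

```python
import re

# The four rows of the test data file (name, age).
rows = [("mary", 19), ("john", 18), ("joe", 18), ("Surrey Police", 20)]

def filter_matches(rows, pattern):
    """Keep rows whose name matches the whole pattern (MATCHES-like semantics)."""
    return [row for row in rows if re.fullmatch(pattern, row[0])]

print(filter_matches(rows, "Surrey Police"))    # [('Surrey Police', 20)]
print(filter_matches(rows, r"Surrey\ Police"))  # [('Surrey Police', 20)]
print(filter_matches(rows, "Surrey_Police"))    # []
print(filter_matches(rows, "Surrey"))           # []
```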