Hadoop Pig Latin cannot load data


I'm taking my first steps in Hadoop's Pig Latin, but I'm stuck because I can't load any input data, even though the file exists:

R = LOAD '/home/cloudera/Desktop/vol.csv' USING PigStorage(';')  AS (AnneeVol:int,MoisVol:int,JourVol:int,NumVol:int,AeroDep:chararray,AeroArriv:chararray,DistVol:int);

File Location

When I run:

DUMP R;

I get this error:

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1590934825774_0010  R   MAP_ONLY    Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/pig_lab/input/vol.csv
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/pig_lab/input/vol.csv
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
    ... 18 more
    hdfs://quickstart.cloudera:8020/tmp/temp14790577/tmp132115611,

Input(s):
Failed to read data from "/home/cloudera/pig_lab/input/vol.csv"

Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp14790577/tmp132115611"

How can I fix this? Thanks!

hadoop apache-pig
1 Answer

Pig is looking for vol.csv in HDFS, not on the local file system:

Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/pig_lab/input/vol.csv

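Check core-site.xml for the default file system (the fs.defaultFS property). It is currently set to hdfs://quickstart.cloudera:8020, which is why Pig looks for the file in HDFS. You don't need to change anything there.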
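As a minimal illustration of that lookup, the sketch below reads the default file system out of core-site.xml text using Python's standard `xml.etree.ElementTree`. The embedded XML is a hypothetical, trimmed-down file that uses the value mentioned above; a real core-site.xml contains many more properties. The fallback to the older, deprecated `fs.default.name` key is included only for Hadoop 1-style configs.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, trimmed-down core-site.xml; only the default-FS property is kept.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://quickstart.cloudera:8020</value>
  </property>
</configuration>"""


def default_fs(core_site_xml: str) -> Optional[str]:
    """Return the default file system URI declared in core-site.xml text, or None."""
    root = ET.fromstring(core_site_xml)
    values = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        values[name] = (prop.findtext("value") or "").strip()
    # fs.defaultFS is the current key; fs.default.name is its deprecated predecessor.
    return values.get("fs.defaultFS") or values.get("fs.default.name")


print(default_fs(CORE_SITE))  # hdfs://quickstart.cloudera:8020
```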

Just add the file:// prefix to the path of vol.csv to tell Pig to look for it on the local file system:

R = LOAD 'file:///home/cloudera/Desktop/vol.csv' USING PigStorage(';')  AS (AnneeVol:int,MoisVol:int,JourVol:int,NumVol:int,AeroDep:chararray,AeroArriv:chararray,DistVol:int);
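The effect of the file:// prefix can be sketched as a simplified version of path qualification. A path that already carries a scheme is used as-is, while a scheme-less absolute path is resolved against the default file system. This is an illustration under that assumption, not Hadoop's actual `Path` logic, which also handles relative paths, authorities, and normalization. The `qualify` helper is hypothetical, and relative paths are simply rejected here.

```python
from urllib.parse import urlparse

# Default file system value from the answer (normally read from core-site.xml).
DEFAULT_FS = "hdfs://quickstart.cloudera:8020"


def qualify(path: str, default_fs: str = DEFAULT_FS) -> str:
    """Hypothetical, simplified qualification: keep explicit schemes, else prefix the default FS."""
    if urlparse(path).scheme:
        # file://, hdfs://, ...: the caller chose the file system explicitly.
        return path
    if not path.startswith("/"):
        # Hadoop would resolve this against a working directory; out of scope here.
        raise ValueError(f"relative paths are not handled in this sketch: {path!r}")
    return default_fs.rstrip("/") + path


# Without a scheme, the local Desktop path is looked up in HDFS,
# which is what leads to "Input path does not exist".
print(qualify("/home/cloudera/Desktop/vol.csv"))
# hdfs://quickstart.cloudera:8020/home/cloudera/Desktop/vol.csv

# With file://, the path stays on the local file system.
print(qualify("file:///home/cloudera/Desktop/vol.csv"))
# file:///home/cloudera/Desktop/vol.csv
```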

Reference: Cloudera blog


Alternatively, put the file into HDFS and then reference that location in your code:

hdfs dfs -put /home/cloudera/Desktop/vol.csv hdfs://quickstart.cloudera:8020/user/<hdfs-user>/

Then use that path in your code:

R = LOAD '/user/<hdfs-user>/vol.csv' USING PigStorage(';')  AS (AnneeVol:int,MoisVol:int,JourVol:int,NumVol:int,AeroDep:chararray,AeroArriv:chararray,DistVol:int);