I have googled this error on every forum, but without success. I am getting the following error:
18/08/29 00:24:53 INFO mapreduce.Job: map 0% reduce 0%
18/08/29 00:24:59 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
18/08/29 00:25:45 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
18/08/29 00:25:52 INFO mapreduce.Job: map 100% reduce 100%
18/08/29 00:25:53 INFO mapreduce.Job: Job job_1535105716146_0226 failed with state FAILED due to: Task failed task_1535105716146_0226_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0
18/08/29 00:25:53 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
I have also tried my map-reduce code with a standalone Python command:
cat student1.txt | python mapper.py | python reducer.py
The code works perfectly fine, but when I run it through Hadoop Streaming it repeatedly throws the above error. My input file size is 3 KB. I also tried running the Hadoop Streaming command after changing the Python version, but no luck! I have also added the
#!/usr/bin/python
line at the top of the scripts. That directory contains nothing. I have also tried different versions of the command:
Version 1:
hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper mapper.py -file /home/reducer.py -reducer reducer.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt
Version 2: with the python command wrapped in single and double quotes
hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper "python mapper.py" -file /home/reducer.py -reducer "python reducer.py" -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt
A simple word-count program runs successfully in this environment and produces correct output, but as soon as I add the mysql.connector usage to my Python scripts, Hadoop Streaming reports this error. I have also looked through the job logs, but found no such information there.
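One way to find out why the subprocess exits with code 1 is to make import failures explicit: mysql.connector must be installed on every node that runs tasks, not only on the client machine. A minimal sketch (`safe_import` is a hypothetical helper, not part of the question's code; stderr ends up in the YARN task logs, while writing to stdout would corrupt the streaming key/value output):

```python
import sys

def safe_import(name):
    """Try to import a module by name; report failures on stderr."""
    try:
        return __import__(name)
    except ImportError as exc:
        # stderr is captured in the YARN task logs; stdout must stay clean
        sys.stderr.write("mapper import failure: %s\n" % exc)
        return None

# At the top of mapper.py one could abort early with an explicit message:
# if safe_import("mysql.connector") is None:
#     sys.exit(1)
print(safe_import("json") is not None)            # stdlib import succeeds
print(safe_import("no_such_module_xyz") is None)  # missing package is reported
```

With a guard like this, the task log shows the real cause instead of only the generic "subprocess failed with code 1".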
If your problem is not a Python library or code issue, it may be about the Python file's first comment line (the shebang) and your operating system.
For me, on macOS, after installing Hadoop locally with this tutorial: tuto, the Python mapper/reducer would not execute properly. Errors:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
or
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
My configuration:
To launch your job with Python, I use the new command:
mapred streaming
instead of the
hadoop jar /xxx/hadoop-mapreduce/hadoop-streaming-xxx.jar
form from the Hadoop documentation.
(Careful: I think that documentation's examples with the generic options are not good (deprecated: -file, new: -files).)
I found two working possibilities:
1. Invoke the interpreter explicitly in the -mapper/-reducer arguments. Only this command worked for me:
mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper "python WordCountMapper.py" \
-reducer "python WordCountReducer.py"
Assume I want to count the words of a file already copied into my HDFS volume:
/data/input/README.TXT
(copied with hadoop fs -copyFromLocal /absolute-local-folder/data/input/README.TXT /data/input), using my local Python files WordCountMapper.py and WordCountReducer.py.
WordCountMapper.py code:
#!/usr/bin/python
# -*-coding:utf-8 -*
import sys

for line in sys.stdin:
    # strip surrounding whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # map step: for each word, emit the pair (word, 1)
    for word in words:
        print("%s\t%d" % (word, 1))
WordCountReducer.py code:
#!/usr/bin/python
# -*-coding:utf-8 -*
import sys

total = 0
lastword = None
for line in sys.stdin:
    line = line.strip()
    # recover the key and the value, converting the value to int
    word, count = line.split()
    count = int(count)
    # several keys are possible within a single program run
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        # key changed: emit the finished word, then start on the next one
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word
if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))
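As a quick sanity check of the two scripts' logic, the whole mapper → shuffle/sort → reducer pipeline can be simulated in-process (a standalone sketch; Hadoop sorts the map output by key before the reduce phase, which the `sort` call reproduces here):

```python
# Simulate mapper -> shuffle/sort -> reducer for a single line of input.
line = "hello world hello"

pairs = [(word, 1) for word in line.strip().split()]  # map step: (word, 1)
pairs.sort(key=lambda kv: kv[0])                      # Hadoop sorts by key

totals = {}
for word, count in pairs:                             # reduce step: sum per key
    totals[word] = totals.get(word, 0) + count

for word in sorted(totals):
    print("%s\t%d occurrences" % (word, totals[word]))
```

The same sorting caveat applies to testing the real scripts with pipes: `cat input.txt | python WordCountMapper.py | sort | python WordCountReducer.py` matches cluster behavior, while omitting the `sort` does not.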
2. Make the scripts directly executable instead:
2.1. Add execute permission to the Python files:
chmod +x WordCountMapper.py
chmod +x WordCountReducer.py
2.2. Make sure the first 2 lines of each file are:
first line: `#!/usr/bin/python`
second line: `# -*-coding:utf-8 -*`
Then use this command:
mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper ./WordCountMapper.py \
-reducer ./WordCountReducer.py
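The point of steps 2.1/2.2 is that the task node executes the file directly, with no `python` prefix, so the kernel must find the interpreter through the shebang and the file must carry the execute bit; otherwise the subprocess dies with code 126/127. A self-contained demonstration with a throwaway script (`demo_mapper.py` is hypothetical):

```shell
# Create a minimal stand-in for WordCountMapper.py: shebang + execute bit.
cat > demo_mapper.py <<'EOF'
#!/usr/bin/env python3
import sys
for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))
EOF
chmod +x demo_mapper.py          # without this bit, direct execution fails
echo "hello hello" | ./demo_mapper.py
```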
I checked the job's error logs and placed the required Python files (the ones that are not predefined libraries) into the Python directory. Then I passed those Python files in the Hadoop Streaming command:
hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=0 -file /home/mapper3.py -mapper mapper3.py -file /home/reducer3.py -reducer reducer3.py -file /home/ErrorHandle.py -file /home/ExceptionUtil.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt
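For context on why this works: files shipped with -file/-files are copied into each task's working directory, and a Python script can import a module that sits next to it because that directory is on `sys.path`. A self-contained sketch that mimics the mechanism with a temporary directory (the `ErrorHandle` module here is a stand-in, not the question's real file):

```python
import os
import sys
import tempfile

# A temporary directory plays the role of the Hadoop task's working directory.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "ErrorHandle.py"), "w") as f:
    f.write("def handle(msg):\n    return 'handled: ' + msg\n")

# On the cluster the script's own directory is already importable;
# here we add the stand-in directory explicitly.
sys.path.insert(0, workdir)
import ErrorHandle

print(ErrorHandle.handle("boom"))  # the shipped helper is importable
```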
For me, the fix was changing
#!/usr/bin/env python
to
#!/usr/bin/env python3
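A sketch of why the shebang version matters: a script written with Python 3 only syntax dies before reading any input when `env python` resolves to Python 2, and Streaming surfaces only the generic exit code 1. For example, an f-string is a SyntaxError under Python 2:

```python
import sys

# Under Python 2 this f-string is a SyntaxError, so the mapper process exits
# before consuming a single record; pinning python3 in the shebang avoids it.
word, count = "hello", 2
print(f"{word}\t{count}")
assert sys.version_info[0] >= 3
```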