Suppose I have an RDD, and I need to map a task over it that can fail:
rdd = sc.parallelize([1,2,3])
rdd.map(a_task_that_can_fail)
Is there a way to set up Spark to run the task on a best-effort basis? The behavior I'm hoping for is:
What exactly do you mean by "set up Spark"? Do you mean in the Python code? Or do you mean a failure such as running out of memory?
rdd = sc.parallelize([1, 2, 3, 0])

def try_and_return(x, number_of_tries=5):
    try_number = 0
    return_value = None
    while try_number < number_of_tries:
        try:
            return_value = 1.0 / x
            break
        except ZeroDivisionError:
            try_number += 1
    return return_value

print(rdd.map(lambda x: try_and_return(x)).collect())
[1.0, 0.5, 0.3333333333333333, None]
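The same retry-and-fall-back pattern can be factored into a generic wrapper so you don't have to rewrite the loop for each task. Below is a minimal sketch; `with_retries` is a hypothetical helper (not part of Spark's API) that retries any callable up to `max_tries` times and returns `fallback` if every attempt raises:

    # Hypothetical retry wrapper: retries func(x) up to max_tries times,
    # returning fallback if every attempt raises an exception.
    def with_retries(func, max_tries=5, fallback=None):
        def wrapped(x):
            for _ in range(max_tries):
                try:
                    return func(x)
                except Exception:
                    pass
            return fallback
        return wrapped

    # Plain Python demo; the same wrapped callable can be passed to
    # rdd.map(...) since Spark just serializes and applies the function:
    #   rdd.map(with_retries(a_task_that_can_fail)).collect()
    safe_inverse = with_retries(lambda x: 1.0 / x)
    print([safe_inverse(x) for x in [1, 2, 3, 0]])
    # -> [1.0, 0.5, 0.3333333333333333, None]

Note that for a deterministic failure like division by zero, retrying never helps; the retry loop only pays off for transient failures (e.g. flaky network calls), which is the usual reason to want "best effort" semantics.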