pyspark parallelize(df) throws TypeError: can't pickle _thread.RLock objects

Problem description

I am trying to run a list of DataFrames in parallel (in PySpark on my local Mac) and always end up with the following exception:

>>> df1=spark.range(10)
>>> df2=spark.range(10)
>>> df=[df1,df2]
>>> p=spark.sparkContext.parallelize(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/context.py", line 574, in parallelize
    jrdd = self._serialize_to_jvm(c, serializer, reader_func, createRDDServer)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/context.py", line 611, in _serialize_to_jvm
    serializer.dump_stream(data, tempFile)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 211, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 133, in dump_stream
    self._write_with_length(obj, stream)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 143, in _write_with_length
    serialized = self.dumps(obj)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 427, in dumps
    return pickle.dumps(obj, pickle_protocol)
TypeError: can't pickle _thread.RLock objects
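
A minimal sketch of one common workaround, under the assumption that the goal is simply to trigger independent jobs on several DataFrames at the same time: a DataFrame holds a reference to the JVM-backed SparkContext (which contains thread locks), so it cannot be pickled and shipped through sparkContext.parallelize. Driver-side threads avoid the pickling step entirely, and Spark's scheduler can run the resulting jobs concurrently, subject to available resources. The run_job helper and ThreadPoolExecutor usage here are illustrative, not part of the original question.

from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

df1 = spark.range(10)
df2 = spark.range(10)
dfs = [df1, df2]

def run_job(df):
    # Replace with whatever action each DataFrame needs (write, collect, ...)
    return df.count()

# Each call to run_job submits its own Spark job from a separate driver thread.
with ThreadPoolExecutor(max_workers=len(dfs)) as pool:
    results = list(pool.map(run_job, dfs))

print(results)  # e.g. [10, 10]
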
python python-3.x apache-spark pyspark pickle