Trying to run a list of DataFrames in parallel (in PySpark on a local Mac), and it always ends with the following exception:
>>> df1=spark.range(10)
>>> df2=spark.range(10)
>>> df=[df1,df2]
>>> p=spark.sparkContext.parallelize(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/context.py", line 574, in parallelize
    jrdd = self._serialize_to_jvm(c, serializer, reader_func, createRDDServer)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/context.py", line 611, in _serialize_to_jvm
    serializer.dump_stream(data, tempFile)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 211, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 133, in dump_stream
    self._write_with_length(obj, stream)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 143, in _write_with_length
    serialized = self.dumps(obj)
  File "/spark-3.2.2-bin-hadoop3.2-scala2.13/python/pyspark/serializers.py", line 427, in dumps
    return pickle.dumps(obj, pickle_protocol)
TypeError: can't pickle _thread.RLock objects
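The root cause: `sparkContext.parallelize` pickles each element of the Python collection so it can be shipped to executors, but a DataFrame is not plain data. It is a thin wrapper around a JVM object and holds references back to the SparkSession/SparkContext, which internally contains a `threading.RLock`, and locks cannot be pickled. The minimal repro below (with a hypothetical `Holder` class standing in for the DataFrame) reproduces the same `TypeError` without Spark:

import pickle
import threading

class Holder:
    """Hypothetical stand-in for a DataFrame: it carries a lock,
    just as the SparkContext referenced by every DataFrame does."""
    def __init__(self):
        self.lock = threading.RLock()

try:
    pickle.dumps(Holder())  # same failure mode as parallelize([df1, df2])
except TypeError as e:
    print(e)  # message mentions that the _thread.RLock object can't be pickled

In short, DataFrames are already distributed, so they never belong inside an RDD. To work on several DataFrames concurrently, the usual patterns are to combine them with `df1.union(df2)` and run one job, or to keep them in a plain Python list and submit actions from a `concurrent.futures.ThreadPoolExecutor`, since concurrent actions on one SparkContext are scheduled as separate Spark jobs.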