PySpark UDF raises "No module named" error


I have a dataframe with an English-language country descriptor column, ds_pais. I want to use GoogleTranslator to add a column via .withColumn that translates that country descriptor from English to Spanish.

from deep_translator import GoogleTranslator
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Translate a single string from English to Spanish.
def translate(to_translate):
    return GoogleTranslator(source='en', target='es').translate(to_translate)

# Register the function as a Spark UDF returning a string.
translate_udf = udf(lambda x: translate(x), StringType())

df_pais_traducido = df_campo_calculado.withColumn('ds_pais_es', translate_udf(df_campo_calculado.ds_pais))

display(df_pais_traducido.select('ds_pais', 'ds_pais_es'))

But when I run it, I get this error:

PythonException: 
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 815, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 651, in read_udfs
    udfs.append(read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index=i))
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 376, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 88, in read_command
    command = serializer._read_with_length(file)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 174, in _read_with_length
    return self.loads(obj)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 472, in loads
    return cloudpickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'deep_translator'

Why can't the UDF find GoogleTranslator?

Edit: I am running this on a Microsoft Fabric notebook using PySpark (Python).

apache-spark pyspark microsoft-fabric
1 Answer

It started working after I reset the PySpark session. If it fails again, I will try using a workspace environment and installing the package there.
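
For context, this error usually means deep_translator is installed on the driver but not on the Spark workers: the UDF is pickled on the driver and deserialized on each worker, which is where cloudpickle.loads raises ModuleNotFoundError. Below is a minimal sketch of the session-scoped approach; it assumes a Microsoft Fabric notebook where an inline %pip install applies to the whole Spark session, and it reuses the dataframe name df_campo_calculado from the question.

# Run this in its own notebook cell first; in Microsoft Fabric,
# inline %pip installs are session-scoped, so the package should
# be available to the Spark session, not just the driver.
# %pip install deep_translator

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def translate(to_translate):
    # Import inside the function so the module resolves on the worker
    # at call time; this still requires the package to actually be
    # installed for the session.
    from deep_translator import GoogleTranslator
    return GoogleTranslator(source='en', target='es').translate(to_translate)

translate_udf = udf(translate, StringType())

df_pais_traducido = df_campo_calculado.withColumn(
    'ds_pais_es', translate_udf(df_campo_calculado.ds_pais)
)

As a side note, calling a web translation API once per row through a UDF can be slow; if the country column has only a handful of distinct values, translating the distinct values once and joining the result back may be more practical.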
