Unable to download the pipelines provided by the spark-nlp library

Question (votes: 0, answers: 1)

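I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library.

I have tried installing different versions of the pyspark and spark-nlp libraries.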

import sparknlp
from sparknlp.pretrained import PretrainedPipeline

#create or get Spark Session

spark = sparknlp.start()

sparknlp.version()
spark.version

#download, load, and annotate a text by pre-trained pipeline

pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')
result = pipeline.annotate('Harry Potter is a great movie')

2.1.0
recognize_entities_dl download started this may take some time.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-b71a0f77e93a> in <module>
     11 #download, load, and annotate a text by pre-trained pipeline
     12 
---> 13 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
     14 result = pipeline.annotate('Harry Potter is a great movie')

d:\python36\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc)
     89 
     90     def __init__(self, name, lang='en', remote_loc=None):
---> 91         self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
     92         self.light_model = LightPipeline(self.model)
     93 

d:\python36\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
     50     def downloadPipeline(name, language, remote_loc=None):
     51         print(name + " download started this may take some time.")
---> 52         file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
     53         if file_size == "-1":
     54             print("Can not find the model to download please check the name!")

AttributeError: module 'sparknlp.internal' has no attribute '_GetResourceSize'
python apache-spark johnsnowlabs-spark-nlp
1 Answer
0 votes

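Thank you for confirming your Apache Spark version. The pretrained pipelines and models are tied to specific Apache Spark and Spark NLP versions. You need Apache Spark 2.4.x at minimum to download the pretrained models/pipelines. On any other version, you have to train your own models/pipelines first.

Here is the list of all pipelines, all of which are for Apache Spark 2.4.x: https://nlp.johnsnowlabs.com/docs/en/pipelines

If you look at the URL of any model or pipeline, you can see this information in the file name: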

recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip

  • Name: recognize_entities_dl
  • Lang: en
  • Spark NLP: must be 2.1.0 or greater
  • Apache Spark: must be 2.4.x or greater
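
The breakdown above can be sketched in code. This is only an illustration: the helper name parse_pretrained_filename and the regular expression are my own assumptions, inferred from the single file name shown here, and are not part of the Spark NLP API.

```python
import re

# Assumed layout, inferred from the one example file name in this answer:
# <name>_<lang>_<Spark NLP version>_<Apache Spark version>_<timestamp>.zip
_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})_(?P<spark_nlp>\d+\.\d+\.\d+)"
    r"_(?P<spark>\d+\.\d+)_(?P<timestamp>\d+)\.zip$"
)


def parse_pretrained_filename(filename):
    """Split a pretrained model/pipeline file name into its parts (hypothetical helper)."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError("unexpected file name layout: " + filename)
    return match.groupdict()


info = parse_pretrained_filename(
    "recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip"
)
print(info["name"], info["lang"], info["spark_nlp"], info["spark"])
```

File names that follow a different layout will raise a ValueError rather than return a guess.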

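Note: the Spark NLP library is built and compiled against Apache Spark 2.4.x. That is why the models and pipelines are only available for the 2.4.x versions.

I hope this answer helps you solve the problem.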
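
If you want to check the version requirement before attempting a download, a minimal sketch could look like the following. The function names and the MIN_SPARK constant are hypothetical; only the 2.4 minimum comes from this answer. With a live session you would pass in spark.version.

```python
# Minimum Apache Spark version stated in this answer (major, minor).
MIN_SPARK = (2, 4)


def parse_version(version):
    """Turn a version string such as '2.4.3' into (2, 4); the patch level is ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supports_pretrained(spark_version):
    """Return True if the given Spark version meets the assumed 2.4 minimum."""
    return parse_version(spark_version) >= MIN_SPARK


# With a running session: supports_pretrained(spark.version)
print(supports_pretrained("2.3.1"))  # False
print(supports_pretrained("2.4.3"))  # True
```

This only compares major and minor numbers, so it says nothing about whether the installed spark-nlp Python package matches the Spark NLP jar.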
