Load the spaCy language model based on the detected language


I keep seeing this example for the LanguageDetector package everywhere:

import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

def get_lang_detector(nlp, name):
    return LanguageDetector()

nlp = spacy.load("en_core_web_sm")
Language.factory("language_detector", func=get_lang_detector)
nlp.add_pipe('language_detector', last=True)
text = 'This is an english text.'
doc = nlp(text)
print(doc._.language)

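But the code above always loads the English model only. How can I load the correct language model based on the detected language?

I would like something along these lines (pseudocode):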

languageCode = LanguageDetector.detect('This is a text example')
nlp = spacy.load(languageCode.lower() + "_core_web_sm")

Tags: spacy, spacy-3

1 Answer

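If you are not restricted to using spacy alone, you can use lingua-language-detector to detect the language first.

The spaCy website has the complete list of languages and models available for spaCy, so you can build a dictionary like the following (with as many languages as you need):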

# Keys are lowercase lingua language names, values are spaCy package names.
# Only English uses "core_web"; the other languages here use "core_news".
spacy_model_mapping = {
    "english": "en_core_web_sm",
    "french": "fr_core_news_sm",
    "german": "de_core_news_sm",
    "spanish": "es_core_news_sm",
    "portuguese": "pt_core_news_sm",
    "italian": "it_core_news_sm",
    "dutch": "nl_core_news_sm",
}

Then proceed as follows:

import spacy
from lingua import Language, LanguageDetectorBuilder

# Restrict detection to the languages that have an entry in the mapping
supported_languages = [
    Language.ENGLISH, Language.FRENCH, Language.GERMAN, Language.SPANISH,
    Language.PORTUGUESE, Language.ITALIAN, Language.DUTCH,
]
detector = LanguageDetectorBuilder.from_languages(*supported_languages).build()

text = "Ceci est un texte en français."

# detect_language_of returns None when no language can be determined reliably
result = detector.detect_language_of(text)
detected_language_name = result.name.lower()

spacy_model_name = spacy_model_mapping.get(detected_language_name)
print("SpaCy model name:", spacy_model_name)

This prints:

>>> SpaCy model name: fr_core_news_sm

Finally:

nlp = spacy.load(spacy_model_name)
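
The steps above can be folded into a small helper. The following is only a sketch under a few assumptions: the names model_name_for, load_model, nlp_for_text, and DEFAULT_MODEL are hypothetical, falling back to English when detection fails is an arbitrary choice, and the mapping assumes the standard small spaCy packages (French, German, and Spanish ship as `*_core_news_sm`). Caching the loaded pipeline avoids reloading a model for every text.

```python
from functools import lru_cache

# Assumed mapping from lowercase lingua language names to spaCy package names.
SPACY_MODELS = {
    "english": "en_core_web_sm",
    "french": "fr_core_news_sm",
    "german": "de_core_news_sm",
    "spanish": "es_core_news_sm",
    "portuguese": "pt_core_news_sm",
    "italian": "it_core_news_sm",
    "dutch": "nl_core_news_sm",
}
DEFAULT_MODEL = "en_core_web_sm"  # hypothetical fallback, pick what suits you


def model_name_for(language_name, default=DEFAULT_MODEL):
    """Return the spaCy package for a language name such as 'FRENCH'.

    Falls back to `default` when the name is None or not in the mapping.
    """
    if language_name is None:
        return default
    return SPACY_MODELS.get(language_name.lower(), default)


@lru_cache(maxsize=None)
def load_model(model_name):
    """Load a spaCy pipeline once and reuse it on later calls."""
    import spacy  # lazy import: the mapping helper works without spaCy

    return spacy.load(model_name)


def nlp_for_text(text, detector):
    """Detect the language of `text` and return the matching pipeline.

    `detector` is expected to be a lingua detector; its
    detect_language_of method may return None.
    """
    language = detector.detect_language_of(text)
    name = model_name_for(language.name if language is not None else None)
    return load_model(name)
```

With the detector built earlier, usage would be `nlp = nlp_for_text(text, detector)` followed by `doc = nlp(text)`. Each model package still has to be installed first, for example with `python -m spacy download fr_core_news_sm`.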