I keep seeing this example for the LanguageDetector package everywhere:
import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector
def get_lang_detector(nlp, name):
    return LanguageDetector()
nlp = spacy.load("en_core_web_sm")
Language.factory("language_detector", func=get_lang_detector)
nlp.add_pipe('language_detector', last=True)
text = 'This is an english text.'
doc = nlp(text)
print(doc._.language)
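But the code above always loads only the English model. How can I load the correct language model based on the detected language?
I want something like this: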
languageCode = LanguageDetector.detect('This is a text example')
nlp = spacy.load(languageCode.lower() + "_core_web_sm")
If you are not limited to using spacy alone, you can use the lingua-language-detector library to detect the language first.
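Here you can find the full list of languages available in spaCy. You can then build a dictionary like the following (including as many languages as you need):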
spacy_model_mapping = {
    "english": "en_core_web_sm",
    "french": "fr_core_news_sm",
    "german": "de_core_news_sm",
    "spanish": "es_core_news_sm",
    "portuguese": "pt_core_news_sm",
    "italian": "it_core_news_sm",
    "dutch": "nl_core_news_sm",
}
The steps are as follows:
import spacy
from lingua import Language, LanguageDetectorBuilder
# Restrict detection to the languages that have an entry in the mapping
supported_languages = [Language.ENGLISH, Language.FRENCH, Language.GERMAN, Language.SPANISH, Language.PORTUGUESE, Language.ITALIAN, Language.DUTCH]
detector = LanguageDetectorBuilder.from_languages(*supported_languages).build()
text = "Ceci est un texte en français."
# detect_language_of returns a Language enum member, e.g. Language.FRENCH
result = detector.detect_language_of(text)
detected_language_name = result.name.lower()
spacy_model_name = spacy_model_mapping.get(detected_language_name)
print("SpaCy model name:", spacy_model_name)
Output:
>>> SpaCy model name: fr_core_news_sm
Finally:
nlp = spacy.load(spacy_model_name)
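The steps above can be folded into a small helper. This is only a sketch under a few assumptions: the names SPACY_MODELS, DEFAULT_LANGUAGE, model_name_for, load_pipeline and nlp_for_text are made up for illustration, falling back to English is an arbitrary choice, and each model is assumed to be installed already (for example with python -m spacy download fr_core_news_sm). It also handles the case where lingua cannot decide on a language and returns None, and it caches loaded pipelines so each model is read from disk only once.

```python
from functools import lru_cache

# Hypothetical mapping mirroring the dictionary above; extend as needed.
SPACY_MODELS = {
    "english": "en_core_web_sm",
    "french": "fr_core_news_sm",
    "german": "de_core_news_sm",
}
DEFAULT_LANGUAGE = "english"  # assumption: fall back to English


def model_name_for(language_name):
    """Map a detected language name (any case, or None) to a spaCy model name."""
    key = (language_name or DEFAULT_LANGUAGE).lower()
    # Unknown languages fall back to the default model.
    return SPACY_MODELS.get(key, SPACY_MODELS[DEFAULT_LANGUAGE])


@lru_cache(maxsize=None)
def load_pipeline(model_name):
    """Load a spaCy pipeline once and reuse it on later calls."""
    import spacy  # imported lazily so model_name_for works without spaCy

    return spacy.load(model_name)


def nlp_for_text(text, detector):
    """Pick and load the pipeline matching the language of `text`.

    `detector` is expected to be a lingua detector built with
    LanguageDetectorBuilder, as in the answer above.
    """
    result = detector.detect_language_of(text)  # may be None
    name = result.name if result is not None else None
    return load_pipeline(model_name_for(name))
```

With the detector built as above, doc = nlp_for_text(text, detector)(text) would then process the text with the matching pipeline.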