What is wrong with my CamemBERT transformer model for NER?


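I tried to use transformers for NER in French with the CamemBERT model. I adapted this code from https://huggingface.co/transformers/usage.html. Unfortunately, the predictions for my short sentence are not satisfactory, and I cannot tell whether something is wrong with my code.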

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer


model = AutoModelForTokenClassification.from_pretrained("camembert-base")
tokenizer = AutoTokenizer.from_pretrained("camembert-base")

label_list = [
    "O",       # Outside of a named entity
    "B-MISC",  # Beginning of a miscellaneous entity right after another miscellaneous entity
    "I-MISC",  # Miscellaneous entity
    "B-PER",   # Beginning of a person's name right after another person's name
    "I-PER",   # Person's name
    "B-ORG",   # Beginning of an organisation right after another organisation
    "I-ORG",   # Organisation
    "B-LOC",   # Beginning of a location right after another location
    "I-LOC"    # Location
]

sequence = "Paris, capitale de la France, est une grande ville européenne et un centre mondial de l'art, de la mode, de la gastronomie et de la culture."

# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="pt")

outputs = model(inputs)[0]
predictions = torch.argmax(outputs, dim=2)

print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].tolist())])

Predicted output:

[('<s>', 'O'), ('▁Paris', 'O'), (',', 'O'), ('▁capitale', 'B-MISC'), ('▁de', 'O'), ('▁la', 'B-MISC'), ('▁France', 'O'), (',', 'O'), ('▁est', 'B-MISC'), ('▁une', 'O'), ('▁grande', 'O'), ('▁ville', 'O'), ('▁européenne', 'O'), ('▁et', 'O'), ('▁un', 'O'), ('▁centre', 'O'), ('▁mondial', 'O'), ('▁de', 'O'), ('▁l', 'O'), ("'", 'O'), ('art', 'O'), (',', 'O'), ('▁de', 'O'), ('▁la', 'B-MISC'), ('▁mode', 'B-MISC'), (',', 'O'), ('▁de', 'O'), ('▁la', 'B-MISC'), ('▁gastronomie', 'O'), ('▁et', 'O'), ('▁de', 'O'), ('▁la', 'O'), ('▁culture', 'O'), ('.', 'O'), ('</s>', 'O')]
named-entity-recognition ner huggingface-transformers bert
1 Answer

You should search https://huggingface.co/models?search=conll03 for a French model.

You could also open an issue simply to check whether a given model has been fine-tuned for the NER task.

Your final classification layer should end with

Linear(in_features=768, out_features=9, bias=True)
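One way to check the size of the classification head is to inspect `model.classifier`. The sketch below is only an illustration: to avoid downloading any weights it builds two tiny, randomly initialized CamemBERT models from a local config (the small `vocab_size`, `hidden_size` and layer counts are assumptions chosen for speed, not the real "camembert-base" values), so `in_features` is 32 here rather than 768. What it shows is that the head has 2 outputs unless `num_labels` is set.

```python
from transformers import CamembertConfig, CamembertForTokenClassification

# Hypothetical tiny dimensions so the example runs without downloading weights.
tiny = dict(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

# No num_labels given: the config falls back to the library default of 2 labels.
default_model = CamembertForTokenClassification(CamembertConfig(**tiny))
print(default_model.classifier)

# num_labels=9 matches the nine entries of label_list in the question.
ner_model = CamembertForTokenClassification(CamembertConfig(num_labels=9, **tiny))
print(ner_model.classifier)

default_outputs = default_model.classifier.out_features
ner_outputs = ner_model.classifier.out_features
```

With a real checkpoint the same inspection applies: print `model.classifier` after `from_pretrained(...)`. Note that passing `num_labels=9` to `from_pretrained("camembert-base", ...)` only resizes the head; its weights would still be untrained until the model is fine-tuned on NER data.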

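With the "camembert-base" model, you only have 2 output features. That checkpoint is a pretrained language model with no fine-tuned NER head, so the token-classification layer is newly initialized with the default of 2 labels. This is why only the first two entries of label_list ("O" and "B-MISC") ever appear in your output.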
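The predicted output in the question is consistent with a 2-output head. As a quick sanity check, the sketch below maps the distinct labels that appear in that output back to their positions in `label_list`; `observed_labels` is copied by hand from the printed predictions and is not produced by running the model.

```python
label_list = [
    "O", "B-MISC", "I-MISC", "B-PER", "I-PER",
    "B-ORG", "I-ORG", "B-LOC", "I-LOC",
]

# Distinct labels seen in the question's predicted output (copied by hand).
observed_labels = {"O", "B-MISC"}

# Positions of those labels in label_list, i.e. the argmax indices the model produced.
used_indices = sorted(label_list.index(label) for label in observed_labels)
print(used_indices)  # [0, 1]

# An argmax over 2 logits can only ever return 0 or 1.
fits_two_output_head = max(used_indices) < 2
print(fits_two_output_head)  # True
```

Indices 2 through 8 (including every PER, ORG and LOC label) can never be predicted by such a head, so "Paris" and "France" cannot be tagged as locations no matter what the input is.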
