I'm interested in using a pretrained model from Hugging Face to perform a named entity recognition (NER) task, without any further training or testing of the model.
The model page on Hugging Face gives only the following information on reusing the model:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
I tried the following code, but I get tensor outputs rather than a class label for each named entity.
import torch  # was missing from the original snippet
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

text = "my text for named entity recognition here."
input_ids = torch.tensor(tokenizer.encode(text, padding=True, truncation=True, max_length=50, add_special_tokens=True)).unsqueeze(0)
with torch.no_grad():
    output = model(input_ids, output_attentions=True)
Any suggestions on how to apply the model to text for NER?
You are looking for the named entity recognition pipeline (token classification):
from transformers import AutoTokenizer, pipeline, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForTokenClassification.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
nerpipeline = pipeline('ner', model=model, tokenizer=tokenizer)
text = "my text for named entity recognition here."
nerpipeline(text)
Output:
[{'word': 'my',
'score': 0.5209763050079346,
'entity': 'LABEL_0',
'index': 1,
'start': 0,
'end': 2},
{'word': 'text',
'score': 0.5161970257759094,
'entity': 'LABEL_0',
'index': 2,
'start': 3,
'end': 7},
{'word': 'for',
'score': 0.5297629237174988,
'entity': 'LABEL_1',
'index': 3,
'start': 8,
'end': 11},
{'word': 'named',
'score': 0.5258920788764954,
'entity': 'LABEL_1',
'index': 4,
'start': 12,
'end': 17},
{'word': 'entity',
'score': 0.5415489673614502,
'entity': 'LABEL_1',
'index': 5,
'start': 18,
'end': 24},
{'word': 'recognition',
'score': 0.5396601557731628,
'entity': 'LABEL_1',
'index': 6,
'start': 25,
'end': 36},
{'word': 'here',
'score': 0.5165827870368958,
'entity': 'LABEL_0',
'index': 7,
'start': 37,
'end': 41},
{'word': '.',
'score': 0.5266348123550415,
'entity': 'LABEL_0',
'index': 8,
'start': 41,
'end': 42}]
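The pipeline returns one dict per token, so multi-token entities arrive in pieces. A minimal pure-Python sketch of merging consecutive tokens that share a label into spans (`group_entities` is a hypothetical helper written for this answer, not part of transformers):

```python
def group_entities(tokens):
    """Merge consecutive token predictions with the same entity label
    and adjacent character offsets into single spans."""
    spans = []
    for tok in tokens:
        prev = spans[-1] if spans else None
        # extend the previous span if the label matches and the offsets are adjacent
        if prev and prev["entity"] == tok["entity"] and prev["end"] == tok["start"] - 1:
            prev["word"] += " " + tok["word"]
            prev["end"] = tok["end"]
        else:
            spans.append({"entity": tok["entity"], "word": tok["word"],
                          "start": tok["start"], "end": tok["end"]})
    return spans

# a slice of the pipeline output shown above
tokens = [
    {"word": "named", "entity": "LABEL_1", "start": 12, "end": 17},
    {"word": "entity", "entity": "LABEL_1", "start": 18, "end": 24},
    {"word": "recognition", "entity": "LABEL_1", "start": 25, "end": 36},
    {"word": "here", "entity": "LABEL_0", "start": 37, "end": 41},
]
print(group_entities(tokens))
# first span covers "named entity recognition" (LABEL_1), second is "here" (LABEL_0)
```

In recent versions of transformers, passing `aggregation_strategy="simple"` to the pipeline performs similar grouping for you.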
Note that you need to use AutoModelForTokenClassification rather than AutoModel, and that not all models come with a trained token classification head (i.e., you may get randomly initialized weights for the token classification head).
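One quick way to spot a missing head is to inspect the checkpoint's label map (in transformers that is `model.config.id2label`): auto-generated names like LABEL_0/LABEL_1, as in the output above, usually mean the head was never fine-tuned. A small sketch, where `has_generic_labels` is a hypothetical helper:

```python
import re

def has_generic_labels(id2label):
    """True if every label matches the auto-generated 'LABEL_<n>' pattern,
    which suggests the classification head was never fine-tuned."""
    return all(re.fullmatch(r"LABEL_\d+", name) for name in id2label.values())

# id2label dicts as they would come from model.config.id2label
print(has_generic_labels({0: "LABEL_0", 1: "LABEL_1"}))              # True: likely untrained head
print(has_generic_labels({0: "O", 1: "B-DISEASE", 2: "I-DISEASE"}))  # False: real NER tags
```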
I'm trying a similar process, because in addition to using the model in a pipeline, I also want to save the model.
from transformers import AutoModel, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModel.from_pretrained("facebook/bart-large-mnli")

model_name = "./tmp/"
model.save_pretrained(model_name)
tokenizer.save_pretrained(model_name)

zss = pipeline('zero-shot-classification', model=model_name, tokenizer=model_name)
zss("I am happy the way I am!!", ["happy", "sad", "neutral"])
The result:
{'sequence': 'I am happy the way I am!!',
'labels': ['happy', 'neutral', 'sad'],
'scores': [0.37897560000419617, 0.311644583940506, 0.30937978625297546]}
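Note that the scores above are almost uniform (roughly 1/3 each). That is what a randomly initialized classification head tends to produce: AutoModel loads and saves only the base encoder/decoder, so when the pipeline reloads from ./tmp/ it has to create a fresh sequence classification head. Saving with AutoModelForSequenceClassification instead preserves the MNLI head. As a small illustration of why near-zero, essentially random logits yield near-uniform scores after softmax:

```python
import math

def softmax(logits):
    # numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# an untrained head emits small, arbitrary logits, so no label stands out
probs = softmax([0.2, -0.1, 0.05])
print(probs)  # all three probabilities land close to 1/3
```

In other words, loading with `AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")` before `save_pretrained` should keep the trained head and give confident, non-uniform scores.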