Unable to POS-tag a text file


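I want to tag the parts of speech of sentences. For this task I am using the pos-english-fast model. When I pass a single sentence, the model identifies the POS tags. I created a data file named "data1.txt" that holds all the sentences, one per line. When I try to tag the sentences from that file, it does not work.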

My code

from flair.models import SequenceTagger
model = SequenceTagger.load("flair/pos-english")
# Read the data from data1.txt
with open('data1.txt') as f:
  data = f.read().splitlines()
#Create a list of sentences from the data 
sentences = [sentence.split() for sentence in data]
#Tag each sentence using the model
tagged_sentences = []
for sentence in sentences:
  tagged_sentences.append(model.predict(sentence))
for sentence in tagged_sentences:
  print(sentence)

The error I get

AttributeError                            Traceback (most recent call last)
<ipython-input-16-03268ee0d9c9> in <cell line: 10>()
      9 tagged_sentences = []
     10 for sentence in sentences:
---> 11   tagged_sentences.append(model.predict(sentence))
     12 for sentence in tagged_sentences:
     13   print(sentence)

1 frames
/usr/local/lib/python3.10/dist-packages/flair/data.py in set_context_for_sentences(cls, sentences)
   1116         previous_sentence = None
   1117         for sentence in sentences:
-> 1118             if sentence.is_context_set():
   1119                 continue
   1120             sentence._previous_sentence = previous_sentence

AttributeError: 'str' object has no attribute 'is_context_set'


How can I fix this?

python nlp tagging huggingface
1 Answer

Suppose this is your data:

['Not My Responsibility is a 2020 American short film written and produced by singer-songwriter Billie Eilish.',
 "A commentary on body shaming and double standards placed upon young women's appearances, it features a monologue from Eilish about the media scrutiny surrounding her body.",
 'The film is spoken-word and stars Eilish in a dark room, where she gradually undresses before submerging herself in a black substance.']

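model.predict expects Flair Sentence objects (a single one or a list of them), not lists of string tokens. Passing the result of sentence.split() makes Flair treat each word as if it were a Sentence, which is why the call fails with the AttributeError on 'str'. Here is what you need to do for POS tagging in Flair: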

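from flair.data import Sentence
from flair.models import SequenceTagger

# Load the tagger (same model as in the question)
model = SequenceTagger.load("flair/pos-english")

# Wrap each raw string in a Sentence; predict() annotates the objects in place
sentences = list(map(Sentence, data))
_ = model.predict(sentences)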
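
To connect this with the data1.txt file from the question, the file can be read line by line and each line wrapped in a Sentence. The following is only a minimal sketch: the helper names read_sentences and tag_file are hypothetical, it assumes one sentence per line, and it skips blank lines so that no empty Sentence objects are created.

```python
from pathlib import Path


def read_sentences(path):
    """Return the non-empty lines of a text file, assuming one sentence per line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def tag_file(path, model_name="flair/pos-english"):
    """Tag every sentence in the file and return the annotated Sentence objects."""
    # Imported here so read_sentences can be used even without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    model = SequenceTagger.load(model_name)
    # Pass the raw strings to Sentence; Flair tokenizes them itself.
    sentences = [Sentence(line) for line in read_sentences(path)]
    model.predict(sentences)  # annotates the Sentence objects in place
    return sentences
```

Calling tag_file("data1.txt") and printing each returned element should then show one tagged sentence per line of the file.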

Now all sentences are tagged. For example, to view the tags of the first sentence, simply use

print(sentences[0])

This is the output:

Sentence[17]: "Not My Responsibility is a 2020 American short film written and produced by singer-songwriter Billie Eilish." →
["Not"/RB, "My"/PRP$, "Responsibility"/NN, "is"/VBZ, "a"/DT, "2020"/CD, "American"/JJ, "short"/JJ, "film"/NN, "written"/VBN, "and"/CC, "produced"/VBN, "by"/IN, "singer-songwriter"/NN, "Billie"/NNP, "Eilish"/NNP, "."/.]
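
If the tags are needed as data rather than as printed text, they can be read off the individual tokens. This is a hedged sketch, not part of the original answer: token_tags is a hypothetical helper, it assumes the label type of this model is "pos", and it assumes a Flair version in which tokens expose get_label (older releases used a different accessor).

```python
def token_tags(sentence, label_type="pos"):
    """Return (token text, tag) pairs for an already tagged Flair Sentence."""
    # Iterating over a Sentence yields its Token objects in order.
    return [(token.text, token.get_label(label_type).value) for token in sentence]
```

For instance, token_tags(sentences[0]) would give a list of pairs that can be written to a file or loaded into a table.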