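I want to tag the parts of speech in sentences. For this task I am using the pos-english-fast model. Given a single sentence, the model identifies the POS tags correctly. I created a data file, "data1.txt", that contains all of my sentences. When I try to tag the sentences from that file, it does not work.
My code: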
from flair.models import SequenceTagger

model = SequenceTagger.load("flair/pos-english")

# Read the data from data1.txt
with open('data1.txt') as f:
    data = f.read().splitlines()

# Create a list of sentences from the data
sentences = [sentence.split() for sentence in data]

# Tag each sentence using the model
tagged_sentences = []
for sentence in sentences:
    tagged_sentences.append(model.predict(sentence))

for sentence in tagged_sentences:
    print(sentence)
The error I get:
AttributeError Traceback (most recent call last)
<ipython-input-16-03268ee0d9c9> in <cell line: 10>()
9 tagged_sentences = []
10 for sentence in sentences:
---> 11 tagged_sentences.append(model.predict(sentence))
12 for sentence in tagged_sentences:
13 print(sentence)
1 frames
/usr/local/lib/python3.10/dist-packages/flair/data.py in set_context_for_sentences(cls, sentences)
1116 previous_sentence = None
1117 for sentence in sentences:
-> 1118 if sentence.is_context_set():
1119 continue
1120 sentence._previous_sentence = previous_sentence
AttributeError: 'str' object has no attribute 'is_context_set'
How can I fix this?
Suppose this is your data:
['Not My Responsibility is a 2020 American short film written and produced by singer-songwriter Billie Eilish.',
"A commentary on body shaming and double standards placed upon young women's appearances, it features a monologue from Eilish about the media scrutiny surrounding her body.",
'The film is spoken-word and stars Eilish in a dark room, where she gradually undresses before submerging herself in a black substance.']
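A list like this can be built from the question's data1.txt with a small helper. The sketch below is only an illustration: the function name read_sentences is hypothetical, and it assumes the file is UTF-8 text with one sentence per line. It keeps each line as a single string instead of splitting it into words.

```python
from pathlib import Path


def read_sentences(path):
    """Return the non-empty lines of a text file, one sentence per line."""
    text = Path(path).read_text(encoding="utf-8")
    # Keep each line as one string; do not split it into words here.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With this helper, data = read_sentences("data1.txt") gives a list of plain strings. Blank lines are skipped so they do not end up as empty sentences.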
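model.predict expects Flair Sentence objects (one, or a list of them), not lists of strings. That is why the traceback complains that a 'str' object has no attribute 'is_context_set'. This is what you need to do for POS tagging in Flair: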
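from flair.data import Sentence
from flair.models import SequenceTagger

model = SequenceTagger.load("flair/pos-english")
# Wrap each raw string in a Sentence; Flair tokenizes the text itself
sentences = list(map(Sentence, data))
# predict() annotates the Sentence objects in place
_ = model.predict(sentences)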
All sentences are now tagged. For example, to view the tags of the first sentence, use print(sentences[0]). This is the output:
Sentence[17]: "Not My Responsibility is a 2020 American short film written and produced by singer-songwriter Billie Eilish." →
["Not"/RB, "My"/PRP$, "Responsibility"/NN, "is"/VBZ, "a"/DT, "2020"/CD, "American"/JJ, "short"/JJ, "film"/NN, "written"/VBN, "and"/CC, "produced"/VBN, "by"/IN, "singer-songwriter"/NN, "Billie"/NNP, "Eilish"/NNP, "."/.]
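If you need the tags as data and not as printed text, they can be read token by token. The following is a minimal sketch, not the only way to do it: the helper name token_tags is hypothetical, and it assumes a recent Flair version in which each token exposes text and get_label(...), and in which this model stores its predictions under the label type "pos".

```python
def token_tags(sentence, label_type="pos"):
    """Return (word, tag) pairs for one tagged Flair Sentence."""
    # Assumes token.text and token.get_label(label_type).value are available,
    # as in recent Flair releases; "pos" is the assumed label type.
    return [(token.text, token.get_label(label_type).value) for token in sentence]
```

Calling token_tags(sentences[0]) after model.predict(sentences) should give a list of pairs that mirrors the printed output above, which is easier to write to a file or load into a table.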