I am trying to extract quotations and their attributions (i.e., the speaker) from text, but I am getting an error. Here is the setup:
import textacy
import pandas as pd
import spacy
data = [
("\"Hello, nice to meet you,\" said world 1"),
("\"Hello, nice to meet you,\" said world 2"),
]
df = pd.DataFrame(data, columns=['text'])
nlp = spacy.load('en_core_web_sm')
doc = df['text'].apply(nlp)
This is the desired output:
[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")] [DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]
This is my first extraction attempt:
print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))
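It gives the following output:
[<generator object direct_quotations at 0x...>, <generator object direct_quotations at 0x...>]
This is my second extraction attempt: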
print(list(textacy.extract.triples.direct_quotations(doc)))
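It gives the following error:
AttributeError: 'Series' object has no attribute 'lang_'
In your first attempt, the loop iterates over the Series but still passes the whole Series (doc) to direct_quotations instead of each record, and the generators it returns are never consumed, so you only see generator objects. In your second attempt, the function receives a pandas Series where it expects a single spaCy Doc, hence the AttributeError.
Here is an example of what you can do with a single Doc: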
import textacy
import spacy
text =""" "Hello, nice to meet you," said world 1"""
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]
To get a single triple without building a list, you have to use
next(textacy.extract.triples.direct_quotations(doc))
because direct_quotations returns a generator object.
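To make the generator behaviour concrete, here is a minimal sketch in plain Python. It does not call textacy at all: fake_quotations is a hypothetical stand-in that, like direct_quotations, is a generator function, and the sample strings are made up for illustration.

```python
def fake_quotations(text):
    """Hypothetical stand-in for a generator-returning extractor (not a textacy API)."""
    for part in text.split(";"):
        yield part.strip()


texts = ["a; b", "c"]

# Calling a generator function only creates a generator; nothing is extracted yet.
unconsumed = [fake_quotations(t) for t in texts]

# next() pulls the first item, list() drains whatever is left.
gen = fake_quotations("a; b; c")
first = next(gen)
rest = list(gen)

# Consuming each generator per input yields the actual results, one list per text.
consumed = [list(fake_quotations(t)) for t in texts]

print(unconsumed)  # a list of generator objects, not results
print(first, rest)
print(consumed)
```

The same pattern should carry over to the question's setup: call the extractor once per Doc (per row) and wrap each call in list() or next(), rather than passing the whole Series.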