Extracting quotations with textacy

Question (votes: 0, answers: 2)

I am trying to extract quotations and their attribution (i.e. the speaker) from text, but I am getting errors. Here is the setup:

import textacy
import pandas as pd
import spacy

data = [
        ("\"Hello, nice to meet you,\" said world 1"),
        ("\"Hello, nice to meet you,\" said world 2"),  
        ]

df = pd.DataFrame(data, columns=['text'])

nlp = spacy.load('en_core_web_sm')

doc = df['text'].apply(nlp)

This is the desired output:

[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")] [DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]

This is my first extraction attempt:

print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))

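which gives the following output:

[<generator object direct_quotations at 0x...>, <generator object direct_quotations at 0x...>]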

This is my second extraction attempt:

print(list(textacy.extract.triples.direct_quotations(doc)))

which gives the following error:

AttributeError: 'Series' object has no attribute 'lang_'

python spacy textacy
2 Answers
0 votes

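In your first attempt, you loop over the Series but pass the whole Series (doc) to direct_quotations on every iteration instead of the individual item (records), so you only collect generators that are never consumed. direct_quotations expects a single spaCy Doc.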

Here is an example of what you can do:

import spacy
import textacy

text = """ "Hello, nice to meet you," said world 1"""

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)  # a single spaCy Doc, not a pandas Series

print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]
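
To get one result per row of the DataFrame from the question, one option is to call the extractor once per Doc and turn each generator into a list. The following is only a sketch: quotes_per_doc is a hypothetical helper name, and fake_extract is a stand-in for textacy.extract.triples.direct_quotations so that the snippet runs without a spaCy model installed.

```python
from typing import Callable, Iterable, Iterator, List


def quotes_per_doc(docs: Iterable, extract: Callable[[object], Iterator]) -> List[list]:
    """Call `extract` once per document and consume each generator into a list.

    Hypothetical helper, not part of textacy.
    """
    return [list(extract(doc)) for doc in docs]


def fake_extract(text: str) -> Iterator[str]:
    """Stand-in for a real extractor (assumption, for illustration only).

    Yields the first span enclosed in double quotes, if there is one.
    """
    start = text.find('"')
    end = text.find('"', start + 1)
    if start != -1 and end != -1:
        yield text[start:end + 1]


texts = [
    '"Hello, nice to meet you," said world 1',
    '"Hello, nice to meet you," said world 2',
]

# One list of results per input, in the same order as the inputs.
results = quotes_per_doc(texts, fake_extract)
print(results)
```

With the objects from the question, the corresponding call would be quotes_per_doc(df['text'].apply(nlp), textacy.extract.triples.direct_quotations), since iterating over the Series yields one Doc at a time.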

0 votes

You have to use

next(textacy.extract.triples.direct_quotations(doc)) 

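because direct_quotations returns a generator object. Note that next returns only the first quotation (and doc must be a single spaCy Doc); wrap the call in list(...) instead if you want all of them.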
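
The generator behaviour also seems to explain why the first attempt in the question printed generator objects without any error, while the second raised AttributeError: a generator's body does not run until it is consumed. Below is a minimal sketch of that, assuming a stand-in generator lazy_quotes and a FakeDoc class (both hypothetical, not textacy or spaCy objects) that mimic an extractor reading doc.lang_.

```python
def lazy_quotes(doc):
    """Stand-in generator (hypothetical); nothing here runs until it is consumed."""
    lang = doc.lang_  # fails only at consumption time if `doc` has no `lang_`
    for quote in doc.quotes:
        yield (lang, quote)


class FakeDoc:
    """Minimal stand-in for a parsed document (assumption, for illustration)."""
    lang_ = "en"
    quotes = ["first", "second"]


class EmptyDoc:
    """Stand-in document that contains no quotations."""
    lang_ = "en"
    quotes = []


gen = lazy_quotes(FakeDoc())
first = next(gen)   # only the first item
rest = list(gen)    # everything that is left

# With a default, next() does not raise StopIteration on an empty generator.
nothing = next(lazy_quotes(EmptyDoc()), None)

# Creating the generator with a wrong argument raises nothing yet ...
pending = lazy_quotes(object())
try:
    next(pending)   # ... the AttributeError appears only here
    raised = False
except AttributeError:
    raised = True

print(first, rest, nothing, raised)
```

So next is enough when a text holds a single quotation, and list is the safer choice when there may be several or none.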
