Python NLP: Identifying the tense of a sentence with TextBlob, StanfordNLP, or Google Cloud

Question  Votes: 0  Answers: 2

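(Note: I know there are earlier posts on this question (e.g. here and here), but they are fairly old, and I think NLP has made considerable progress in the past few years.)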

I am trying to determine the tense of a sentence using natural language processing in Python.

Is there an easy-to-use package for this? If not, how would I implement a solution with TextBlob, StanfordNLP, or the Google Cloud Natural Language API?

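TextBlob seems the easiest to use, and I managed to list the POS tags, but I am not sure how to turn that output into a "tense prediction value", or simply a best guess at the tense. Also, my text is in Spanish, so I would prefer Google Cloud or StanfordNLP, which support Spanish (or any other easy-to-use solution).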

I have not managed to get the Python interface for Stanford NLP working.

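The Google Cloud Natural Language API seems to offer exactly what I need (see here), but I have not found out how to get this output. I have already used Google Cloud NLP for other analyses (e.g. entity sentiment analysis) and that worked, so I am confident I can set it up once I find the right usage example.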

TextBlob example:

from textblob import TextBlob
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob = TextBlob("I am curious to see whether NLP is able to predict the tense of this sentence.", pos_tagger=nltk_tagger)
print(blob.pos_tags)

-> This prints the POS tags. How do I convert them into a prediction of the sentence's tense?
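
One way to turn a list of POS tags into a best guess is a small rule table over the Penn Treebank verb tags. The sketch below is only a heuristic, not a TextBlob feature: the function name guess_tense and the tag-to-tense mapping are assumptions made for illustration, and the input is a hand-written list shaped like blob.pos_tags so the snippet runs without any NLP library.

```python
# Penn Treebank verb tags, as produced by NLTK-based taggers.
# The mapping from tag to tense label is a heuristic, not part of TextBlob.
PAST_TAGS = {"VBD"}
PRESENT_TAGS = {"VBP", "VBZ"}
FUTURE_MODALS = {"will", "shall", "'ll"}


def guess_tense(pos_tags):
    """Return 'future', 'past', 'present' or 'unknown' for (word, tag) pairs."""
    for word, tag in pos_tags:
        # English has no future inflection, so look for a modal instead.
        if tag == "MD" and word.lower() in FUTURE_MODALS:
            return "future"
    for word, tag in pos_tags:
        # Otherwise the first finite verb decides between past and present.
        if tag in PAST_TAGS:
            return "past"
        if tag in PRESENT_TAGS:
            return "present"
    return "unknown"


# Hand-written tags for illustration; with TextBlob they would come
# from blob.pos_tags.
example = [("I", "PRP"), ("am", "VBP"), ("curious", "JJ")]
print(guess_tense(example))  # prints: present
```

Because the rules rely on Penn Treebank tags, this only applies to English text. Spanish text would need a tagger that reports morphological features instead.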

Google Cloud NLP example (after setting up credentials):

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
text = "I am curious to see how this works"
client = language.LanguageServiceClient()
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

tense = (WHAT NEEDS TO COME HERE?)
print(tense)

-> I am not sure what code is needed to predict the tense (marked in the code).
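
A possible way to fill that gap is the syntax analysis call, which reports a tense value for each token's part of speech. The sketch below has not been run against the live API. It assumes a google-cloud-language release before 2.0 (the line that still ships the enums and types modules imported above) and working credentials. The helper names fetch_verb_tenses and most_common_tense are made up for illustration.

```python
from collections import Counter


def fetch_verb_tenses(text):
    """Call analyze_syntax and return the tense name of every verb token.

    Assumes google-cloud-language < 2.0 and configured credentials.
    """
    # Imported lazily so most_common_tense can be used without the library.
    from google.cloud import language
    from google.cloud.language import enums, types

    client = language.LanguageServiceClient()
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_syntax(document=document)

    tenses = []
    for token in response.tokens:
        pos = token.part_of_speech
        if pos.tag == enums.PartOfSpeech.Tag.VERB:
            # pos.tense is an integer; the enum turns it into a name
            # such as "PAST" or "PRESENT".
            tenses.append(enums.PartOfSpeech.Tense(pos.tense).name)
    return tenses


def most_common_tense(tense_names):
    """Pick the most frequent informative tense, or 'TENSE_UNKNOWN'."""
    informative = [t for t in tense_names if t != "TENSE_UNKNOWN"]
    if not informative:
        return "TENSE_UNKNOWN"
    return Counter(informative).most_common(1)[0][0]
```

With credentials in place, tense = most_common_tense(fetch_verb_tenses(text)) would replace the placeholder line. Taking the most frequent verb tense is a crude summary; looking only at the root verb of the dependency parse may be more accurate.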

I am new to Python, so any help on this topic would be greatly appreciated. Thanks!

python google-cloud-platform nlp stanford-nlp part-of-speech
2 Answers
9 votes

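I don't think any NLP toolkit has a built-in function for detecting past tense directly. But you can get it quite simply from dependency parsing and POS tagging.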

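Run a dependency parse of the sentence and look at the root, which is the main predicate of the sentence, and at its POS tag. If it is VBD (the verb is in the simple past form), the sentence is certainly past tense. If it is VB (base form) or VBG (gerund), you need to check its dependency children for an auxiliary verb (deprel is aux) that has the VBD tag.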

If you also need to cover the present/past perfect or past modal expressions (I must have ...), you can extend the conditions.

In spaCy (my favorite Python NLP toolkit), you can write it like this (assuming your input is a single sentence):

import spacy
nlp = spacy.load('en_core_web_sm')

def detect_past_sentence(sentence):
    sent = list(nlp(sentence).sents)[0]
    return (
        sent.root.tag_ == "VBD" or
        any(w.dep_ == "aux" and w.tag_ == "VBD" for w in sent.root.children))
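
The extension to perfect and past modal forms mentioned above could look roughly like the following sketch. The decision logic is kept in a plain function over tags and lemmas so it can be checked without a model. The names is_past and is_past_sentence are illustrative, and treating the present perfect as past is an assumption that may not suit every application.

```python
def is_past(root_tag, aux):
    """Decide whether a clause is past from its root tag and auxiliaries.

    root_tag: Penn Treebank tag of the main verb, e.g. "VBD" or "VBN".
    aux: list of (lemma, tag) pairs for the root's children with dep "aux".
    """
    aux_tags = [tag for _, tag in aux]
    # Simple past ("I ate") or a past auxiliary ("I did eat", "I had eaten").
    if root_tag == "VBD" or "VBD" in aux_tags:
        return True
    # Perfect and past modal forms ("I have eaten", "I must have had"):
    # a participle root with a "have" auxiliary.
    if root_tag == "VBN" and any(lemma == "have" for lemma, _ in aux):
        return True
    return False


def is_past_sentence(sentence, nlp):
    """Apply is_past to the first sentence, given a loaded spaCy pipeline."""
    sent = list(nlp(sentence).sents)[0]
    aux = [(w.lemma_, w.tag_) for w in sent.root.children if w.dep_ == "aux"]
    return is_past(sent.root.tag_, aux)
```

Here nlp is expected to be an already loaded pipeline such as spacy.load('en_core_web_sm'). The rule relies on the model lemmatizing "has", "have" and "had" to "have", which is worth verifying on real input.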

With the Google Cloud API or StanfordNLP it is basically the same; I am just less familiar with those APIs.
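
For Spanish text, the same idea might be sketched with Stanza, the current name of the StanfordNLP Python package. Its models annotate each word with Universal Dependencies features, a string like "Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin". The helper names below are hypothetical, and the wrapper assumes the Spanish models were fetched beforehand with stanza.download("es").

```python
def tense_from_feats(feats):
    """Extract the Tense value from a UD feature string, or None."""
    if not feats:
        return None
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "Tense":
            return value
    return None


def root_tense(text, lang="es"):
    """Return the UD tense of the first sentence's main predicate, or None."""
    import stanza  # imported lazily; tense_from_feats works without it

    nlp = stanza.Pipeline(lang)
    sentence = nlp(text).sentences[0]
    # The root is the word whose head index is 0.
    root = next(w for w in sentence.words if w.head == 0)
    tense = tense_from_feats(root.feats)
    if tense is None:
        # Periphrastic and copular clauses carry tense on an auxiliary
        # or copula attached to the root.
        for w in sentence.words:
            if w.head == root.id and w.deprel in ("aux", "cop"):
                tense = tense_from_feats(w.feats)
                if tense is not None:
                    break
    return tense
```

The returned values follow the Universal Dependencies inventory (for Spanish typically Past, Pres, Fut and Imp), so no Penn Treebank tag mapping is needed. Building the pipeline on every call is slow; real code would create it once and reuse it.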


0 votes

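I coded this together with ChatGPT (correcting it and pushing it forward in a bunch of ways I could never have figured out on my own). So far it works well on the included tests, but it has some problems and could use some help.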

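The code detects the main tense of a sentence (past, present, future, unknown) as well as the tense of embedded/subordinate clauses. I hope it will help with time adjustment in a separate speech-to-text project: for a sentence like "Jamie wants to eat in 3 hours", the tense is present, but the time referred to is in the future.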

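Most of the prediction tests actually suit my time-adjustment project, so I will keep them, but some other tests fail and I don't know how to handle them. For example, for "She wants to sleep." and "She wants to sleep in 3 hours." I would like the embedded clauses to be present and future, respectively. (The current code treats them as "unknown".)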

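I was thinking that if the main clause is present and the embedded one is unknown, I could put it in the future, but I would prefer that it handle the grammar rather than just fall back from "unknown" at the end (unless that is all that's needed).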
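
The fallback described above could be sketched as a small rule that only fires for to-infinitive complements, so it leans on a bit of grammar rather than on "unknown" alone. The function name and the is_to_infinitive flag are hypothetical. In spaCy the flag might be derived from an embedded verb tagged VB that has a child tagged TO, which is an assumption about the parser output worth checking.

```python
def resolve_embedded_tense(main_tense, embedded_tense, is_to_infinitive):
    """Fill in an 'unknown' embedded tense from the main clause.

    main_tense, embedded_tense: 'past', 'present', 'future' or 'unknown'.
    is_to_infinitive: True when the embedded clause is a to-infinitive
    complement such as "to sleep" in "She wants to sleep".
    """
    if embedded_tense != "unknown":
        # Keep whatever the detector already found.
        return embedded_tense
    if not is_to_infinitive:
        return "unknown"
    # An infinitive refers to something not yet realized relative to the
    # main verb: future for a present or future main clause.
    if main_tense in ("present", "future"):
        return "future"
    # Relative to a past main verb, the event stays in the past.
    if main_tense == "past":
        return "past"
    return "unknown"
```

This follows the rule proposed in the paragraph above, so it labels "She wants to sleep." as future as well. Telling it apart from "She wants to sleep in 3 hours." would need an extra check for a time expression in the embedded clause.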

Here is the current code. (Note that the bansi module is for terminal color codes and is available here: https://gist.github.com/jaggzh/35b3705327ad9b4a3439014b8153384e)

#!/usr/bin/env python3
import spacy
from tabulate import tabulate
from bansi import *
import sys

nlp = spacy.load("en_core_web_sm")

def pe(*x, **y):
    print(*x, **y, file=sys.stderr)

def detect_tense(sentence):
    sent = list(nlp(sentence).sents)[0]
    root_tag = sent.root.tag_
    aux_tags = [w.tag_ for w in sent.root.children if w.dep_ == "aux"]
    # Detect past tense
    if root_tag == "VBD" or "VBD" in aux_tags:
        return "past"
    # Detect present tense
    if root_tag in ["VBG", "VBP", "VBZ"] or ("VBP" in aux_tags or "VBZ" in aux_tags):
        return "present"
    # Detect future tense (usually indicated by the auxiliary 'will' or 'shall')
    if any(w.lower_ in ["will", "shall"] for w in sent.root.children if w.dep_ == "aux"):
        return "future"
    return "unknown"

def extract_subtree_str(token):
    return ' '.join([t.text for t in token.subtree])

def detect_embedded_tense(sentence):
    doc = nlp(sentence)
    main_tense = "unknown"
    embedded_tense = "unknown"
    for sent in doc.sents:
        root = sent.root
        main_tense = detect_tense(sent.text)  # Detect main clause tense of this sentence only
        for child in root.children:     # Detect embedded clause tense
            if child.dep_ in ["xcomp", "ccomp", "advcl"]:
                clause = extract_subtree_str(child)
                embedded_tense = detect_tense(clause)
    return main_tense, embedded_tense

def show_parts(sentence):
    doc = nlp(sentence)
    words = [''] + [str(token) for token in doc]
    tags = ['pos'] + [token.tag_ for token in doc]
    deps = ['dep'] + [token.dep_ for token in doc]
    print(tabulate([words, tags, deps]))
# def get_verb_tense(sentence):
#     doc = nlp(sentence)
#     for token in doc:
#         print(f"  tag_: {token.tag_}")
#         if "VERB" in token.tag_:
#             return token.tag_
#     return "No verb found"

if __name__ == '__main__':
    # Test the function
    sentences = [
        # (sentence, main_clause_expected_tense, embedded_clause_expected_tense)
        ("I ate an apple.", "past", "unknown"),
        ("I had eaten an apple.", "past", "unknown"),
        ("I am eating an apple.", "present", "unknown"),
        ("She needs to sleep at 4.", "present", "future"),
        ("She needed to sleep at 4.", "past", "past"),
        ("I ate an apple.", "past", "unknown"),
        ("I had eaten an apple.", "past", "unknown"),
        ("I am eating an apple.", "present", "unknown"),
        ("I eat an apple.", "present", "unknown"),
        ("I have been eating.", "present", "unknown"),
        ("I will eat an apple.", "future", "unknown"),
        ("I shall eat an apple.", "future", "unknown"),
        ("She will eat at 3.", "future", "unknown"),
        ("She ate at 3.", "past", "unknown"),
        ("She went to sleep at 4.", "past", "unknown"),
        ("She has to eat.", "future", "unknown"),
        ("She wants to go sleep.", "present", "future"),  # This could be debated
        ("She wants to go sleep in 3 hours.", "present", "future"),  # This could be debated
        ("She wanted to go sleep earlier.", "past", "past"),
        ("I want to be sleeping.", "present", "future"),  # This could be debated
        ("I am sleeping.", "present", "unknown"),
        ("She is eating.", "present", "unknown"),
    ]
    for s, exp_main_tense, exp_embedded_tense in sentences:
        print(f"{bgblu}{yel}-------------------------------------- {rst}")
        print(f"{bgblu}{yel} Sent: {s}{rst}")
        show_parts(s)
        det_main_tense, det_embedded_tense = detect_embedded_tense(s)
        print(f"   Main Pred-Tense: {yel}{det_main_tense}{rst}")
        print(f"   Main  Exp-Tense: {yel}{exp_main_tense}{rst}")
        if det_main_tense == exp_main_tense:
            print(f"                    {bgre}MATCH{rst}")
        else:
            print(f"                    {bred}MISMATCH{rst}")
        print(f"   Embedded Pred-Tense: {yel}{det_embedded_tense}{rst}")
        print(f"   Embedded  Exp-Tense: {yel}{exp_embedded_tense}{rst}")
        if det_embedded_tense == exp_embedded_tense:
            print(f"                        {bgre}MATCH{rst}")
        else:
            print(f"                        {bred}MISMATCH{rst}")