Using WordNet together with NLTK to find meaningful synonyms

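I want to take a sentence as input and output a sentence in which the difficult words have been replaced with simpler ones.

I am using NLTK to tokenize the sentence and tag the words, but I am having trouble using WordNet to find synonyms for the specific sense of a word that I want.

For example:

Input: "I refuse to pick up the refuse"

Maybe refuse #1 is already the simplest word for declining, but refuse #2 means garbage, and there are simpler words that could be used there.

NLTK may be able to tag refuse #2 as a noun, but how do I then get synonyms for refuse (garbage) from WordNet?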

python nltk wordnet
2 Answers
4
votes

It sounds like you want synonyms for a word based on its part of speech (i.e. noun, verb, etc.).
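
As a quick check of why the part of speech matters, here is a minimal sketch that lists each WordNet sense of a word for one POS, together with its definition and lemma names. The names `summarize_senses` and `senses_of` are hypothetical helpers introduced here, not part of NLTK, and the sketch assumes the WordNet corpus has already been downloaded.

```python
from typing import Iterable, List, Tuple


def summarize_senses(synsets: Iterable) -> List[Tuple[str, str, List[str]]]:
    """Return (synset name, definition, lemma names) for each synset.

    Works on any objects exposing name(), definition() and lemma_names(),
    which NLTK's WordNet Synset objects do.
    """
    return [(s.name(), s.definition(), list(s.lemma_names())) for s in synsets]


def senses_of(word: str, pos: str) -> List[Tuple[str, str, List[str]]]:
    """Look up `word` in WordNet for one POS ('n', 'v', 'a' or 'r')."""
    # Imported here so summarize_senses stays usable without the corpus.
    from nltk.corpus import wordnet as wn
    return summarize_senses(wn.synsets(word, pos=pos))
```

Calling `senses_of("refuse", "n")` and `senses_of("refuse", "v")` and printing the results should make it easy to see that the noun and verb readings come from different synsets, which is what the POS filter relies on.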

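The code below produces synonyms for each word in a sentence based on its part of speech. References:

  1. Extract Word from Synset using Wordnet in NLTK 3.0
  2. Printing the part of speech along with the synonyms of the word

Code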

import nltk
from nltk.corpus import wordnet as wn

nltk.download('popular')  # fetches WordNet plus the tokenizer and tagger data

def get_synonyms(word, pos):
    """Yield synonyms (lemma names) of word for the given Penn Treebank POS tag."""
    for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
        for lemma in synset.lemmas():
            yield lemma.name()

def pos_to_wordnet_pos(penntag, returnNone=False):
    """Map a Penn Treebank POS tag to a WordNet POS tag."""
    morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
                  'VB': wn.VERB, 'RB': wn.ADV}
    try:
        # Only the first two characters matter, e.g. 'VBP' -> 'VB'
        return morphy_tag[penntag[:2]]
    except KeyError:
        # Unmapped tags (PRP, TO, DT, ...) get '' so no synonyms are produced
        return None if returnNone else ''

Usage example

# Tokenize text
text = nltk.word_tokenize("I refuse to pick up the refuse")

for word, tag in nltk.pos_tag(text):
  print(f'word is {word}, POS is {tag}')

  # Filter for unique synonyms not equal to word and sort.
  unique = sorted(set(synonym for synonym in get_synonyms(word, tag) if synonym != word))

  for synonym in unique:
    print('\t', synonym)

Output

Note the different sets of synonyms for "refuse" depending on its POS.

word is I, POS is PRP
word is refuse, POS is VBP
     decline
     defy
     deny
     pass_up
     reject
     resist
     turn_away
     turn_down
word is to, POS is TO
word is pick, POS is VB
     beak
     blame
     break_up
     clean
     cull
     find_fault
     foot
     nibble
     peck
     piece
     pluck
     plunk
word is up, POS is RP
word is the, POS is DT
word is refuse, POS is NN
     food_waste
     garbage
     scraps
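
To get from these synonym lists to the simplified sentence the question asks for, one option is to pick the most frequent single-word candidate and rebuild the sentence. The sketch below is only an illustration under stated assumptions: `pick_simplest` and `simplify` are hypothetical helpers, "simpler" is approximated by "more frequent", and the counts and synonym table are made-up toy data rather than real corpus figures.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def pick_simplest(word: str, candidates: Iterable[str],
                  counts: Dict[str, int]) -> str:
    """Return the candidate with the highest count, or `word` if none beats it.

    `counts` is a hypothetical word -> frequency table. Multi-word lemmas
    such as 'food_waste' are skipped so the replacement stays a single token.
    """
    best, best_count = word, counts.get(word, 0)
    for cand in candidates:
        if "_" in cand:
            continue
        cand_count = counts.get(cand, 0)
        if cand_count > best_count:
            best, best_count = cand, cand_count
    return best


def simplify(tagged: List[Tuple[str, str]],
             lookup: Callable[[str, str], Iterable[str]],
             counts: Dict[str, int]) -> str:
    """Rebuild a sentence, swapping each word for its most frequent synonym."""
    return " ".join(pick_simplest(w, lookup(w, t), counts) for w, t in tagged)


# Toy data for illustration only: these counts are made up.
toy_counts = {"refuse": 5, "garbage": 9, "scraps": 2, "decline": 3}
toy_synonyms = {("refuse", "VBP"): ["decline", "reject"],
                ("refuse", "NN"): ["food_waste", "garbage", "scraps"]}
tagged = [("I", "PRP"), ("refuse", "VBP"), ("to", "TO"), ("pick", "VB"),
          ("up", "RP"), ("the", "DT"), ("refuse", "NN")]

result = simplify(tagged, lambda w, t: toy_synonyms.get((w, t), []), toy_counts)
print(result)  # I refuse to pick up the garbage
```

In practice, `lookup` could be the `get_synonyms` function above and `tagged` the output of `nltk.pos_tag`. The frequency table would have to come from somewhere else, for example an external word-frequency list or the counts WordNet exposes through `Lemma.count()`. Either source is an assumption here and not something the answer above prescribes.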

0
votes

For those who are not comfortable coding in Python, you can also get synonyms and antonyms for each word from WordNet here: https://wordsplayground.org/
