Getting synonyms from a DataFrame

Question

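I have a dataset of {question, answer} pairs for chatbot training, which I load with pandas. I want to use wordnet.synsets to find a bag of synonyms for each word in every question. I'm running into trouble; here is what I have tried so far.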

import pandas as pd
import nltk.corpus
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
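df = pd.read_csv('healthtapQAs++.csv')
# Note: `i` is not defined anywhere in the snippet as posted.
df['question'] = df['question'].str.pad(width=i, side='left')
# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default.
df['unpunctuated'] = df['question'].str.replace(r'[^\w\s]+', '', regex=True)
df['tokenized'] = df['unpunctuated'].apply(word_tokenize)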
df['synonyms'] = df['tokenized'].apply(lambda x: [wordnet.synsets(y) for y in x])
df['synonyms_beta'] = df['synonyms'].apply(lambda x: [y[0].name() for y in x])

This is the error I keep getting:

>   df['synonyms_beta'] = df['synonyms'].apply( lambda x:[(y[0].name()) for y in x])

IndexError: list index out of range
python pandas dataframe wordnet synset
1 Answer

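wordnet.synsets returns an empty list for any token that WordNet has no entry for (WordNet only covers nouns, verbs, adjectives and adverbs, so many function words fall into this group). Indexing y[0] on such an empty list raises the IndexError. You can guard the index with a length check: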

df['synonyms_beta'] = df['synonyms'].apply(lambda x: [y[0].name() if len(y) > 0 else "no_syn" for y in x])
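
Note that y[0].name() gives the name of the first synset (something like 'dog.n.01'), not a list of synonyms. If the goal is a bag of synonyms per word, one option is to collect the lemma names of every synset instead. Below is a minimal sketch of that idea, not a definitive implementation. The helper name synonym_bag, the FakeSynset class and the fake_lookup function are all hypothetical and exist only so the example runs without the WordNet corpus installed; the only NLTK behaviour assumed is that wordnet.synsets returns a (possibly empty) list of objects with a lemma_names() method.

```python
from typing import Callable, Iterable, List


def synonym_bag(token: str, lookup: Callable[[str], Iterable]) -> List[str]:
    """Collect every lemma name from every synset `lookup` returns for `token`.

    `lookup` is expected to behave like nltk's `wordnet.synsets`: it returns a
    (possibly empty) list of objects that expose a `lemma_names()` method.
    """
    bag = set()
    for synset in lookup(token):  # an empty result just skips the loop
        bag.update(synset.lemma_names())
    return sorted(bag)


# Hypothetical stand-in for WordNet so the sketch runs without the corpus.
class FakeSynset:
    def __init__(self, lemmas):
        self._lemmas = lemmas

    def lemma_names(self):
        return list(self._lemmas)


FAKE_WORDNET = {
    "doctor": [FakeSynset(["doctor", "physician"]), FakeSynset(["doctor", "doc"])],
}


def fake_lookup(token):
    # Unknown tokens give an empty list, mirroring wordnet.synsets.
    return FAKE_WORDNET.get(token, [])


tokens = ["the", "doctor"]
bags = [synonym_bag(t, fake_lookup) for t in tokens]
print(bags)
```

With NLTK and the WordNet corpus available, the same helper could presumably be applied to the question's data as df['tokenized'].apply(lambda toks: [synonym_bag(t, wordnet.synsets) for t in toks]). Because the loop never indexes into the synset list, tokens without any synsets simply produce an empty bag rather than an IndexError.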