Lemmatize a pandas column using WordNet after POS tagging

Question

I have a pandas column containing text, df_travail['line_text'].

I want to lemmatize every word in this column.

First I lowercase the text:

df_travail['lowercase'] = df_travail['line_text'].str.lower()

Then I tokenize it and apply POS tagging (because the WordNet lemmatizer's default configuration treats every word as a noun).

from nltk import word_tokenize, pos_tag
tok_and_tag = lambda x: pos_tag(word_tokenize(x))
df_travail['tok_and_tag'] = df_travail['lowercase'].apply(tok_and_tag)

This gives me the following (an excerpt of df_travail['tok_and_tag']):

[('so', 'RB'), ('you', 'PRP'), ("'ve", 'VBP'), ('come', 'VBN'), ('to', 'TO'), ('the', 'DT'), ('master', 'NN'), ('for', 'IN'), ('guidance', 'NN'), ('?', '.'), ('is', 'VBZ'), ('this', 'DT'), ('what', 'WP'), ('you', 'PRP'), ("'re", 'VBP'), ('saying', 'VBG'), (',', ','), ('grasshopper', 'NN'), ('?', '.')]
[('actually', 'RB'), (',', ','), ('you', 'PRP'), ('called', 'VBD'), ('me', 'PRP'), ('in', 'IN'), ('here', 'RB'), (',', ','), ('but', 'CC'), ('yeah', 'UH'), ('.', '.')]

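However, now that I have applied POS tagging, I am lost as to which lemmatization function I should apply (with WordNet).

Edit: the following link does not address the POS part of my question: Lemmatization of all pandas cells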

Answer 1

Try the following example:

from nltk import word_tokenize, pos_tag
from nltk.stem import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()

adjective_tags = ['JJ', 'JJR', 'JJS']

def convert(text):
    lemmatized_text = []

    # Tag the raw text first; each item is a (token, tag) pair.
    for word, tag in pos_tag(word_tokenize(text)):
        if tag in adjective_tags:
            lemmatized_text.append(wordnet_lemmatizer.lemmatize(word, pos="a"))
        else:
            lemmatized_text.append(wordnet_lemmatizer.lemmatize(word))  # default POS = noun

    return ' '.join(lemmatized_text)

df['text'] = df['text'].apply(convert)
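
The example above only special-cases adjectives, so verbs and adverbs are still looked up as nouns. One possible extension, sketched here under the assumption that the tags are Penn Treebank tags as produced by nltk.pos_tag, is to map each tag to a WordNet POS code and reuse the already-tagged column from the question. The names penn_to_wordnet, lemmatize_tagged and add_lemma_column are hypothetical helpers introduced for illustration, not part of NLTK or pandas.

```python
# WordNet POS codes are single characters; these match the values of
# nltk.corpus.wordnet.ADJ, VERB, NOUN and ADV.
ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"


def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (hypothetical helper) to a WordNet POS code."""
    if tag.startswith("JJ"):   # JJ, JJR, JJS
        return ADJ
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return VERB
    if tag.startswith("RB"):   # RB, RBR, RBS (but not RP, the particle tag)
        return ADV
    # Everything else (nouns, pronouns, punctuation, ...) falls back to noun,
    # which is also the lemmatizer's own default.
    return NOUN


def lemmatize_tagged(tagged_tokens, lemmatizer):
    """Lemmatize a list of (token, tag) pairs; returns one lemma per token."""
    return [lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag))
            for token, tag in tagged_tokens]


def add_lemma_column(df, source="tok_and_tag", target="lemmatized"):
    """Fill df[target] from the tagged column df[source] (hypothetical helper)."""
    # Imported here so the tag mapping above stays usable without NLTK.
    from nltk.stem import WordNetLemmatizer
    lemmatizer = WordNetLemmatizer()  # requires the 'wordnet' corpus to be downloaded
    df[target] = df[source].apply(lambda pairs: lemmatize_tagged(pairs, lemmatizer))
    return df
```

With the frame from the question, calling add_lemma_column(df_travail) should add a 'lemmatized' column holding one list of lemmas per row; apply ' '.join to each list if a plain string is preferred. Passing the lemmatizer in as an argument keeps lemmatize_tagged easy to try out with any object that exposes a lemmatize(word, pos=...) method.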
