Can someone tell me what the remove_punct_dict statement does, and what the output of the last line is?

Question

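import string
import nltk

lemmer = nltk.stem.WordNetLemmatizer()  # assumption: lemmer is not defined in the question

def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

# Maps each punctuation code point to None, i.e. "delete this character".
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    # Lowercase, strip punctuation, tokenize, then lemmatize each token.
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))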

Answer 1

remove_punct_dict is a dict whose keys are the Unicode code points of every punctuation character in string.punctuation and whose values are all None.

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

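The expression feeds a generator of (ord(punct), None) pairs to dict(), which yields one entry per punctuation character in string.punctuation. ord is a Python built-in that returns the Unicode code point of a single character.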
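As a quick check of that description, the following minimal sketch builds the same dictionary and inspects a few entries. Only remove_punct_dict comes from the question; bang_code is a name introduced here for illustration.

```python
import string

# Same construction as in the question: one (code point, None) pair
# per punctuation character.
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

# ord() maps a single character to its integer Unicode code point.
bang_code = ord("!")  # illustrative name, not from the question

print(bang_code)                        # 33
print(len(remove_punct_dict))           # 32, one key per punctuation character
print(remove_punct_dict[bang_code])     # None
print(set(remove_punct_dict.values()))  # {None}
```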

Let's go over the last function:

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

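This function first lowercases the given text, then calls translate(remove_punct_dict), which deletes every character whose code point maps to None in the table. The dictionary is only read during this step; it is never modified.

For example, hello world! becomes hello world, and the entry for "!" in remove_punct_dict is still None afterwards.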
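The punctuation-removal step can be tried on its own. This is a small sketch, assuming only the standard library; the names snapshot, text, and cleaned are illustrative. It also confirms that str.translate leaves the table untouched.

```python
import string

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
snapshot = dict(remove_punct_dict)  # copy of the table, to compare afterwards

text = "Hello, World!"
# lower() first, then translate() drops every character mapped to None.
cleaned = text.lower().translate(remove_punct_dict)

print(cleaned)                        # hello world
print(remove_punct_dict == snapshot)  # True: translate() only reads the table
```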

Next, the text is tokenized into words, so instead of the single string hello world you now have two tokens, hello and world.

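The last step, LemTokens, reduces each word to its simplest form. Strictly speaking, the call is lemmer.lemmatize, so this is lemmatization rather than stemming, although the question does not show how lemmer is defined. You can read more about stemming here. hello and world are already base forms under the Porter stemmer, so they stay unchanged. The final output for my example is therefore just

['hello', 'world']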
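To see the whole pipeline without installing NLTK or its data, here is a rough sketch. lem_normalize_sketch is a hypothetical stand-in for LemNormalize, not the question's code: it uses str.split in place of nltk.word_tokenize and an identity function in place of lemmer.lemmatize, which happens to give the same tokens for this simple input but will differ on other text.

```python
import string

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def lem_normalize_sketch(text, tokenize=str.split, lemmatize=lambda token: token):
    """Hypothetical stand-in for LemNormalize that runs without NLTK.

    tokenize and lemmatize default to simple placeholders (assumptions,
    not the question's actual tokenizer or lemmatizer).
    """
    cleaned = text.lower().translate(remove_punct_dict)
    return [lemmatize(token) for token in tokenize(cleaned)]

print(lem_normalize_sketch("Hello World!"))  # ['hello', 'world']
```

With NLTK and its tokenizer and WordNet data available, passing tokenize=nltk.word_tokenize and lemmatize=lemmer.lemmatize should reproduce the behaviour of the function in the question.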
