How to remove the words in a list from a column of strings


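I'm working with a DataFrame column of text and trying to count the most frequent words, but certain words such as "for", "and", "the", etc. dominate the results. I tried to write a for loop that removes these words so they don't distort my analysis. Below is the code I'm running: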

lst = ["for", "of", "and", "in", "which", "the", "to", "a", "an"]

for i in papers.title_processed:
    if i in lst:
        papers.title_processed = papers.title_processed.replace(i, "")


Output (the original title column, then title_processed):
0    Self-Organization of Associative Database and ...
1    A Mean Field Theory of Layer IV of Visual Cort...
2    Storing Covariance by the Associative Long-Ter...
3    Bayesian Query Construction for Neural Network...
4    Neural Network Ensembles, Cross Validation, an...
Name: title, dtype: object
0    self-organization of associative database and ...
1    a mean field theory of layer iv of visual cort...
2    storing covariance by the associative long-ter...
3    bayesian query construction for neural network...
4    neural network ensembles, cross validation, an...
Name: title_processed, dtype: object

So it does nothing. Any suggestions on what I'm doing wrong? I also tried .map(lambda x: papers.title_processed.str.replace(x, "") for x in lst) and got an error.
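The loop does nothing because iterating over papers.title_processed yields each whole title, so `if i in lst` asks whether an entire title equals one of the stop words, which is never true. Even if it were, `Series.replace` matches whole cell values, not substrings, unless a regex is used. Below is a minimal plain-Python sketch of the word-by-word alternative, assuming whitespace tokenization is good enough (punctuation such as "ensembles," stays attached to its word). `STOPWORDS`, `remove_stopwords` and the sample `titles` are hypothetical names and shortened sample data, not part of the question.

```python
# Hypothetical stop-word set, built from the question's list.
STOPWORDS = {"for", "of", "and", "in", "which", "the", "to", "a", "an"}


def remove_stopwords(title: str, stopwords=STOPWORDS) -> str:
    """Drop every whitespace-separated token that is a stop word."""
    # Split the title into words, keep the non-stop-words, rejoin with single spaces.
    return " ".join(word for word in title.split() if word not in stopwords)


# Shortened sample titles taken from the visible part of the output.
titles = [
    "self-organization of associative database and",
    "a mean field theory of layer iv",
]

# The question's loop compares each *whole title* against the list,
# so this membership test is False for every row and nothing is replaced.
print(any(title in STOPWORDS for title in titles))  # False

# Working word by word instead actually removes the stop words.
print([remove_stopwords(t) for t in titles])
```

With pandas, the same helper could be applied per row with papers['title_processed'].map(remove_stopwords).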

Tags: python, str-replace
Answer

Use a regular expression with word boundaries:

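import re

lst = ["for", "of", "and", "in", "which", "the", "to", "a", "an"]

# Match any listed word only as a whole word (\b = word boundary),
# so "the" is removed but "theory" is left alone.
regex = re.compile('|'.join([rf'\b{w}\b' for w in lst]))
# A compiled pattern needs regex=True (pandas >= 2.0 defaults to regex=False).
papers['title_processed'] = papers['title_processed'].str.replace(regex, '', regex=True)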
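A self-contained sketch of the same word-boundary pattern on plain strings, assuming you also want to collapse the doubled spaces that the deletion leaves behind. `STOPWORD_RE`, `SPACES_RE` and `strip_stopwords` are hypothetical names; `re.escape` is added as a precaution in case a stop word ever contains a regex metacharacter.

```python
import re

# Stop words from the question.
lst = ["for", "of", "and", "in", "which", "the", "to", "a", "an"]

# One alternation wrapped in \b word boundaries, equivalent to the answer's pattern.
STOPWORD_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, lst)) + r")\b")
# Any run of whitespace, used to squeeze the gaps left by deleted words.
SPACES_RE = re.compile(r"\s+")


def strip_stopwords(text: str) -> str:
    """Delete whole-word stop words, then collapse leftover whitespace."""
    without = STOPWORD_RE.sub("", text)
    return SPACES_RE.sub(" ", without).strip()


print(strip_stopwords("storing covariance by the associative"))
# storing covariance by associative
```

In pandas the equivalent chain would be papers['title_processed'].str.replace(STOPWORD_RE, '', regex=True).str.replace(r'\s+', ' ', regex=True).str.strip(). Note that \b treats a hyphen as a boundary, so a stop word attached to a hyphen (e.g. "x-a") would also be removed.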

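After removing the words in lst, the title_processed Series should look like this (note the doubled spaces left where words were deleted):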

# print(papers['title_processed'])

0       self-organization  associative database  ...
1        mean field theory  layer iv  visual cort...
2     storing covariance by  associative long-ter...
3     bayesian query construction  neural network...
4    neural network ensembles, cross validation, ...
Name: title_processed, dtype: object
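Since the original goal was to find the most frequent words, the stop words can also simply be skipped while counting. The sketch below uses `collections.Counter` and assumes that words are runs of letters, digits, underscores or hyphens (so "ensembles," counts as "ensembles" and "self-organization" stays one word). `top_words` is a hypothetical helper and the sample titles are shortened prefixes from the output above.

```python
import re
from collections import Counter

# Hypothetical stop-word set, built from the question's list.
STOPWORDS = {"for", "of", "and", "in", "which", "the", "to", "a", "an"}


def top_words(titles, stopwords=STOPWORDS, n=3):
    """Return the n most common non-stop-words across all titles."""
    counts = Counter()
    for title in titles:
        # Tokenize: letters/digits/underscores plus hyphens; punctuation is dropped.
        words = re.findall(r"[\w-]+", title.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


sample = [
    "bayesian query construction for neural",
    "neural network ensembles, cross validation",
]
print(top_words(sample, n=1))  # [('neural', 2)]
```

Because iterating over a pandas Series yields its values, top_words(papers['title_processed']) should work directly on the column.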
© www.soinside.com 2019 - 2024. All rights reserved.