Filter a pandas DataFrame where certain columns contain any of the words in a list

Question

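I want to filter a DataFrame. The resulting DataFrame should contain all rows where any of several columns contains any of the words in a list.

I started with for loops, but there should be a better pythonic / pandonic way.

Example: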

# importing pandas 
import pandas as pd 

# Creating the dataframe with dict of lists 
df = pd.DataFrame({'Name': ['Geeks', 'Peter', 'James', 'Jack', 'Lisa'], 
                   'Team': ['Boston', 'Boston', 'Boston', 'Chele', 'Barse'], 
                   'Position': ['PG', 'PG', 'UG', 'PG', 'UG'], 
                   'Number': [3, 4, 7, 11, 5], 
                   'Age': [33, 25, 34, 35, 28], 
                   'Height': ['6-2', '6-4', '5-9', '6-1', '5-8'], 
                   'Weight': [89, 79, 113, 78, 84], 
                   'College': ['MIT', 'MIT', 'MIT', 'Stanford', 'Stanford'], 
                   'Salary': [99999, 99994, 89999, 78889, 87779]}, 
                   index =['ind1', 'ind2', 'ind3', 'ind4', 'ind5']) 


df1 = df[df['Team'].str.contains("Boston") | df['College'].str.contains('MIT')] 
print(df1)

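So it is clear how to filter individual columns for a specific word.

It is also clear how to filter the rows of a single column for any of the strings in a list: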

df[df.Name.str.contains('|'.join(search_values ))]

where search_values contains a list of words or strings.

search_values = ['boston','mike','whatever']

I am looking for a short way to code this:

#pseudocode
give me a subframe of df where any of the columns 'Name','Position','Team' contains any of the words in search_values

I know I can do

df[df['Name'].str.contains('|'.join(search_values )) | df['Position'].str.contains('|'.join(search_values )) | df['Team'].str.contains('|'.join(search_values )) ]

but if I wanted 20 columns, that would be a mess of a single line of code.

Any suggestions?

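EDIT, bonus: when looking at a list of columns, i.e. 'Name', 'Position', 'Team', how can the index be included as well? Passing ['index', 'Name', 'Position', 'Team'] does not work.

Thanks.

I had a look at: https://www.geeksforgeeks.org/get-all-rows-in-a-pandas-dataframe-containing-given-substring/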

https://kanoki.org/2019/03/27/pandas-select-rows-by-condition-and-string-operations/

Filter out rows based on list of strings in Pandas

Answer 1

3
votes

You can also use stack with any on level=0:
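The code for this answer is not shown, so the following is only a minimal sketch of the stack idea, using a shortened version of the question's DataFrame. It makes a few assumptions that are not in the original: the match is case-insensitive (case=False), because the search words are lowercase while the data is capitalized; the search words are escaped with re.escape so they are treated literally; the index labels are unique; and groupby(level=0).any() stands in for any(level=0), since, as far as I know, recent pandas versions no longer accept the level argument on any.

```python
import re
import pandas as pd

# Shortened version of the question's DataFrame
df = pd.DataFrame(
    {'Name': ['Geeks', 'Peter', 'James', 'Jack', 'Lisa'],
     'Team': ['Boston', 'Boston', 'Boston', 'Chele', 'Barse'],
     'Position': ['PG', 'PG', 'UG', 'PG', 'UG']},
    index=['ind1', 'ind2', 'ind3', 'ind4', 'ind5'])

search_values = ['boston', 'mike', 'whatever']
cols = ['Name', 'Position', 'Team']

# Assumption: treat the search words literally and ignore case
pattern = '|'.join(map(re.escape, search_values))

# stack() reshapes the selected columns into one long Series
# indexed by (row label, column name)
stacked = df[cols].stack()
hits = stacked.str.contains(pattern, case=False, na=False)

# Collapse back to one boolean per row label (level 0 of the stacked index)
mask = hits.groupby(level=0).any()

# Align the mask with the original row order; rows dropped by stack() count as no match
mask = mask.reindex(df.index, fill_value=False)

result = df[mask]
print(result)
```

Adding more columns only means extending cols, so the expression does not grow with the number of columns.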

Answer 2

3
votes

Use apply with any:
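The code for this answer is not shown either. One way the apply plus any combination could look is sketched below, under the same assumptions as before (case-insensitive matching and escaped search words, neither of which is stated in the original).

```python
import re
import pandas as pd

# Shortened version of the question's DataFrame
df = pd.DataFrame(
    {'Name': ['Geeks', 'Peter', 'James', 'Jack', 'Lisa'],
     'Team': ['Boston', 'Boston', 'Boston', 'Chele', 'Barse'],
     'Position': ['PG', 'PG', 'UG', 'PG', 'UG']},
    index=['ind1', 'ind2', 'ind3', 'ind4', 'ind5'])

search_values = ['boston', 'mike', 'whatever']
cols = ['Name', 'Position', 'Team']

# Assumption: treat the search words literally and ignore case
pattern = '|'.join(map(re.escape, search_values))

# apply() runs str.contains on each selected column and returns a boolean DataFrame
matches = df[cols].apply(
    lambda col: col.str.contains(pattern, case=False, na=False))

# any(axis=1) is True for a row if at least one of its columns matched
mask = matches.any(axis=1)

result = df[mask]
print(result)
```

Because apply keeps the original index, the mask can be used for boolean indexing directly.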

Answer 3

1
votes

In this case, you can simply use apply:
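Since the answer's code is missing, here is a hedged sketch of an apply-based filter that also covers the bonus question about the index. It assumes that reset_index() is an acceptable way to turn the unnamed index into an ordinary column (named 'index' by default) and that all values can be compared as strings. The search word 'ind5' is hypothetical and was added only to show an index match.

```python
import re
import pandas as pd

# Shortened version of the question's DataFrame
df = pd.DataFrame(
    {'Name': ['Geeks', 'Peter', 'James', 'Jack', 'Lisa'],
     'Team': ['Boston', 'Boston', 'Boston', 'Chele', 'Barse'],
     'Position': ['PG', 'PG', 'UG', 'PG', 'UG']},
    index=['ind1', 'ind2', 'ind3', 'ind4', 'ind5'])

# 'ind5' is a hypothetical extra search word that only matches the index
search_values = ['boston', 'mike', 'ind5']
cols = ['Name', 'Position', 'Team']

# Assumption: treat the search words literally and ignore case
pattern = '|'.join(map(re.escape, search_values))

# reset_index() moves the index into a regular column called 'index',
# so it can be searched like any other column
searchable = df[cols].reset_index().astype(str)

mask = searchable.apply(
    lambda col: col.str.contains(pattern, case=False, na=False)).any(axis=1)

# The mask now has a fresh 0..n-1 index, so pass it positionally as an array
result = df[mask.to_numpy()]
print(result)
```

The original index is untouched in result; reset_index() is applied only to the temporary copy used for searching.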
