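I have a sample DF below (I'd share code, but I can't load the column in a way that reproduces the issue, i.e. ints and strings mixed in the object column COL_A).
I just want to filter the df and return the rows with the values below, but when I do, it returns an empty DataFrame even though those values are there. I tried converting everything to strings, but I get the same empty DataFrame. Any ideas?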
This is what I get when I print the df:
COL_A COL_B
2A001 Red
75101 Orange
75102 Grey
75104 Pink
-> COL_A dtype = object
filters_string = ['75101', '75102', '75103', '75104']
filters_int = [75101, 75102, 75103, 75104]
Method 1:
df['COL_A'] = df['COL_A'].astype(str)
# Filter for column in the target list
filtered_df = df[df['COL_A'].isin(filters_string)]
Method 2:
df['COL_A'] = df['COL_A'].astype(str)
# Filter for column in the target list
filtered_df = df[df['COL_A'].isin(filters_int)]
Both of them return an empty DataFrame. Why?
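Method 1 should work as written. Method 2, however, can never match anything: after astype(str) every value in 'COL_A' is a string, and isin compares those strings against the integers in filters_int, so '75101' is never equal to 75101. If Method 1 also returns an empty DataFrame, the likely cause is that the values in 'COL_A' have leading or trailing ' ' (spaces), but without seeing your data it's impossible to be sure.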
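To see what is actually stored in 'COL_A', you can check the Python type of each value and print each value's repr(), which makes stray whitespace visible. The sketch below uses made-up sample data (an assumption, since the real data isn't shown) and also reproduces why Method 2 always comes back empty.

```python
import pandas as pd

filters_int = [75101, 75102, 75103, 75104]

# Hypothetical mixed object column: a real int, plain strings, and strings
# with stray whitespace (assumed data, not the asker's actual DataFrame).
df = pd.DataFrame(
    {
        "COL_A": ["2A001", 75101, " 75102", "75104 "],
        "COL_B": ["Red", "Orange", "Grey", "Pink"],
    }
)

# Which Python types are really stored in the object column?
type_counts = df["COL_A"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# repr() shows quotes and any leading/trailing whitespace explicitly.
reprs = df["COL_A"].map(repr).tolist()
print(reprs)

# Method 2: after astype(str) every value is a str, so comparing against
# the ints in filters_int never matches and the result is always empty.
method2 = df[df["COL_A"].astype(str).isin(filters_int)]
print(method2.empty)  # True
```

If the repr output shows something like ' 75102' or '75101.0', the values won't equal the entries in filters_string until they are cleaned up.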
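The code below is similar to your original implementation, with two main differences: it handles leading and trailing whitespace by stripping it from 'COL_A' before comparing against the filters_string list, and it keeps rows whose 'COL_A' value appears in either the filters_string OR the filters_int list.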
import pandas as pd
# Lists of values to use for filtering the 'COL_A' column on our dataframe
filters_string = ['75101', '75102', '75103', '75104']
filters_int = [75101, 75102, 75103, 75104]
# Creating the dataframe to test our logic
df = pd.DataFrame(
[
["2A001", "Red"],
[" 75101", "Orange"],
[" 75102 ", "Grey"],
[" 75104", "Pink"],
],
columns = ["COL_A", "COL_B"]
)
# Filter the 'COL_A' column for values that appear in either the `filters_string` OR the `filters_int` list.
# Before comparing 'COL_A' with `filters_string`, convert all its values to strings and strip whitespace.
matches_string = df['COL_A'].astype(str).str.strip().isin(filters_string)
# Values stored as real integers are matched against `filters_int` directly.
matches_int = df['COL_A'].isin(filters_int)
# NOTE: The `|` operator is pandas' element-wise `OR` for boolean Series.
filtered_df = df.loc[matches_string | matches_int, :]
filtered_df
# Returns:
#
# COL_A COL_B
# 1 75101 Orange
# 2 75102 Grey
# 3 75104 Pink
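If you'd rather not maintain two filter lists, one alternative (a sketch, not the only way) is to normalise 'COL_A' to stripped strings once and convert filters_int to strings as well, so a single isin call covers both the int and the string cases. The sample data here is hypothetical.

```python
import pandas as pd

filters_int = [75101, 75102, 75103, 75104]

# Hypothetical sample data mixing a real int and whitespace-padded strings.
df = pd.DataFrame(
    [
        ["2A001", "Red"],
        [75101, "Orange"],     # stored as an actual int
        [" 75102 ", "Grey"],   # string with stray whitespace
        ["75104", "Pink"],
    ],
    columns=["COL_A", "COL_B"],
)

# One set of string keys replaces the separate int and string lists.
wanted = {str(v) for v in filters_int}

# Normalise every value to a stripped string before comparing.
mask = df["COL_A"].astype(str).str.strip().isin(wanted)
filtered_df = df[mask]
print(filtered_df)
```

Note that if the column holds floats such as 75101.0 (for example after reading a spreadsheet), astype(str) produces '75101.0', which this approach would not match without extra handling.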