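I have the pandas DataFrame below, and I want to filter it on two conditions: the names in one column must appear in a list, and the integer value in another column must meet a threshold.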
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0722882 | Shivam Verma | RD-DI-Infra-Linux | Karn Kumar | Active | IN-NDA02 | 2 |
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar | Active | IN-NDA02 | 5 |
| INC0790029 | Akhil Garg | RD-DI-Infra-Storage | Amit Raj | Awaiting User Info | IN-NDA02 | 3 |
| INC0743690 | Japesh Kumar | RD-DI-Infra-Linux | Shakir Chaudhry | Awaiting User Info | IN-NDA02 | 5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
from tabulate import tabulate
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##########################################################################################
def pprint_df(dframe):
    print(tabulate(dframe, headers='keys', tablefmt='psql', showindex=False))
names = ['Amit Raj','Andre Geurts','Andrzej Kamionek','Ankur Wason','Ashish Kumar','Carl Thijssen','Chris Masson','Daniel Chorazy','Devarishi Kumar','Elizabeth Tamayo','Eric Oomen','Gopinath Perumal','Jakub Kubera','Jeffrey Thompson','Jeroen Kwanten','Karn Kumar','Kenny Henderson','Manish Kumar','Mihai Pârlea','Mihai Reus','Naveen Kumar','Rafiq Khan','Rob Goossens','Robert in','Roger Smith','Santhoshkumar Krishnamoorthy','Shakir Chaudhry','Sonu Kumar','Suraj Budha','Szymon Kolodziejski','Szymon Kubera','Tony Olsson','Vetrivelan Rajagopalan','Yogesh Miglani','Abrar Ahmad']
col_name = ['Number','Caller','Assignment group','Assigned to','Status(state)','Location','Aging']
df = pd.read_excel('Backlog-April_24.xlsx', usecols=col_name, encoding='utf-8', index=False)
# df = df[df['Assigned to'].isin(names)] <-- This works perfectly with above dataframe
df = df[df['Assigned to'].isin(names) & df['Aging'] >= 5]
print(df.dtypes)
pprint_df(df)
When I run the code above, I get no results, even after converting the int to str.
$ ./pd_code.py
Number object
Caller object
Assignment group object
Assigned to object
Status(state) object
Location object
Aging object
dtype: object
+----------+----------+--------------------+---------------+-----------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|----------+----------+--------------------+---------------+-----------------+------------+---------|
+----------+----------+--------------------+---------------+-----------------+------------+---------+
Expected output:
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar | Active | IN-NDA02 | 5 |
| INC0743690 | Japesh Kumar | RD-DI-Infra-Linux | Shakir Chaudhry | Awaiting User Info | IN-NDA02 | 5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
Wrapping (df['Aging'] >= 5) in parentheses works. I had missed that, and Ben.T provided the hint.
df = df[df['Assigned to'].isin(names) & (df['Aging'] >= 5)]
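For posterity: this needs boolean indexing. As the pandas documentation on boolean indexing puts it:
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.
Without the parentheses, df['Assigned to'].isin(names) & df['Aging'] >= 5 is evaluated as (df['Assigned to'].isin(names) & df['Aging']) >= 5, because & binds more tightly than >=.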
df = df[df['Assigned to'].isin(names) & (df['Aging'] >= 5)]
OR
df = df[(df['Assigned to'].isin(names)) & (df['Aging'] >= 5)]
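The fix can be tried without the Excel file. The sketch below is only an illustration: it rebuilds four rows from the question's table by hand, keeps three of the columns, and uses a shortened names list, so the DataFrame construction is an assumption rather than the original read_excel call.

```python
import pandas as pd

# Four rows copied from the question's table; only three columns are kept.
df = pd.DataFrame({
    "Number": ["INC0722882", "INC0786494", "INC0790029", "INC0743690"],
    "Assigned to": ["Karn Kumar", "Karn Kumar", "Amit Raj", "Shakir Chaudhry"],
    "Aging": [2, 5, 3, 5],
})

# Shortened version of the names list, for illustration only.
names = ["Amit Raj", "Karn Kumar", "Shakir Chaudhry"]

# Each comparison is wrapped in parentheses before combining with &.
mask = df["Assigned to"].isin(names) & (df["Aging"] >= 5)
filtered = df[mask]

print(filtered)
```

Only the two rows with Aging equal to 5 (INC0786494 and INC0743690) remain, which matches the expected output in the question.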
Python's operator precedence rules are also documented in great detail and are worth reading.
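To see the precedence rule in action without pandas, one option is to ask Python how it parses the two expressions. This is a small sketch using the standard ast module; the helper name top_level_node and the variable names in_names and aging are made up for illustration.

```python
import ast

def top_level_node(expr):
    """Return the name of the outermost AST node of a Python expression."""
    return type(ast.parse(expr, mode="eval").body).__name__

# Without parentheses the outermost operation is the comparison (>=),
# so the & is applied first: (in_names & aging) >= 5.
without_parens = top_level_node("in_names & aging >= 5")

# With parentheses the outermost operation is the & itself,
# and the comparison is evaluated first.
with_parens = top_level_node("in_names & (aging >= 5)")

print(without_parens)  # Compare
print(with_parens)     # BinOp
```

The first expression is a comparison whose left side is the whole & operation, which is why the unparenthesized filter compares the wrong thing against 5.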