I have a DataFrame called merged_df with the following format, where every column is a non-null object:
merged_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 889 entries, 0 to 888
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   UNIQUE_ID        889 non-null    object
 1   REGISTERED_NAME  889 non-null    object
 2   EMAIL            889 non-null    object
 3   DBS_CHECK_DATE   889 non-null    object
 4   EXPIRY_DATE      889 non-null    object
 5   UNIQUE_ID        889 non-null    object
 6   Status           889 non-null    object
dtypes: object(7)
memory usage: 48.7+ KB
The goal is simply to add a new column, based on a set of conditions involving the EXPIRY_DATE and Status columns:
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

def current_month():
    return datetime.datetime.now().strftime("%m/%Y")

def get_current_date():
    return datetime.datetime.now().strftime("%d/%m/%Y")

def three_month_ahead():
    current = datetime.datetime.now()
    three_month = current + relativedelta(months=3)
    return three_month.strftime("%m/%Y")

def next_month_expiry():
    current = datetime.datetime.now()
    nextmonth = current + relativedelta(months=1)
    return nextmonth.strftime("%m/%Y")

def year_ahead():
    current = datetime.datetime.now()
    year_on = current + relativedelta(months=12)
    return year_on.strftime("%d/%m/%Y")

merged_df['Action'] = ''  # the column to which I apply the below function to

def action_col(row):
    expiry_date = row['EXPIRY_DATE']
    unique_id = row['UNIQUE_ID']
    three_months_ahead = pd.to_datetime(three_month_ahead(), format='%m/%Y')
    next_month = pd.to_datetime(next_month_expiry(), format='%m/%Y')
    current_date_today = pd.to_datetime(get_current_date(), format='%d/%m/%Y')
    year_on = pd.to_datetime(year_ahead(), format='%d/%m/%Y')
    if f"{expiry_date.year}-{expiry_date.month:02}" == f"{three_months_ahead.year}-{three_months_ahead.month:02}":
        return 'Send 3 month request'
    elif expiry_date.month == next_month.month and expiry_date.year == next_month.year:
        return 'Send 1 month reminder'
    elif expiry_date < current_date_today and row['Status'] == "Not Suspended":
        return 'DBS expired: Suspend & update iAdmin notes'
    elif (year_on < expiry_date < current_date_today) and row['Status'] == "Suspended":
        return 'No action needed - correct suspensions in place'
    elif unique_id in SAP_only_EAs:
        return 'No action needed - SAP only assessor'
    elif (expiry_date < year_on) and row['Status'] == "Suspended":
        return 'DBS expired for over a year – look at whether account closure is appropriate'
    else:
        return 'No action required – valid DBS check'

merged_df['Action'] = merged_df['Action'].apply(action_col)
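It throws a "string indices must be integers" error and points to the line expiry_date = row['EXPIRY_DATE']. Any help is much appreciated (hopefully this question makes more sense than my first post, haha).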
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
c:\Users\orla_quidos\Code\DBS project\STATUS section of the scheduled job python script.ipynb Cell 7 line 5
54 else:
55 return 'No action required – valid DBS check'
---> 58 merged_df['Action'] = merged_df['Action'].apply(action_col)
File c:\Users\orla_quidos\anaconda3\lib\site-packages\pandas\core\series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
4661 def apply(
4662 self,
4663 func: AggFuncType,
(...)
4666 **kwargs,
4667 ) -> DataFrame | Series:
4668 """
4669 Invoke function on values of Series.
4670
(...)
4769 dtype: float64
4770 """
-> 4771 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File c:\Users\orla_quidos\anaconda3\lib\site-packages\pandas\core\apply.py:1123, in SeriesApply.apply(self)
1120 return self.apply_str()
1122 # self.f is Callable
...
---> 35 expiry_date = row['EXPIRY_DATE']
36 unique_id = row['UNIQUE_ID']
37 three_months_ahead = pd.to_datetime(three_month_ahead(), format='%m/%Y')
TypeError: string indices must be integers
Your function works with multiple columns.
merged_df['Action'] = merged_df['Action'].apply(action_col)
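only passes the values of the Action column to the function, one cell at a time, so row is a string rather than a row of the DataFrame.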
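To illustrate the difference, here is a minimal sketch on a small made-up frame (the data and the describe helper are hypothetical, not taken from the question). Series.apply hands the function each cell of Action, which is an empty string, so indexing it with a column name raises the TypeError; DataFrame.apply with axis=1 hands it a whole row instead.

```python
import pandas as pd

# Hypothetical stand-in for merged_df; values are illustrative only.
df = pd.DataFrame({
    "EXPIRY_DATE": ["01/02/2024", "15/03/2025"],
    "Status": ["Suspended", "Not Suspended"],
})
df["Action"] = ""

def describe(row):
    # Only works when `row` is a whole row (a Series), not a single cell.
    return f"{row['Status']} / {row['EXPIRY_DATE']}"

# Series.apply passes each cell of "Action" (an empty str) to the function,
# so row['Status'] is evaluated as ''['Status'] and raises TypeError.
try:
    df["Action"].apply(describe)
    series_error = None
except TypeError as exc:
    series_error = str(exc)

# DataFrame.apply with axis=1 passes each row as a Series instead.
row_result = df.apply(describe, axis=1).tolist()
```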
Try this instead:
merged_df['Action'] = merged_df.apply(action_col,axis=1)
Also, there is no need to create the empty column first:
merged_df['Action'] = '' # no need.
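One thing that may come up next: info() reports EXPIRY_DATE as object, so if it holds strings, expiry_date.year and the date comparisons would fail inside the function. A hedged sketch of converting the column once before applying, assuming the dates are stored as dd/mm/YYYY strings (the sample data, the format string, and the simplified two-branch function are all assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample; the real merged_df and its date format may differ.
df = pd.DataFrame({
    "EXPIRY_DATE": ["01/01/2020", "01/01/2200"],
    "Status": ["Not Suspended", "Not Suspended"],
})

# Convert once up front so .year, .month and comparisons work per row.
# The format string is an assumption about how the dates are stored.
df["EXPIRY_DATE"] = pd.to_datetime(df["EXPIRY_DATE"], format="%d/%m/%Y")

# Midnight today, computed once rather than on every row.
today = pd.Timestamp.now().normalize()

def action_col(row):
    # Simplified to two branches; the full set of conditions would go here.
    if row["EXPIRY_DATE"] < today and row["Status"] == "Not Suspended":
        return "DBS expired: Suspend & update iAdmin notes"
    return "No action required"

# axis=1 passes each row to the function; no empty column is needed first.
df["Action"] = df.apply(action_col, axis=1)
```

Note also that info() lists UNIQUE_ID twice. With duplicate column names, row['UNIQUE_ID'] returns a Series rather than a single value, so it is probably worth dropping or renaming one of the two columns before the membership check against SAP_only_EAs.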