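My goal is to create a DataFrame with the unique names as column A, plus a SUMIF-style count from the original DataFrame that matches on type and on the name in the current row (iloc) of the new DataFrame.
I'm trying to stay away from loops, but I keep getting an error (shown below the code).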
def exercise_custom(df):
    cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])
    cust_df['Payment_Count'] = len(df[(df['type']=='PAYMENT') & (df['nameDest']==cust_df.DestName)])
    display(cust_df.head(5))
    return cust_df
    pass
The traceback gives me:
ValueError Traceback (most recent call last)
Cell In[72], line 56
52 return "TODO"
54 pass
---> 56 visual_custom(df)
Cell In[72], line 40, in visual_custom(df)
38 def visual_custom(df):
---> 40 exercise_custom(df)
43 fig, axs = plt.subplots(1, figsize=(6,10))
44 #updated to show bottom labels correctly & chg color
Cell In[72], line 6, in exercise_custom(df)
1 def exercise_custom(df):
2 #X
3 #for index, row in df:
4 #cust_df = df['nameDest'].unique()
5 cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])
----> 6 cust_df['Payment_Count'] = len(df[(df['type']=='PAYMENT') & (df['nameDest']==cust_df.DestName)])
7 display(cust_df.head(5))
13 #X= pd.DataFrame(df['nameDest'].unique())
14 #df[df_payment_count] = len(df[(df['type']=='PAYMENT') & (df['isFraud']==1)])
15
(...)
32 # 'Cash In':len(df[(df['type']=='CASH_IN') & (df['isFraud']==1)])}
33 # }
File ~/anaconda3/envs/forageenv/lib/python3.9/site-packages/pandas/core/ops/common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
72 return NotImplemented
74 other = item_from_zerodim(other)
---> 76 return method(self, other)
File ~/anaconda3/envs/forageenv/lib/python3.9/site-packages/pandas/core/arraylike.py:40, in OpsMixin.__eq__(self, other)
38 @unpack_zerodim_and_defer("__eq__")
39 def __eq__(self, other):
---> 40 return self._cmp_method(other, operator.eq)
File ~/anaconda3/envs/forageenv/lib/python3.9/site-packages/pandas/core/series.py:6105, in Series._cmp_method(self, other, op)
6102 res_name = ops.get_op_result_name(self, other)
6104 if isinstance(other, Series) and not self._indexed_same(other):
-> 6105 raise ValueError("Can only compare identically-labeled Series objects")
6107 lvalues = self._values
6108 rvalues = extract_array(other, extract_numpy=True, extract_range=True)
ValueError: Can only compare identically-labeled Series objects
One way I considered solving this is a loop, but it is too slow and this needs to be faster. (It is my last resort, since all I can find is "loops are bad".)
def exercise_custom(df):
    payment_counts = []
    cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])
    for name in cust_df['DestName']:
        filtered_df = df[(df['type'] == 'PAYMENT') & (df['nameDest'] == name)].dropna().reset_index(drop=True)
        if len(filtered_df) == 0:
            payment_count = 0
        else:
            payment_count = len(filtered_df)
        payment_counts.append(payment_count)
    cust_df['Payment_Count'] = payment_counts
    display(cust_df.head(5))
    pass
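This is what causes the error you report.
First, you define cust_df from the unique values of one column of df:
cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])
Then you ask for an element-wise comparison between a column of cust_df and a column of df:
df['nameDest']==cust_df.DestName
On the line above, the two Series must have identical index labels, and therefore the same length. They don't, because cust_df is built to be shorter than df.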
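The mismatch can be reproduced on a tiny example. This is only a sketch: the toy rows below are made up, and only the column names type and nameDest come from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's df
df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "PAYMENT", "PAYMENT"],
    "nameDest": ["M1", "C2", "M1", "M3"],
})

# Same construction as in the question: one row per unique destination name
cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=["DestName"])

# df has 4 rows (index 0..3), cust_df has 3 rows (index 0..2)
print(len(df), len(cust_df))

# Comparing two Series with different indexes raises instead of broadcasting
try:
    df["nameDest"] == cust_df.DestName
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Because the comparison is aligned on index labels, pandas refuses to guess how a 3-row Series should line up against a 4-row one.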
For further help, please share your data as text and use consistent variable names. Your image shows "old"/"new" df, but your code does not reference them.
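Without seeing the real data, one loop-free approach that may fit is to count the PAYMENT rows per destination once with value_counts and then look each unique name up with map. The sketch below assumes the columns are named type and nameDest as in the question; the helper name payment_counts and the toy rows are hypothetical. Unlike the loop version, it does not drop rows that have missing values in other columns.

```python
import pandas as pd

def payment_counts(df):
    # One row per unique destination name, in first-seen order
    cust_df = pd.DataFrame(df["nameDest"].dropna().unique(), columns=["DestName"])

    # Count PAYMENT rows per destination in a single pass over df
    counts = df.loc[df["type"] == "PAYMENT", "nameDest"].value_counts()

    # Look up each name in the counts; names with no PAYMENT rows get 0
    cust_df["Payment_Count"] = cust_df["DestName"].map(counts).fillna(0).astype(int)
    return cust_df

# Hypothetical toy data, not the asker's dataset
toy = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "PAYMENT", "PAYMENT", "CASH_IN"],
    "nameDest": ["M1", "C2", "M1", "M3", None],
})

result = payment_counts(toy)
print(result)
```

The counting happens once for the whole DataFrame instead of once per name, which is usually where the loop version loses its time.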