Generating a dynamic DataFrame from an existing dataset


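My goal is to create a new DataFrame whose Col A holds the unique names, plus a SUMIF-style count taken from the original DataFrame: for the name in the current row (iloc) of the new DataFrame, count the original rows of a given type.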
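If "SUMIF of the types" means a count of rows per type for each name, a cross-tabulation produces that shape in one call, with no loop. This is a minimal sketch, assuming the column names nameDest and type from the question's code; the sample values (M1, C1, M2, and so on) are hypothetical.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    'nameDest': ['M1', 'C1', 'M1', 'M2', 'C1'],
    'type': ['PAYMENT', 'TRANSFER', 'PAYMENT', 'PAYMENT', 'CASH_IN'],
})

# Rows: one per unique nameDest; columns: one per type; cells: number of rows.
counts_by_type = pd.crosstab(df['nameDest'], df['type'])

# Move the names out of the index into a regular column ("Col A").
result = counts_by_type.reset_index().rename(columns={'nameDest': 'DestName'})
print(result)
```

Names that never appear with a given type get 0 in that column, which is what a spreadsheet SUMIF/COUNTIF would give.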

An example of what my output should look like (shown as an image in the original post):

I'm trying to avoid loops, but I keep getting an error (the traceback is below the code):

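import pandas as pd
from IPython.display import display

def exercise_custom(df):
    cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])
    # This line raises the ValueError below: df['nameDest'] and cust_df.DestName
    # are Series with different lengths, so their index labels do not match.
    cust_df['Payment_Count'] = len(df[(df['type']=='PAYMENT') & (df['nameDest']==cust_df.DestName)])
    display(cust_df.head(5))
    return cust_df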

The traceback I get:

ValueError                                Traceback (most recent call last)
Cell In[72], line 56
     52     return "TODO"
     54     pass
---> 56 visual_custom(df)

Cell In[72], line 40, in visual_custom(df)
     38 def visual_custom(df):
---> 40     exercise_custom(df)
     43     fig, axs = plt.subplots(1, figsize=(6,10))
     44     #updated to show bottom labels correctly & chg color

Cell In[72], line 6, in exercise_custom(df)
      1 def exercise_custom(df):
      2     #X
      3     #for index, row in df:
      4         #cust_df = df['nameDest'].unique()
      5     cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])
----> 6     cust_df['Payment_Count'] = len(df[(df['type']=='PAYMENT') & (df['nameDest']==cust_df.DestName)])
      7     display(cust_df.head(5))
     13         #X= pd.DataFrame(df['nameDest'].unique())
     14         #df[df_payment_count] = len(df[(df['type']=='PAYMENT') & (df['isFraud']==1)])
     15     
   (...)
     32              #              'Cash In':len(df[(df['type']=='CASH_IN') & (df['isFraud']==1)])}
     33             #    }

File ~/anaconda3/envs/forageenv/lib/python3.9/site-packages/pandas/core/ops/common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     72             return NotImplemented
     74 other = item_from_zerodim(other)
---> 76 return method(self, other)

File ~/anaconda3/envs/forageenv/lib/python3.9/site-packages/pandas/core/arraylike.py:40, in OpsMixin.__eq__(self, other)
     38 @unpack_zerodim_and_defer("__eq__")
     39 def __eq__(self, other):
---> 40     return self._cmp_method(other, operator.eq)

File ~/anaconda3/envs/forageenv/lib/python3.9/site-packages/pandas/core/series.py:6105, in Series._cmp_method(self, other, op)
   6102 res_name = ops.get_op_result_name(self, other)
   6104 if isinstance(other, Series) and not self._indexed_same(other):
-> 6105     raise ValueError("Can only compare identically-labeled Series objects")
   6107 lvalues = self._values
   6108 rvalues = extract_array(other, extract_numpy=True, extract_range=True)

ValueError: Can only compare identically-labeled Series objects

One way I thought of solving this is a loop, but it's too slow, and this needs to be faster. (It's my last resort, since all I could find is "loops are bad".)

import pandas as pd
from IPython.display import display

def exercise_custom(df):
    payment_counts = []
    cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])
    for name in cust_df['DestName']:
        # PAYMENT rows sent to this destination.
        # Note: .dropna() also drops rows that have a NaN in any other column.
        filtered_df = df[(df['type'] == 'PAYMENT') & (df['nameDest'] == name)].dropna()
        # len() is already 0 when nothing matches, so no if/else is needed.
        payment_counts.append(len(filtered_df))
    cust_df['Payment_Count'] = payment_counts
    display(cust_df.head(5))
    return cust_df
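The loop above filters the whole of df once per unique name. A vectorized sketch counts all PAYMENT rows per destination in a single pass and then looks each name up. The function name payment_counts_vectorized and the sample values are hypothetical; unlike the loop, it does not apply .dropna() to the other columns (call df.dropna() first if that behaviour is actually wanted).

```python
import pandas as pd

def payment_counts_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    # One row per unique, non-null destination, as in the question's cust_df.
    cust_df = pd.DataFrame(df['nameDest'].dropna().unique(), columns=['DestName'])
    # Count PAYMENT rows per destination in a single pass over df.
    counts = df.loc[df['type'] == 'PAYMENT', 'nameDest'].value_counts()
    # Look up each name's count; names with no PAYMENT rows map to NaN, then 0.
    cust_df['Payment_Count'] = cust_df['DestName'].map(counts).fillna(0).astype(int)
    return cust_df

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    'type': ['PAYMENT', 'TRANSFER', 'PAYMENT', 'PAYMENT', 'CASH_IN'],
    'nameDest': ['M1', 'C1', 'M1', 'M2', 'C1'],
})
result = payment_counts_vectorized(df)
print(result)
```

Series.map with a Series argument uses that Series' index as the lookup table, so the order of cust_df (first appearance in df) is preserved.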
Answer

This is what causes the error you reported.

First, you build cust_df from df (one row per unique destination name):

cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])

Then you compare a column of cust_df with a column of df element by element:

df['nameDest']==cust_df.DestName

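For that comparison to work, the two Series must be identically labeled, that is, have the same index (and therefore the same length). They are not: cust_df has one row per unique name, so it is shorter than df, and its freshly created 0..n-1 index cannot match df's index.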
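A minimal reproduction of the error, using small hypothetical data with the question's column names (4 rows in df but only 3 unique destinations):

```python
import pandas as pd

# Hypothetical data: 4 rows, but only 3 unique destination names.
df = pd.DataFrame({
    'type': ['PAYMENT', 'TRANSFER', 'PAYMENT', 'PAYMENT'],
    'nameDest': ['M1', 'C1', 'M1', 'M2'],
})
cust_df = pd.DataFrame(df.nameDest.dropna().unique(), columns=['DestName'])

print(len(df), len(cust_df))           # the two frames differ in length
print(df.index.equals(cust_df.index))  # so their index labels differ too

try:
    # Element-wise == between two Series requires identical index labels.
    df['nameDest'] == cust_df.DestName
except ValueError as exc:
    print('ValueError:', exc)
```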

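For further help, please share your data as text and use consistent variable names. Your image shows an "old" and a "new" df, but your code doesn't refer to them.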
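One common way to share a sample as text is to print a few rows as CSV, which others can paste back into pandas. A minimal sketch; the DataFrame contents and the name sample_text are hypothetical stand-ins for the question's df.

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's df.
df = pd.DataFrame({
    'type': ['PAYMENT', 'TRANSFER', 'PAYMENT'],
    'nameDest': ['M1', 'C1', 'M2'],
})

# Plain-text sample to paste into a question.
sample_text = df.head(10).to_csv(index=False)
print(sample_text)

# What a reader would do to rebuild the same DataFrame from that text.
round_trip = pd.read_csv(io.StringIO(sample_text))
```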
