KeyError when creating a new column via a GroupBy operation in Pandas

Question

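I am trying to create a new column by performing some operations on existing columns, but my code raises a KeyError. I tried debugging it with df.columns and copy-pasting the exact column name, but I still get the same error. My code is as follows: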

def calculate_elasticity(group):
    sales_change = group['Primary Sales Quantity'].pct_change()
    price_change = group['MRP'].pct_change()
    
    elasticity = sales_change / price_change
    
    return elasticity

df['Variant-based Elasticity'] = df.groupby('Variant').transform(calculate_elasticity)

The error shown is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3801             try:
-> 3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:

16 frames
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'Primary Sales Quantity'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:
-> 3804                 raise KeyError(key) from err
   3805             except TypeError:
   3806                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'Primary Sales Quantity'

I tried debugging; here is the result of df.columns:

Index(['Cal. year / month', 'Material', 'Product Name', 'MRP', 'Distribution Channel  (Master)', 'Unnamed: 5', 'L1 Prod Category', 'L2 Prod Brand', 'L3 Prod Sub-Category', 'State', 'Primary Actual GSV  Value', 'Primary Sales Qty (CS)', 'Secondary GSV', 'Secondary sales Qty(CS)', 'Primary Volume(MT/KL)', 'Secondary Volume(MT/KL)', 'Variant', 'Weight', 'Offers', 'Primary Sales Quantity'], dtype='object')

print(df['Primary Sales Quantity'])
gives:

0          155
1        16953
2          455
3          138
4         2653
         ...  
14147        6
14148        1
14149     8428
14150      237
14151       24
Name: Primary Sales Quantity, Length: 14152, dtype: int64

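I tried debugging using the column name. I can access the column by that name everywhere else; the error is raised only inside this function.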

python pandas dataframe csv
Answer

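GroupBy.transform cannot process two columns together: it calls the function on each column of a group separately, as a Series, so group['Primary Sales Quantity'] is looked up in that Series' row index and raises the KeyError (note the Int64Engine in the traceback). To work with both columns at once, use GroupBy.apply: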

def calculate_elasticity(group):
    sales_change = group['Primary Sales Quantity'].pct_change()
    price_change = group['MRP'].pct_change()
    
    group['Variant-based Elasticity'] = sales_change / price_change
    return group

df = df.groupby('Variant', group_keys=False).apply(calculate_elasticity)
print (df)
  Variant  Primary Sales Quantity  MRP  Variant-based Elasticity
0       a                      10    8                       NaN
1       a                       7   10                 -1.200000
2       b                      87    3                       NaN
3       b                       8    2                  2.724138
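
To see why the original transform call fails, the following sketch records what transform hands to the function on the sample data from this answer. The inspect_chunk helper and the seen list are illustrative names, not part of pandas, and the exact sequence of calls is an implementation detail that may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'Variant': ['a', 'a', 'b', 'b'],
                   'Primary Sales Quantity': [10, 7, 87, 8],
                   'MRP': [8, 10, 3, 2]})

seen = []  # (type name, name attribute) of every object passed to the function

def inspect_chunk(chunk):
    # Hypothetical helper: record what we received and return it unchanged.
    seen.append((type(chunk).__name__, getattr(chunk, 'name', None)))
    return chunk

df.groupby('Variant').transform(inspect_chunk)
print(seen)

def calculate_elasticity(group):
    # Same function as in the question: it expects a DataFrame with both columns.
    sales_change = group['Primary Sales Quantity'].pct_change()
    price_change = group['MRP'].pct_change()
    return sales_change / price_change

error = None
try:
    df.groupby('Variant').transform(calculate_elasticity)
except KeyError as err:
    # The string key is looked up in a Series' integer row index.
    error = err
print(repr(error))
```

If the recorded entries include Series named after the individual columns, the function is being called on one column at a time, which is why a column label cannot be used as a key inside it.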

Or an alternative solution without a helper function:

g = df.groupby('Variant')
df['Variant-based Elasticity'] = (g['Primary Sales Quantity'].pct_change() /
                                  g['MRP'].pct_change())
print (df)
  Variant  Primary Sales Quantity  MRP  Variant-based Elasticity
0       a                      10    8                       NaN
1       a                       7   10                 -1.200000
2       b                      87    3                       NaN
3       b                       8    2                  2.724138

Alternative solution with a helper df1 DataFrame:

df1 = df.groupby('Variant')[['Primary Sales Quantity', 'MRP']].pct_change()
df['Variant-based Elasticity'] = df1['Primary Sales Quantity'] / df1['MRP']
print (df)
  Variant  Primary Sales Quantity  MRP  Variant-based Elasticity
0       a                      10    8                       NaN
1       a                       7   10                 -1.200000
2       b                      87    3                       NaN
3       b                       8    2                  2.724138
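
As a sanity check, the three approaches can be compared against each other on the sample data. This is a minimal sketch: the variable names via_apply, via_direct and via_helper are illustrative, and the apply variant selects the two value columns explicitly and works on a copy of each group, which is an assumption made here to avoid mutating the groups and is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Variant': ['a', 'a', 'b', 'b'],
                   'Primary Sales Quantity': [10, 7, 87, 8],
                   'MRP': [8, 10, 3, 2]})
cols = ['Primary Sales Quantity', 'MRP']

def calculate_elasticity(group):
    # Work on a copy so the original group is left untouched.
    group = group.copy()
    group['Variant-based Elasticity'] = (group['Primary Sales Quantity'].pct_change()
                                         / group['MRP'].pct_change())
    return group

# 1. GroupBy.apply with the helper function
via_apply = (df.groupby('Variant', group_keys=False)[cols]
               .apply(calculate_elasticity)['Variant-based Elasticity'])

# 2. Per-column pct_change on the GroupBy object, no helper function
g = df.groupby('Variant')
via_direct = g['Primary Sales Quantity'].pct_change() / g['MRP'].pct_change()

# 3. Helper DataFrame holding both percentage changes
df1 = df.groupby('Variant')[cols].pct_change()
via_helper = df1['Primary Sales Quantity'] / df1['MRP']

# All three should agree row by row (NaN in the first row of each group).
pd.testing.assert_series_equal(via_apply, via_direct, check_names=False)
pd.testing.assert_series_equal(via_direct, via_helper, check_names=False)
```

The second and third variants avoid calling a Python function per group, so they are usually the simpler choice when the calculation can be expressed with built-in GroupBy methods.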

Sample data:

import pandas as pd

df = pd.DataFrame({'Variant': ['a', 'a', 'b', 'b'],
                   'Primary Sales Quantity': [10, 7, 87, 8],
                   'MRP': [8, 10, 3, 2]})