I'm trying to find the monthly mean revenue for Walmart and Food Lion, but the code below also includes HEB's revenue in each group.
import pandas as pd
df = pd.DataFrame({'date': ['1960-01-01','1960-01-01','1960-01-01','1960-02-01','1960-02-01','1960-02-01',
'1961-01-01','1961-01-01','1961-01-01','1961-02-01','1961-02-01','1961-02-01'],
'Company': ['HEB', 'Walmart', 'Food Lion','HEB', 'Walmart', 'Food Lion',
'HEB', 'Walmart', 'Food Lion','HEB', 'Walmart', 'Food Lion'],
'Revenue': [200, 800, 400, 400, 300, 600, 400, 400, 900, 900, 800, 600]})
print(df)
Output:
date Company Revenue
0 1960-01-01 HEB 200
1 1960-01-01 Walmart 800
2 1960-01-01 Food Lion 400
3 1960-02-01 HEB 400
4 1960-02-01 Walmart 300
5 1960-02-01 Food Lion 600
6 1961-01-01 HEB 400
7 1961-01-01 Walmart 400
8 1961-01-01 Food Lion 900
9 1961-02-01 HEB 900
10 1961-02-01 Walmart 800
11 1961-02-01 Food Lion 600
I'm trying to keep the HEB data out of this groupby. How can I do that?
df.groupby('date')['Revenue'].mean()
date
1960-01-01 466.666667
1960-02-01 433.333333
1961-01-01 566.666667
1961-02-01 766.666667
Name: Revenue, dtype: float64
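There are a few ways to do this. Perhaps the simplest (though possibly not the most efficient) is to filter the "HEB" rows out of the data before grouping: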
df[df.Company != "HEB"].groupby("date")["Revenue"].mean()
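If you would rather list the companies to keep than the one to drop, a whitelist with `Series.isin` is one alternative. Here is a minimal sketch using the data from the question; the names `keep` and `means` are only illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'date': ['1960-01-01', '1960-01-01', '1960-01-01', '1960-02-01', '1960-02-01', '1960-02-01',
             '1961-01-01', '1961-01-01', '1961-01-01', '1961-02-01', '1961-02-01', '1961-02-01'],
    'Company': ['HEB', 'Walmart', 'Food Lion'] * 4,
    'Revenue': [200, 800, 400, 400, 300, 600, 400, 400, 900, 900, 800, 600],
})

# Illustrative name: the companies whose revenue should be averaged.
keep = ['Walmart', 'Food Lion']

# Keep only rows whose Company is in the whitelist, then average per date.
means = df[df['Company'].isin(keep)].groupby('date')['Revenue'].mean()
print(means)
```

With this data the means should come out as 600, 450, 650 and 700 for the four dates. The whitelist form is handy when more companies may be added to the data later and you only ever want these two.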
To select a single company, you can use
walmart = df[df['Company'] == 'Walmart']  # assign to a new name so df keeps all companies
print(walmart)
date Company Revenue
1 1960-01-01 Walmart 800
4 1960-02-01 Walmart 300
7 1961-01-01 Walmart 400
10 1961-02-01 Walmart 800
If you want to exclude a company, you can use
df = df[df['Company'] != 'HEB']
print(df)
date Company Revenue
1 1960-01-01 Walmart 800
2 1960-01-01 Food Lion 400
4 1960-02-01 Walmart 300
5 1960-02-01 Food Lion 600
7 1961-01-01 Walmart 400
8 1961-01-01 Food Lion 900
10 1961-02-01 Walmart 800
11 1961-02-01 Food Lion 600
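To get from the filtered frame to the means the question asks for, you can group it by date. The sketch below also counts the rows per date, which is a quick way to check that only two companies contribute to each mean. The names `no_heb` and `summary` are only illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'date': ['1960-01-01', '1960-01-01', '1960-01-01', '1960-02-01', '1960-02-01', '1960-02-01',
             '1961-01-01', '1961-01-01', '1961-01-01', '1961-02-01', '1961-02-01', '1961-02-01'],
    'Company': ['HEB', 'Walmart', 'Food Lion'] * 4,
    'Revenue': [200, 800, 400, 400, 300, 600, 400, 400, 900, 900, 800, 600],
})

# Filter into a new variable so the original df is left untouched.
no_heb = df[df['Company'] != 'HEB']

# Mean revenue per date, plus how many rows went into each mean.
summary = no_heb.groupby('date')['Revenue'].agg(['mean', 'count'])
print(summary)
```

Each date should show a count of 2 (Walmart and Food Lion), confirming that the HEB rows were left out.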