I have a DataFrame A:
State Region Code
0 Texas Texas 1
1 Houston 0
2 Dallas 0
3 Austin 0
4 Michigan Michigan 1
5 Ann Arbor 0
6 Yipsilanti 0
7 Alaska Alaska 1
8 Troy 0
Where Code is 0, I want to fill State with the state from the row above. I hope to get the following output:
State Region Code Group
0 Texas Texas 1 1
1 Texas Houston 0 1
2 Texas Dallas 0 1
3 Texas Austin 0 1
4 Michigan Michigan 1 2
5 Michigan Ann Arbor 0 2
6 Michigan Yipsilanti 0 2
7 Alaska Alaska 1 3
8 Alaska Troy 0 3
I tried adding a new column "Group" to split the data above into 3 groups, and then using groupby to fill in State.
import pandas as pd
import numpy as np
t1 = pd.Series({'State':'Texas', 'RegionalName':'Texas', 'Code':1})
t2 = pd.Series({'State':' ', 'RegionalName':'Houston','Code' :0})
df=pd.DataFrame([t1,t2])
df.columns=['State','Region','Code']
The DataFrame above is actually read from a txt file. I then tried:
df['Group'] = np.where(df['Code'] == 1, df['Code'] + 1, df['Code'])
But it does not work. Any suggestions? Thanks.
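You want cumsum for Group and ffill for State (ffill only fills NaN, so this assumes the blank State cells were read in as missing values):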
df['Group'] = df['Code'].eq(1).cumsum()
df['State'] = df['State'].ffill()
Output:
State Region Code Group
0 Texas Texas 1 1
1 Texas Houston 0 1
2 Texas Dallas 0 1
3 Texas Austin 0 1
4 Michigan Michigan 1 2
5 Michigan Ann Arbor 0 2
6 Michigan Yipsilanti 0 2
7 Alaska Alaska 1 3
8 Alaska Troy 0 3
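For completeness, here is a self-contained sketch of the same approach. It rebuilds the sample data by hand rather than reading the txt file, and it assumes the blank State cells arrive as empty or whitespace-only strings (as in the question's `' '`), so they are turned into NaN before ffill. The `where` step is my addition, not part of the two-line answer above.

```python
import pandas as pd

# Sample data typed in by hand; blank State cells are assumed to be
# empty or whitespace-only strings, as in the question's snippet.
df = pd.DataFrame({
    "State": ["Texas", " ", " ", " ", "Michigan", " ", " ", "Alaska", " "],
    "Region": ["Texas", "Houston", "Dallas", "Austin", "Michigan",
               "Ann Arbor", "Yipsilanti", "Alaska", "Troy"],
    "Code": [1, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Keep State only where it is non-blank; everything else becomes NaN,
# because ffill fills missing values and ignores blank strings.
df["State"] = df["State"].where(df["State"].str.strip().ne(""))

# Each row with Code == 1 starts a new group; the running sum of those
# True values numbers the groups 1, 2, 3, ...
df["Group"] = df["Code"].eq(1).cumsum()

# Carry the last seen State down into the rows below it.
df["State"] = df["State"].ffill()

print(df)
```

If the file is read with a parser that already produces NaN for the empty cells, the `where` line is unnecessary and the two lines from the answer are enough.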