I have the following df:
County TotPerson
Wayne 148
Oakland 125
Macomb 63
Washtenaw 30
Ingham 30
Monroe 28
Hillsdale 15
Livingstone 15
Jackson 14
Lenawee 12
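I want to store the counties in separate lists or a dictionary (either is fine), grouped from top to bottom so that the running total of each group does not exceed 190.
The result should look like this: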
Group1
[Wayne]
Group2
[Oakland,Macomb]
Group3
[Washtenaw, Ingham, Monroe, Hillsdale, Livingstone, Jackson, Lenawee]
groups = []
# Each pass peels off the leading rows whose running total is below 190.
for i in range(len(df)):
    if len(df) > 0:
        groups.append(df.loc[df.TotPerson.cumsum().lt(190)].County.tolist())
        df = df.loc[df.TotPerson.cumsum().ge(190)]
[['Wayne'],
['Oakland', 'Macomb'],
['Washtenaw',
'Ingham',
'Monroe',
'Hillsdale',
'Livingstone',
'Jackson',
'Lenawee']]
The logic is a running sum that resets whenever it reaches the limit of 190.
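import numpy as np
# Running sum that restarts from the current value once it would reach 190
sumlm = np.frompyfunc(lambda a, b: a + b if a + b < 190 else b, 2, 1)
# A reset happened wherever the running sum equals the raw value; cumsum turns those into group ids
id = sumlm.accumulate(df.TotPerson, dtype=object).eq(df.TotPerson).cumsum()
l = df.County.groupby(id).agg(list)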
TotPerson
1 [Wayne]
2 [Oakland, Macomb]
3 [Washtenaw, Ingham, Monroe, Hillsdale, Livings...
Name: County, dtype: object
l.tolist()
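If the frompyfunc/accumulate step is hard to follow, the same reset logic can be sketched with only the standard library. This is just an illustration on the sample numbers from the question, not a replacement for the pandas version; the names LIMIT, tot, running, starts and group_id are made up for the example.

```python
from itertools import accumulate

LIMIT = 190  # same threshold as in the answer above
tot = [148, 125, 63, 30, 30, 28, 15, 15, 14, 12]  # TotPerson column from the question

# Running sum that restarts from the current value once adding it would reach LIMIT.
running = list(accumulate(tot, lambda a, b: a + b if a + b < LIMIT else b))

# A group starts wherever the running sum equals the raw value (the sum was reset there).
starts = [r == t for r, t in zip(running, tot)]

# Cumulative count of group starts gives a group id per row.
group_id = list(accumulate(int(s) for s in starts))

print(running)   # [148, 125, 188, 30, 60, 88, 103, 118, 132, 144]
print(group_id)  # [1, 2, 2, 3, 3, 3, 3, 3, 3, 3]
```

The group_id list plays the same role as the key passed to groupby above.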
Or try a for loop:
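l = []
c = 0
for i, y in enumerate(df.TotPerson):
    c += y
    if c >= 190:
        l.append(i)  # row i starts a new group
        c = y        # restart the running total from the current row
# Assumes the default RangeIndex, so positions match index labels
df.County.groupby(df.index.isin(l).cumsum()).agg(list)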
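I could only solve this with a loop; numpy.cumsum is not much help for this problem. I hope this solves your problem.
import pandas as pd
df = pd.read_clipboard()
cumsum = 0
lst1 = []
lst2 = []
for j, i in zip(df.County, df.TotPerson):
    cumsum += i
    if cumsum <= 190:
        lst1.append(j)
    else:
        # Limit exceeded: close the current group and start a new one with this row
        lst2.append(lst1)
        cumsum = i
        lst1 = [j]
lst2.append(lst1)  # do not forget the last open group
lst2  # This is the desired list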
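For reuse, the loop above could be wrapped in a small function with the limit as a parameter. This is only a sketch under my own assumptions: the function name group_by_limit and the Group1, Group2, ... dictionary keys are made up for the example, and it takes plain lists, so df.County and df.TotPerson would be passed in.

```python
def group_by_limit(names, values, limit=190):
    """Greedily pack consecutive items so each group's total stays <= limit.

    A single value larger than the limit ends up in a group of its own.
    """
    groups, current, total = [], [], 0
    for name, value in zip(names, values):
        # Close the current group when adding this value would exceed the limit.
        if current and total + value > limit:
            groups.append(current)
            current, total = [], 0
        current.append(name)
        total += value
    if current:  # keep the last open group, skip it for empty input
        groups.append(current)
    return groups


counties = ["Wayne", "Oakland", "Macomb", "Washtenaw", "Ingham",
            "Monroe", "Hillsdale", "Livingstone", "Jackson", "Lenawee"]
tot = [148, 125, 63, 30, 30, 28, 15, 15, 14, 12]

groups = group_by_limit(counties, tot)
# Dictionary form, keyed like the layout asked for in the question.
named = {f"Group{i}": g for i, g in enumerate(groups, start=1)}
print(named)
```

With the sample data this gives the three groups from the question; with a DataFrame the call would be group_by_limit(df.County, df.TotPerson).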