import pandas as pd

dict = {"asset": ["S3", "S2", "E4", "E1", "A6", "A8"],
        "Rank": [1, 2, 3, 4, 5, 6],
        "number_of_attributes": [2, 1, 2, 2, 1, 1],
        "number_of_cards": [1, 2, 2, 1, 2, " "],
        "cards_plus1": [2, 3, 3, 2, 3, " "]}
dframe = pd.DataFrame(dict, index=[1, 2, 3, 4, 5, 6],
                      columns=["asset", "Rank", "number_of_attributes", "number_of_cards", "cards_plus1"])
I want to compute the cumulative sum of the "cards_plus1" column. How can I do that? The cumsum column should come out as: 0 2 5 8 10 13
Try this:
First, replace the blank values with NaN:
import pandas as pd
import numpy as np
dict={"asset":["S3","S2","E4","E1","A6","A8"],"Rank":[1,2,3,4,5,6],"number_of_attributes":[2,1,2,2,1,1],
"number_of_cards":[1,2,2,1,2," "],"cards_plus1":[2,3,3,2,3," "]}
dframe=pd.DataFrame(dict,index=[1,2,3,4,5,6],
columns=["asset","Rank","number_of_attributes","number_of_cards","cards_plus1"])
## replace blank values (empty or whitespace-only strings) with NaN
dframe = dframe.replace(r'^\s*$', np.nan, regex=True)
print(dframe)
>>>
  asset  Rank  number_of_attributes  number_of_cards  cards_plus1
1    S3     1                     2              1.0          2.0
2    S2     2                     1              2.0          3.0
3    E4     3                     2              2.0          3.0
4    E1     4                     2              1.0          2.0
5    A6     5                     1              2.0          3.0
6    A8     6                     1              NaN          NaN
The cards_plus1 column now has dtype object, so convert it to numeric:
### convert data type of the cards_plus1 to numeric
dframe['cards_plus1'] = pd.to_numeric(dframe['cards_plus1'])
Now compute the cumulative sum:
### now we can calculate cumsum
dframe['cards_plus1_cumsum'] = dframe['cards_plus1'].cumsum()
print(dframe)
>>>
  asset  Rank  number_of_attributes  number_of_cards  cards_plus1  \
1    S3     1                     2              1.0          2.0
2    S2     2                     1              2.0          3.0
3    E4     3                     2              2.0          3.0
4    E1     4                     2              1.0          2.0
5    A6     5                     1              2.0          3.0
6    A8     6                     1              NaN          NaN

   cards_plus1_cumsum
1                 2.0
2                 5.0
3                 8.0
4                10.0
5                13.0
6                 NaN
Instead of replacing the blank values with NaN, you could replace them with zero; it depends on what you want. Hope this helps.
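If zeros are preferred, a minimal sketch of that variant could look like the following. It assumes a blank should count as 0 in the running total, and it uses `pd.to_numeric(..., errors="coerce")` as a shortcut for the replace-then-convert steps; the frame `df` is a cut-down stand-in for `dframe`, not part of the original answer.

```python
import pandas as pd

# Same cards_plus1 values as in the question; the last entry is a blank string.
df = pd.DataFrame({"cards_plus1": [2, 3, 3, 2, 3, " "]}, index=[1, 2, 3, 4, 5, 6])

# errors="coerce" turns anything non-numeric (here " ") into NaN,
# and fillna(0) then treats those blanks as zero.
df["cards_plus1"] = pd.to_numeric(df["cards_plus1"], errors="coerce").fillna(0)

# The blank row now adds 0, so the running total simply repeats on that row.
df["cards_plus1_cumsum"] = df["cards_plus1"].cumsum()
print(df)
```

With zeros, the last row carries the previous total forward instead of showing NaN, which may or may not be what you want in a report.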
Because the last element of the cards_plus1 column is a string (" "), you first need to extract the elements of type int; then you can sum them with np.cumsum.
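import numpy as np
# `dict` is the dictionary defined in the question; keep only the integer entries
a = [x for x in dict['cards_plus1'] if isinstance(x, int)]
# running total of the remaining values
cumsum = np.cumsum(a)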
I'd like to start with zero instead of 2. I want the values shifted: cards_plus1_cumsum 0 2 5 8 10 13
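One possible way to get that shifted result is to shift the cumulative sum down by one row and fill the first slot with 0. This is only a sketch, assuming the blank has already been coerced to NaN; `df` is a cut-down stand-in for the question's `dframe`.

```python
import pandas as pd

# Same cards_plus1 values as in the question; the last entry is a blank string.
df = pd.DataFrame({"cards_plus1": [2, 3, 3, 2, 3, " "]}, index=[1, 2, 3, 4, 5, 6])

# Coerce the blank to NaN so the column becomes numeric.
df["cards_plus1"] = pd.to_numeric(df["cards_plus1"], errors="coerce")

# cumsum skips the NaN; shift(1, fill_value=0) moves every total down one row
# and puts 0 in the first row, so the trailing NaN drops off the end.
df["cards_plus1_cumsum"] = df["cards_plus1"].cumsum().shift(1, fill_value=0)
print(df["cards_plus1_cumsum"].tolist())
```

Each row then holds the total of all rows before it, which is why the first row is 0 and the last row is 13.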