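In pandas, I have two columns (Series) of x rows each, and I want to add a column that gives, for each row, the running count of how many times the value in col1 has appeared in either column from the first row up to row x-1.
The df looks like this: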
col1 col2
0 B A
1 B C
2 A B
3 A B
4 A C
5 B A
The desired output is:
col1 col2 freq
0 B A 0
1 B C 1
2 A B 1
3 A B 2
4 A C 3 #A appears 3 times in the two columns from row 0 to 3
5 B A 4 #B appears 4 times in the two columns from row 0 to 4
Thanks in advance from a beginner, G
Let's use some DataFrame reshaping, groupby, and cumcount.
dfs = df.stack()
df['freq'] = dfs.groupby(dfs).cumcount().unstack()['col1']
print(df)
Output:
col1 col2 freq
0 B A 0
1 B C 1
2 A B 1
3 A B 2
4 A C 3
5 B A 4
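To see why the stack/cumcount approach gives this result, here is a minimal sketch that splits the one-liner into its intermediate steps. The sample frame is rebuilt from the question; the names `running` and `freq` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"col1": list("BBAAAB"), "col2": list("ACBBCA")})

# stack() flattens the frame row by row: (0, col1), (0, col2), (1, col1), ...
dfs = df.stack()

# cumcount() numbers each occurrence of a value in that order, starting at 0,
# so every cell gets the number of earlier cells holding the same value
running = dfs.groupby(dfs).cumcount()

# unstack() restores the original shape; col1 is the first column of each row,
# so its count only covers cells from earlier rows
freq = running.unstack()["col1"]
print(freq.tolist())
```

Note that this relies on col1 being the first column of the frame, so that no cell of the same row precedes it in the stacked order.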
This works regardless of the number of columns in df.
import pandas as pd
import numpy as np
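def add(d1, d2):
    # Merge the counts from d2 into d1 (d1 is modified in place)
    for i in d2.keys():
        if i in d1.keys():
            d1[i] = d1[i] + d2[i]
        else:
            d1[i] = d2[i]
    return d1

if __name__ == '__main__':
    counts = {}
    df = pd.DataFrame({"a": [1, 2, 3, 1, 2], "b": [2, 1, 2, 3, 1]})
    col = list(df)
    for ind, it in df.iterrows():
        # Count each distinct value in the current row
        unique, count = np.unique(it, return_counts=True)
        unique_dict = dict(zip(unique, count))
        # Add the current row's counts to the running totals
        counts = add(counts, unique_dict)
        # Running total for the value in the first column (current row included)
        df.loc[ind, "freq"] = counts[it[col[0]]]
    # Subtract 1 to exclude the first-column value of the current row itself
    df["freq"] = df["freq"] - 1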
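The same row-by-row idea can be written more compactly with `collections.Counter`. This is only a sketch of an alternative, not the code from the answer above: it reads the count before adding the current row, so no final subtraction is needed.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 1, 2], "b": [2, 1, 2, 3, 1]})

counts = Counter()
freq = []
for row in df.itertuples(index=False):
    # Occurrences of the first column's value in all earlier rows
    freq.append(counts[row[0]])
    # Only now add every value of the current row to the running totals
    counts.update(row)

df["freq"] = freq
print(df)
```

Because the lookup happens before the update, a row whose columns hold the same value twice does not count itself.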
from collections import defaultdict
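def fn():
    # Separate running counters for col1 and col2
    d1, d2 = defaultdict(int), defaultdict(int)
    x = yield
    while True:
        # Count of the current col1 value in all earlier rows
        res = d1[x.col1] + d2[x.col1]
        # Record the current row before receiving the next one
        d1[x.col1] += 1
        d2[x.col2] += 1
        x = yield res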
f = fn()
next(f)
df['freq'] = df[['col1', 'col2']].apply(lambda x: f.send(x), axis=1)
print(df)
Prints:
col1 col2 freq
0 B A 0
1 B C 1
2 A B 1
3 A B 2
4 A C 3
5 B A 4
EDIT (solution for an arbitrary number of columns):
from collections import defaultdict
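def fn(cols):
    # One running counter per column
    dd = [defaultdict(int) for _ in cols]
    x = yield
    while True:
        # Count of the first column's value in all earlier rows
        res = sum(d[x.iloc[0]] for d in dd)
        # Record the current row before receiving the next one
        for i, d in enumerate(dd):
            d[x.iloc[i]] += 1
        x = yield res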
cols = ['col1', 'col2']
f = fn(cols)
next(f)
df['freq'] = df[cols].apply(lambda x: f.send(x), axis=1)
print(df)
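A simple way to cross-check any of these approaches is a brute-force count that compares each row's col1 value against every cell of the earlier rows. The helper name `brute_force_freq` and its `target` parameter are made up for this sketch; it is quadratic in the number of rows, so it is meant for verification on small frames only.

```python
import pandas as pd


def brute_force_freq(df, target="col1"):
    """For each row, count how often df[target] at that row occurs
    in any column of the rows before it."""
    values = df.to_numpy()
    col = df.columns.get_loc(target)
    return [int((values[:i] == values[i, col]).sum()) for i in range(len(df))]


# Sample data rebuilt from the question
df = pd.DataFrame({"col1": list("BBAAAB"), "col2": list("ACBBCA")})
print(brute_force_freq(df))
```

It is worth also trying a frame where a col1 value has not appeared in any earlier row, since the expected count there is 0.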