Rolling count of occurrences across two columns

Problem description  Votes: 0  Answers: 1

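In pandas, I have two columns (Series) of x rows each. I want to add a column that gives, for each row x, the running number of times the value in col1 has occurred in both columns from the first row through row x-1.

The df looks like this: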

   col1 col2
0  B    A
1  B    C
2  A    B
3  A    B
4  A    C
5  B    A

The desired output is:

   col1 col2 freq
0  B    A    0
1  B    C    1
2  A    B    1
3  A    B    2
4  A    C    3    #A appears 3 times in the two columns from row 0 to 3
5  B    A    4    #B appears 4 times in the two columns from row 0 to 4

Thanks in advance from a beginner, G

python pandas countif rolling-computation
1 Answer
1
votes

Let's use some dataframe reshaping, groupby, and cumcount.

dfs = df.stack()
df['freq'] = dfs.groupby(dfs).cumcount().unstack()['col1']
print(df)

Output:

  col1 col2  freq
0    B    A     0
1    B    C     1
2    A    B     1
3    A    B     2
4    A    C     3
5    B    A     4
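
To make the three steps easier to follow, here is a sketch that splits the one-liner into named intermediates. The sample frame is rebuilt from the question; the variable name `seen_before` is introduced here for illustration only. It appears to rely on col1 being the first column, so that within a row col1 is numbered before col2.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["B", "B", "A", "A", "A", "B"],
    "col2": ["A", "C", "B", "B", "C", "A"],
})

# stack() lays the cells out row by row: (0, col1), (0, col2), (1, col1), ...
dfs = df.stack()

# cumcount() numbers the occurrences of each value 0, 1, 2, ... in that order,
# which equals how many times the value was seen in earlier cells.
seen_before = dfs.groupby(dfs).cumcount()

# unstack() restores the row/column layout; keep only the counts for col1.
df["freq"] = seen_before.unstack()["col1"]
print(df)
```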

0
votes

This works regardless of how many columns df has.

import pandas as pd
import numpy as np

def add(d1, d2):
    # Merge the counts from d2 into d1 (d1 is modified in place).
    for key, value in d2.items():
        d1[key] = d1.get(key, 0) + value
    return d1

if __name__ == '__main__':
    counts = {}
    df = pd.DataFrame({"a": [1, 2, 3, 1, 2], "b": [2, 1, 2, 3, 1]})
    col = list(df)
    freq = []
    for ind, it in df.iterrows():
        # Read the count from earlier rows before adding the current row, so a
        # value repeated within the same row is not counted against itself.
        freq.append(counts.get(it[col[0]], 0))
        unique, count = np.unique(it, return_counts=True)
        counts = add(counts, dict(zip(unique, count)))
    df["freq"] = freq
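
The answer does not show its result. As a cross-check, here is a compact sketch of the same running-count idea using `collections.Counter`. The helper name `rolling_freq` and its `target` parameter are hypothetical and not part of the original answer.

```python
from collections import Counter

import pandas as pd


def rolling_freq(df, target=None):
    """Hypothetical helper: for each row, count how often the value in
    `target` (default: the first column) occurred in any column of earlier rows."""
    target = df.columns[0] if target is None else target
    pos = df.columns.get_loc(target)
    seen = Counter()
    freq = []
    for row in df.itertuples(index=False):
        freq.append(seen[row[pos]])  # count before the current row is recorded
        seen.update(row)             # then record every cell of the current row
    return freq


df = pd.DataFrame({"a": [1, 2, 3, 1, 2], "b": [2, 1, 2, 3, 1]})
df["freq"] = rolling_freq(df)
print(df["freq"].tolist())  # [0, 1, 0, 2, 3]
```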

0
votes
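from collections import defaultdict

def fn():
    # One running counter per column.
    d1, d2 = defaultdict(int), defaultdict(int)
    x = yield  # primed by next(f); receives the first row
    while True:
        # Occurrences of this row's col1 value in earlier rows (both columns).
        freq = d1[x.col1] + d2[x.col1]
        # Record the current row only after its count has been read.
        d1[x.col1] += 1
        d2[x.col2] += 1
        x = yield freq

f = fn()
next(f)
df['freq'] = df[['col1', 'col2']].apply(lambda x: f.send(x), axis=1)

print(df)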

Prints:

  col1 col2  freq
0    B    A     0
1    B    C     1
2    A    B     1
3    A    B     2
4    A    C     3
5    B    A     4

EDIT (solution for an arbitrary number of columns):

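from collections import defaultdict

def fn(cols):
    # One running counter per column.
    dd = [defaultdict(int) for _ in cols]
    x = yield  # primed by next(f); receives the first row
    while True:
        # Occurrences of the first column's value in earlier rows (all columns).
        freq = sum(d[x.iloc[0]] for d in dd)
        # Record the current row only after its count has been read.
        for i, d in enumerate(dd):
            d[x.iloc[i]] += 1
        x = yield freq

cols = ['col1', 'col2']
f = fn(cols)
next(f)
df['freq'] = df[cols].apply(lambda x: f.send(x), axis=1)

print(df)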
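
When comparing these approaches, a direct (and slow) restatement of the requirement can serve as a reference. The sketch below is not from any of the answers; the helper name `brute_force_freq` is hypothetical, and it is quadratic in the number of rows, so it is meant for checking small frames only.

```python
import pandas as pd


def brute_force_freq(df, target):
    # Hypothetical reference helper: for row i, count the cells in rows
    # 0..i-1 (all columns) that equal the value of `target` in row i.
    values = df.to_numpy()
    return [
        int((values[:i] == df[target].iloc[i]).sum())
        for i in range(len(df))
    ]


# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["B", "B", "A", "A", "A", "B"],
    "col2": ["A", "C", "B", "B", "C", "A"],
})
print(brute_force_freq(df, "col1"))  # [0, 1, 1, 2, 3, 4]
```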