Pandas DataFrame: randomly shuffle blocks of consecutive rows within each group


I want to group by a column and then randomly shuffle blocks of n consecutive rows within each group.

df = pd.DataFrame({'grouper_col':[1,1,1,1,1,1, 2,2,2,2,2,2], 'b':[1,2,3,4,5,6,21,22,23,24,25,26]})

    grouper_col   b
0             1   1
1             1   2
2             1   3
3             1   4
4             1   5
5             1   6
6             2  21
7             2  22
8             2  23
9             2  24
10            2  25
11            2  26

Then, within each group, I want to shuffle blocks of, say, two consecutive rows, for example:

    grouper_col   b
0             1   5
1             1   6
2             1   3
3             1   4
4             1   1
5             1   2
6             2  21
7             2  22
8             2  25
9             2  26
10            2  23
11            2  24

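Here, each block of two consecutive rows in a group is randomly swapped with other blocks of two consecutive rows in the same group.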
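
One way to make "blocks of n consecutive rows" concrete is to label every row with a block number inside its group. This is only a minimal sketch: it assumes blocks are counted from the first row of each group, which the question does not state explicitly, and the names `n` and `block_id` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26],
})

n = 2  # block length (assumed; the "n consecutive rows" from the question)

# cumcount() numbers the rows of each group 0, 1, 2, ...;
# integer division by n turns that position into a block number
block_id = df.groupby('grouper_col').cumcount() // n
print(block_id.tolist())  # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```

Rows that share both `grouper_col` and `block_id` are the units that should move together.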

python pandas pandas-groupby permutation shuffle
1 Answer

Here is one way to approach this:

import numpy as np
import pandas as pd

# find the size of each group
sizes = df.groupby('grouper_col').b.size()
# iterate over the (group key, group size) pairs of the above Series
for g, v in sizes.items():
    # exclusive upper bound for a start index, so that start + 1 stays inside the group
    v -= 1
    # only swap if the group has more than 5 rows (i.e. v > 4)
    if v > 4:
        random_s = np.array([0, 0])
        # resample until the two start indices are at least 2 apart,
        # so that the two 2-row blocks do not overlap
        while abs(random_s[0] - random_s[1]) <= 1:
            random_s = np.random.randint(0, v, 2)
        # expand each start index into a pair (e.g. [0, 2] -> [[0, 1], [2, 3]])
        replace_ix = random_s[:, None] + np.array([0, 1])
        # copy the group's values, swap the two blocks, and write them back
        mask = df.grouper_col.eq(g)
        to_replace = df.loc[mask, 'b'].to_numpy(copy=True)
        repl_1 = to_replace[replace_ix[0]]
        repl_2 = to_replace[replace_ix[1]]
        to_replace[replace_ix[0]] = repl_2
        to_replace[replace_ix[1]] = repl_1
        df.loc[mask, 'b'] = to_replace

print(df)

    grouper_col   b
0             1   5
1             1   6
2             1   3
3             1   4
4             1   1
5             1   2
6             2  21
7             2  25
8             2  26
9             2  24
10            2  22
11            2  23
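
Note that the code above swaps a single random pair of two-row blocks per group, and the blocks may start at any offset (in the output for group 2, the rows 22, 23 traded places with 25, 26). If the goal is instead to permute all fixed blocks of n rows within each group, something along these lines might work. This is a hedged sketch, not the answer's method: `shuffle_blocks` and its `seed` parameter are hypothetical names, blocks are assumed to start at the first row of each group, each group's rows are assumed to be contiguous in the frame, and group keys are assumed to contain no missing values. It moves whole rows and resets the index.

```python
import numpy as np
import pandas as pd


def shuffle_blocks(df, group_col, n, seed=None):
    """Return a copy of df where, inside each group, blocks of n
    consecutive rows are randomly permuted. Rows inside a block
    keep their relative order."""
    rng = np.random.default_rng(seed)
    pieces = []
    for _, group in df.groupby(group_col, sort=False):
        positions = np.arange(len(group))
        # split the row positions into blocks of n; a shorter trailing
        # block (if the group size is not a multiple of n) is kept whole
        blocks = [positions[i:i + n] for i in range(0, len(group), n)]
        order = rng.permutation(len(blocks))
        new_positions = np.concatenate([blocks[i] for i in order])
        pieces.append(group.iloc[new_positions])
    if not pieces:  # empty input: nothing to shuffle
        return df.copy()
    return pd.concat(pieces).reset_index(drop=True)


df = pd.DataFrame({
    'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26],
})

out = shuffle_blocks(df, 'grouper_col', n=2, seed=0)
print(out)
```

Passing a `seed` makes the result reproducible; leaving it as `None` gives a different permutation on each call.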