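Suppose I have a DataFrame with three columns: Age, Gender, and Country.
I want to shuffle the rows randomly, but in an ordered way with respect to gender. There are n males and m females, where n can be less than, greater than, or equal to m. The shuffle should alternate genders for as long as both are available, so that for 8 people we would get results like the following:
Male, Female, Male, Female, Male, Female, Female, Female, ... (if there are more females: m > n)
Male, Female, Male, Female, Male, Male, Male, Male (if there are more males: n > m)
Male, Female, Male, Female, Male, Female, Male, Female, Male, Female (if the counts are equal: n = m)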
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})
First, add a sequence number within each group:
df['Order'] = df.groupby('Gender').cumcount()
Then sort by it:
df.sort_values('Order')
This gives you:
   Age  Gender Country  Order
0   10    Male      US      0
3   40  Female  Canada      0
1   20    Male      UK      1
4   50  Female      US      1
2   30    Male   China      2
6   70  Female   China      2
5   60    Male      UK      3
7   80  Female  Brazil      3
If you also want the rows shuffled, do that first, e.g. df = df.sample(frac=1); see: Shuffle DataFrame rows.
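The shuffle-then-number idea above can be sketched end to end as follows. This is only a minimal sketch: the `random_state` value and the names `shuffled`, `order`, and `result` are illustrative assumptions, not part of the original answer. One detail worth noting is that after a shuffle the first row in each `Order` tie may be a female, so the sketch adds `Gender` as a second sort key (descending, which puts "Male" ahead of "Female" alphabetically) to keep the male-first pattern from the question.

```python
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female",
                              "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada",
                               "US", "UK", "China", "Brazil"]})

# Shuffle all rows first; random_state is an assumption that only
# makes this sketch reproducible and can be dropped.
shuffled = df.sample(frac=1, random_state=0)

# Number the rows within each gender in their shuffled order: 0, 1, 2, ...
order = shuffled.groupby('Gender').cumcount()

# Sort by that number, breaking ties so that "Male" comes before "Female".
result = (shuffled.assign(Order=order)
          .sort_values(['Order', 'Gender'], ascending=[True, False])
          .drop(columns='Order')
          .reset_index(drop=True))

print(result)
```

Which person lands in which slot changes with the shuffle, while the gender pattern stays alternating for as long as both genders still have rows left.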
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})
df['Sort_Column'] = 0
# Males get the even sort keys: 0, 2, 4, ...
df_male = df.loc[df['Gender'] == 'Male'].reset_index(drop=True)
df_male['Sort_Column'] = df_male['Sort_Column'] + df_male.index*2
# Females get the odd sort keys: 1, 3, 5, ...
df_female = df.loc[df['Gender'] == 'Female'].reset_index(drop=True)
df_female['Sort_Column'] = df_female['Sort_Column'] + df_female.index*2 + 1
# Recombine, sort by the key, then drop the helper column.
df_sorted = pd.concat([df_male, df_female]).sort_values('Sort_Column').drop('Sort_Column', axis=1).reset_index(drop=True)
df_sorted
Output:
   Age  Gender Country
0   10    Male      US
1   40  Female  Canada
2   20    Male      UK
3   50  Female      US
4   30    Male   China
5   70  Female   China
6   60    Male      UK
7   80  Female  Brazil
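The even/odd key trick above also covers the unequal cases from the question (n ≠ m), because the leftover rows of the larger group simply keep the highest keys and end up at the tail. Here is a compact sketch of the same idea; the helper name `interleave` and its `col` and `first` parameters are hypothetical, introduced only for illustration, and it assumes the column holds exactly two distinct values.

```python
import pandas as pd

def interleave(df, col='Gender', first='Male'):
    """Alternate rows by `col`, starting with `first` (hypothetical helper)."""
    # Position of each row within its own group: 0, 1, 2, ...
    rank = df.groupby(col).cumcount()
    # Even keys for the `first` group, odd keys for the other group.
    key = rank * 2 + (df[col] != first).astype(int)
    # Reorder rows by key; positional indexing avoids relying on index labels.
    positions = key.to_numpy().argsort(kind='stable')
    return df.iloc[positions].reset_index(drop=True)

# More females than males (n = 2, m = 4).
more_females = pd.DataFrame({
    'Age': [10, 20, 30, 40, 50, 60],
    'Gender': ["Female", "Male", "Female", "Female", "Male", "Female"],
})
out_f = interleave(more_females)
print(out_f['Gender'].tolist())
# ['Male', 'Female', 'Male', 'Female', 'Female', 'Female']

# More males than females (n = 3, m = 1).
more_males = pd.DataFrame({
    'Age': [10, 20, 30, 40],
    'Gender': ["Male", "Male", "Male", "Female"],
})
out_m = interleave(more_males)
print(out_m['Gender'].tolist())
# ['Male', 'Female', 'Male', 'Male']
```

To randomize who fills each slot, the input can be shuffled with `df.sample(frac=1)` before calling the helper.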