Sort a Python DataFrame by a column's values according to a list

Question

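I have a pandas DataFrame that I am trying to sort by the values in one column, but not alphabetically. The order comes from a "sorter" list, i.e. a list that gives the order in which the values should appear. However, when I do this I get an error. Here is the runnable code: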

import pandas as pd
import numpy as np

df = pd.DataFrame({
        'JDate':["2022-01-31","2022-12-05","2023-11-10","2023-12-03","2024-01-16","2024-01-06","2011-01-04"],
        # 'Month':[1,12,11,12,1,1],
        'Code':[None,'John Johnson',np.nan,'John Smith','Mary Williams','ted bundy','George Lucas'],
        'Unit Price':[np.nan,200,None,56,75,65,60],
        'Quantity':[1500, 140000, 1400000, 455, 648, 759,1000],
        'Amount':[100, 10000, 100000, 5, 48, 59,449],
        'Invoice':['soccer','basketball','baseball','football','baseball','ice hockey','football'],
        'energy':[100.,100,100,54,98,3,45],
        'Category':['alpha','bravo','kappa','alpha','bravo','bravo','kappa']
})

df["JDate"] = pd.to_datetime(df["JDate"])

df["JYearMonth"] =  df['JDate'].dt.to_period('M')


index_to_use = ['Category','Code','Invoice','Unit Price']
values_to_use = ['Amount']
columns_to_use = ['JYearMonth']


df2 = df.pivot_table(index=index_to_use,
                     values=values_to_use,
                     columns=columns_to_use)

df4 = df2['Amount'].reset_index()

# setting up the sorter
sorter=['football','ice hockey','basketball','baseball']


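# trying the categorical method
# NOTE: raises an error as written: astype() expects the lowercase dtype name
# 'category'. It also reads df['Invoice'] rather than df4['Invoice'].
df4['Invoice'] = df['Invoice'].astype('Category').cat.set_categories(sorter)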

df4.sort_values(['Invoice'],inplace=True)



df3 = df2.xs('alpha',level='Category')
df3 = df3.reset_index() #this prevents merging of rows


writer= pd.ExcelWriter(
        "t2test11.xlsx",
        engine='xlsxwriter'
    )


df.to_excel(writer,sheet_name="t2",index=True)
df2.to_excel(writer,sheet_name="t2test",index=True)
df4.to_excel(writer,sheet_name="t2testFixHeader",index=True)
df3.to_excel(writer,sheet_name="t2filter",index=True)

writer.close()
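One detail that matters for the sorting step: df4 is not a row-for-row copy of df. The sketch below rebuilds df4 from the same data, trimmed to the columns the pivot actually uses (Quantity and energy are left out, which does not change the pivot). It shows that pivot_table drops the two rows whose Code (and Unit Price) is missing, and that reset_index() renumbers the remaining rows 0 to 4.

```python
import numpy as np
import pandas as pd

# Same data as the question, trimmed to the columns the pivot uses
df = pd.DataFrame({
    'JDate': pd.to_datetime(["2022-01-31", "2022-12-05", "2023-11-10", "2023-12-03",
                             "2024-01-16", "2024-01-06", "2011-01-04"]),
    'Code': [None, 'John Johnson', np.nan, 'John Smith', 'Mary Williams', 'ted bundy', 'George Lucas'],
    'Unit Price': [np.nan, 200, None, 56, 75, 65, 60],
    'Amount': [100, 10000, 100000, 5, 48, 59, 449],
    'Invoice': ['soccer', 'basketball', 'baseball', 'football', 'baseball', 'ice hockey', 'football'],
    'Category': ['alpha', 'bravo', 'kappa', 'alpha', 'bravo', 'bravo', 'kappa'],
})
df['JYearMonth'] = df['JDate'].dt.to_period('M')

df2 = df.pivot_table(index=['Category', 'Code', 'Invoice', 'Unit Price'],
                     values=['Amount'],
                     columns=['JYearMonth'])
df4 = df2['Amount'].reset_index()

# Rows with missing keys (rows 0 and 2) are dropped by the pivot; the rest are
# sorted by the index keys and renumbered 0..4 by reset_index().
print(len(df), len(df4))
print(df4['Invoice'].tolist())
```

Because the row labels 0 to 4 in df4 refer to different rows than labels 0 to 4 in df, any column copied from df into df4 is matched by label, not by meaning.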

Tags: python, pandas, dataframe, sorting, categorical-data
Answer

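A few issues:

  1. You can build the categorical directly with pd.Categorical instead of converting with .astype (as written, .astype('Category') fails anyway, because the dtype name is the lowercase 'category');
  2. The order is custom and you sort by it afterwards, so declare it with ordered=True;
  3. You used df["Invoice"] instead of df4["Invoice"]. df has more rows than df4, and assigning a column from another DataFrame aligns on the index, so df4 ends up with the wrong Invoice values.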
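The points above can be checked on a small standalone example. In this sketch, `src` and `sub` are hypothetical stand-ins: `src` holds the first five Invoice values of df, and `sub` holds the Invoice values df4 has after the pivot (same 0..4 index, different contents). It also shows that the question's own .astype(...).cat.set_categories(...) route works once the dtype name is lowercase and ordered=True is passed.

```python
import pandas as pd

sorter = ['football', 'ice hockey', 'basketball', 'baseball']

# Hypothetical stand-ins: `src` plays the role of df (first five Invoice values),
# `sub` plays df4 (different values under the same 0..4 index).
src = pd.DataFrame({'Invoice': ['soccer', 'basketball', 'baseball', 'football', 'baseball']})
sub = pd.DataFrame({'Invoice': ['football', 'basketball', 'baseball', 'ice hockey', 'football']})

# Point 1: dtype names are case-sensitive, so 'Category' is rejected.
try:
    sub['Invoice'].astype('Category')
    capitalized_ok = True
except TypeError:
    capitalized_ok = False

# Point 3: assigning a column from another frame aligns on index labels,
# so `misaligned` ends up holding src's values instead of sub's own.
misaligned = sub.copy()
misaligned['Invoice'] = src['Invoice']

# Lowercase 'category' plus an ordered set_categories() sorts as intended.
fixed = sub.copy()
fixed['Invoice'] = fixed['Invoice'].astype('category').cat.set_categories(sorter, ordered=True)
fixed = fixed.sort_values('Invoice')

print(capitalized_ok)
print(misaligned['Invoice'].tolist())
print(fixed['Invoice'].tolist())
```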

Try:

df4["Invoice"] = pd.Categorical(df4["Invoice"], 
                                categories=sorter, 
                                ordered=True)

df4.sort_values(["Invoice"], inplace=True)
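If you would rather keep Invoice as a plain string column, DataFrame.sort_values (pandas 1.1 and later) also accepts a `key` function that transforms the column before sorting. This is a swapped-in alternative, not what the answer uses; `rank` is a hypothetical helper name, and the df4 below is a stand-in holding only the Invoice values df4 has after the pivot.

```python
import pandas as pd

sorter = ['football', 'ice hockey', 'basketball', 'baseball']
# Hypothetical helper: each name mapped to its position in `sorter`.
rank = {name: pos for pos, name in enumerate(sorter)}

# Stand-in for df4 after the pivot/reset_index step
df4 = pd.DataFrame({'Invoice': ['football', 'basketball', 'baseball', 'ice hockey', 'football']})

# `key` receives the 'Invoice' column as a Series and must return a Series of the
# same length; the sort uses the mapped positions, while the column itself keeps
# its original (non-categorical) dtype. Names missing from `rank` map to NaN and
# are placed last by default (na_position='last').
df4 = df4.sort_values('Invoice', key=lambda s: s.map(rank))

print(df4['Invoice'].tolist())
```

The trade-off: the categorical approach stores the order in the column's dtype, so later sorts reuse it automatically, while `key` leaves the data untouched and applies the order only to this one sort.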

Result:

In [45]: df4["Invoice"].head()                              
Out[45]: 
0      football
4      football
3    ice hockey
1    basketball
2      baseball
Name: Invoice, dtype: category
Categories (4, object): ['football' < 'ice hockey' < 'basketball' < 'baseball']
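One caveat with the categorical approach: any value that is not listed in sorter becomes NaN. The question's df contains 'soccer', which sorter does not include, so it can be worth checking for such values before converting. A minimal check, using a hypothetical three-value Series `invoices`:

```python
import pandas as pd

sorter = ['football', 'ice hockey', 'basketball', 'baseball']
invoices = pd.Series(['soccer', 'football', 'baseball'])  # hypothetical sample

# Values absent from `sorter` cannot be represented as categories.
missing = set(invoices) - set(sorter)

as_cat = pd.Series(pd.Categorical(invoices, categories=sorter, ordered=True))
sorted_cat = as_cat.sort_values()  # NaN is placed last by default

print(missing)
print(as_cat.isna().tolist())
print(sorted_cat.tolist())
```

If `missing` is not empty, either append those values to sorter or decide explicitly where NaN should go with `na_position`.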