How to apply a MinMax scaler separately to each category in a DataFrame

Question (votes: 0, answers: 2)

I have a DataFrame like the following:

import pandas as pd

df = pd.DataFrame({
    'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
    'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
    'sales'   : [10,20,30,40,100,10,30,50,60,100]
})

df.head(15)

Current approach: manually normalize a single category taken from df

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# .copy() so the assignment below modifies an independent frame, not a slice of df
df_fruits = df[df['category'] == "fruits"].copy()
df_fruits['sales'] = scaler.fit_transform(df_fruits[['sales']])
df_fruits.head()
# to_csv is a DataFrame method (there is no pd.to_csv); 'XX' stands in for the category name
df_fruits.to_csv('minmax/output/category-{}-minmax.csv'.format('XX'))

Questions:
- How do I loop over all the categories in df?
- How do I then export a CSV file named after each category?

Many thanks.

python
2 Answers
1
vote

Use Series.unique to get the distinct categories, then filter, scale, and export each one in turn:

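for category in df["category"].unique():
    # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
    cat = df[df['category'] == category].copy()
    # scaler is the MinMaxScaler from the question; it is refit on each category
    cat['sales'] = scaler.fit_transform(cat[['sales']])
    # the minmax/output/ directory must already exist
    cat.to_csv('minmax/output/category-{}-minmax.csv'.format(category))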
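
For anyone who wants to run the loop end to end, here is a minimal self-contained sketch. It makes a few assumptions that are not in the question: a temporary directory stands in for minmax/output/, the directory is created up front, a fresh scaler is built per category, and `index=False` is passed so the row index is not written to the file. The names `out_dir` and `written` are only for illustration.

```python
import os
import tempfile

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

# Assumption: a temporary directory stands in for the real minmax/output/ path.
out_dir = os.path.join(tempfile.mkdtemp(), 'minmax', 'output')
os.makedirs(out_dir, exist_ok=True)  # to_csv does not create missing directories

written = {}  # hypothetical bookkeeping: category name -> path of its CSV
for category in df['category'].unique():
    cat = df[df['category'] == category].copy()
    # Fit a fresh scaler on this category only, then flatten (n, 1) to (n,)
    cat['sales'] = MinMaxScaler().fit_transform(cat[['sales']]).ravel()
    path = os.path.join(out_dir, 'category-{}-minmax.csv'.format(category))
    cat.to_csv(path, index=False)  # assumption: the row index is not wanted
    written[category] = path
```

Each file then holds only the rows of its own category, with sales rescaled to the range 0 to 1 within that category.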

0
votes

It looks like you need to do a bit of function gymnastics to get this to work.

Your DataFrame:

import pandas as pd

df = pd.DataFrame({
    'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
    'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
    'sales'   : [10,20,30,40,100,10,30,50,60,100]
})
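# Wrapper that min-max scales the values of a single group
from sklearn.preprocessing import MinMaxScaler

def minmax_wrapper(x):
    # x is the 'sales' Series of one category; MinMaxScaler expects a 2-D array
    scaler = MinMaxScaler()
    # return a flat NumPy array so transform() places the values by position, not by index
    return scaler.fit_transform(x.values.reshape(-1, 1)).flatten()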

Now apply it to the grouped DataFrame.

df['scaled_sales'] = df.groupby('category')['sales'].transform(minmax_wrapper)

Voila!
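
If you would rather not pull in scikit-learn for this, the same per-category scaling can be sketched with the min-max formula directly, (x - min) / (max - min). The function name `minmax_by_formula` and the choice to map a constant group to 0.0 are my own assumptions, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})


def minmax_by_formula(x):
    """Scale one group's values to [0, 1] using (x - min) / (max - min)."""
    span = x.max() - x.min()
    if span == 0:
        # Assumption: a group whose values are all equal is mapped to 0.0
        return x * 0.0
    return (x - x.min()) / span


# transform() calls the function once per category and keeps the original row order
df['scaled_sales'] = df.groupby('category')['sales'].transform(minmax_by_formula)
```

With the sample data both categories run from 10 to 100, so orange (20) becomes 10/90 ≈ 0.111 and spinach (50) becomes 40/90 ≈ 0.444.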

You can iterate over your groups like this:

# I believe this should work, but I haven't tried it out
for category, grouped in df.groupby('category'):
    grouped.to_csv(f"minmax/output/category-{category}-minmax.csv")
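
Since the loop above is untested, here is a small sketch that can be run as-is to check the idea. Assumptions on my part: a temporary directory replaces minmax/output/, the directory is created with pathlib before writing, the scaled column is computed with transform('min') and transform('max') instead of the wrapper, and `paths` is just a hypothetical list for inspecting the result.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

# Per-category minimum and maximum, broadcast back to every row
sales_by_category = df.groupby('category')['sales']
lowest = sales_by_category.transform('min')
highest = sales_by_category.transform('max')
# Note: a category with identical sales values would give NaN here (division by zero)
df['scaled_sales'] = (df['sales'] - lowest) / (highest - lowest)

# Assumption: a temporary directory stands in for the real minmax/output/ path.
out_dir = Path(tempfile.mkdtemp()) / 'minmax' / 'output'
out_dir.mkdir(parents=True, exist_ok=True)

paths = []  # hypothetical: collected only so the written files can be inspected
for category, grouped in df.groupby('category'):
    path = out_dir / f'category-{category}-minmax.csv'
    grouped.to_csv(path, index=False)  # assumption: the row index is not wanted
    paths.append(path)
```

Iterating over df.groupby('category') yields each category name together with its rows, so there is no need to filter df by hand inside the loop.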