I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({
'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
'sales' : [10,20,30,40,100,10,30,50,60,100]
})
df.head(15)
Current approach: manually normalize a single category from df
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# .copy() avoids pandas' SettingWithCopyWarning when assigning to the filtered frame
df_fruits = df[df['category'] == "fruits"].copy()
df_fruits['sales'] = scaler.fit_transform(df_fruits[['sales']])
df_fruits.head()
# to_csv is a DataFrame method, not a top-level pandas function
df_fruits.to_csv('minmax/output/category-{}-minmax.csv'.format('XX'))
Questions:
- How do I loop through all the categories in df?
- How do I then export a CSV file named after each category?
Thanks a lot.
Use Series.unique:
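# scaler is the MinMaxScaler instance created in the question
for i in df["category"].unique():
    # copy the slice so the assignment below does not trigger SettingWithCopyWarning
    cat = df[df['category'] == i].copy()
    cat['sales'] = scaler.fit_transform(cat[['sales']])
    cat.to_csv('minmax/output/category-{}-minmax.csv'.format(i))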
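To see what the loop produces before writing anything to disk, here is a minimal sketch that collects each scaled slice in a dict instead of exporting it. The name `scaled_by_category` is my own and not part of the answer above. Because the scaler is fitted again for every category, each category should end up on its own 0 to 1 range.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

# Hypothetical helper dict (not in the original answer): category name -> scaled frame
scaled_by_category = {}
for name in df['category'].unique():
    cat = df[df['category'] == name].copy()
    # A fresh scaler per category, so min and max come from that category only
    cat['sales'] = MinMaxScaler().fit_transform(cat[['sales']])
    scaled_by_category[name] = cat

for name, cat in scaled_by_category.items():
    print(name, cat['sales'].round(3).tolist())
```

Once the values look right, swapping the dict assignment for the `to_csv` call gives the per-category files.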
It looks like you have to do some functional gymnastics to get this to work.
Your DataFrame:
import pandas as pd
df = pd.DataFrame({
'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
'sales' : [10,20,30,40,100,10,30,50,60,100]
})
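def minmax_wrapper(x):
    from sklearn.preprocessing import MinMaxScaler
    scaler = MinMaxScaler()
    # MinMaxScaler expects a 2-D array, so reshape the group's values into one column
    scaled = scaler.fit_transform(x.values.reshape(-1, 1)).flatten()
    # keep the group's own index so the result lines up with the original rows
    return pd.Series(scaled, index=x.index)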
Now apply it to the grouped DataFrame.
df['scaled_sales'] = df.groupby('category')['sales'].transform(minmax_wrapper)
Voila!
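If you would rather not pull in scikit-learn for this, the same per-group scaling can be sketched in plain pandas with the min-max formula (x - min) / (max - min). This is an alternative I am suggesting, not what the wrapper above does internally, and the helper name `minmax` is made up for illustration. It assumes every category has at least two distinct sales values; a constant group would divide by zero and give NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

def minmax(s):
    # Hypothetical helper: rescale one group's values to the range 0 to 1
    return (s - s.min()) / (s.max() - s.min())

# transform returns a result aligned with the original rows
df['scaled_sales'] = df.groupby('category')['sales'].transform(minmax)
print(df)
```

For this data both categories run from 10 to 100, so every value is scaled as (x - 10) / 90.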
You can iterate over your groups like this:
# I believe this should work; I haven't tried it out
for category, grouped in df.groupby('category'):
    grouped.to_csv(f"minmax/output/category-{category}-minmax.csv")
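One thing to watch with the export loop: `to_csv` will not create missing folders, so `minmax/output` has to exist first. Below is a rough sketch that creates the folder and writes one file per category. The function name `export_by_category`, the temporary directory, and `index=False` are my own choices for illustration, not part of the answer above.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

def export_by_category(frame, out_dir):
    """Write one CSV per category into out_dir and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    written = []
    for category, grouped in frame.groupby('category'):
        path = out_dir / f"category-{category}-minmax.csv"
        grouped.to_csv(path, index=False)  # index=False leaves out the row index column
        written.append(path)
    return written

# A temporary directory is used here only so the sketch does not touch your project
out_dir = Path(tempfile.mkdtemp()) / 'minmax' / 'output'
paths = export_by_category(df, out_dir)
print([p.name for p in paths])
```

In real use you would pass `'minmax/output'` as `out_dir` and call this on the DataFrame after the scaled column has been added.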