Pandas column values in scientific notation give wrong output after applying MinMaxScaler


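I have a DataFrame like the one shown below. I applied MinMaxScaler to the Price column and then used inverse_transform to get the original price values back, but the result is wrong. Please advise me on this.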

DataFrame:

Date    Customer    Price
1/6/2019    A   142404534.13
1/7/2019    A   989.34
1/8/2019    A   45444.57
1/9/2019    A   574343.10
1/10/2019   A   23232.34
1/1/2019    A   923423.00
1/2/2019    A   332.00
1/3/2019    A   2342323.24
1/4/2019    A   232.00
1/5/2019    A   65.70
1/6/2019    B   875.46
1/7/2019    B   142466027340.03
1/8/2019    B   25.17
1/9/2019    B   1.01
1/10/2019   B   1.00
1/10/2019   B   57.61
1/6/2019    B   232232.78
1/7/2019    B   15.20
1/8/2019    B   44.56
1/9/2019    B   2323254.45
1/10/2019   B   395.45
1/10/2019   B   23423454.92
1/6/2019    C   34.12
1/7/2019    C   89.34
1/8/2019    C   44.57
1/9/2019    C   343.10
1/10/2019   C   232.34

MinMaxScaler code applied to df:

from sklearn.preprocessing import MinMaxScaler
df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x))
scaler = MinMaxScaler()
dff = df.groupby('Customer').Price.transform(lambda s: scaler.fit_transform(s.values.reshape(-1,1)).ravel())
dff = pd.DataFrame(dff)
dff['Price'] = dff['Price'].apply(lambda x: '{:.2f}'.format(x))

dff = pd.concat([dff['Price'] , df['Customer']] , axis=1)

dff output:

Price   Customer
0   1.00    A
1   0.00    A
2   0.00    A
3   0.00    A
4   0.00    A
5   0.01    A
6   0.00    A
7   0.02    A
8   0.00    A
9   0.00    A
10  0.00    B
11  1.00    B
12  0.00    B
.
.
.
.

20  0.00    B
21  0.00    B
22  0.00    C
23  0.18    C
24  0.03    C
25  1.00    C
26  0.64    C

inverse_transform code to get the actual price values back:

dd = dff.groupby('Customer').Price.transform(lambda s: scaler.inverse_transform(s.values.reshape(-1,1)).ravel())

dd = pd.DataFrame(dd)
dd['Price'] = dd['Price'].apply(lambda x: '{:.2f}'.format(x))
dd = pd.concat([dd['Price'] , df['Customer']] , axis=1)

dd output:

Price   Customer
0   343.10  A
1   34.12   A
2   34.12   A
3   34.12   A
4   34.12   A
5   37.21   A
6   34.12   A
7   40.30   A
8   34.12   A
9   34.12   A
10  34.12   B
11  343.10  B
12  34.12   B
13  34.12   B
.
.
.
.
.
18  34.12   B
19  34.12   B
20  34.12   B
21  34.12   B
22  34.12   C
23  89.74   C
24  43.39   C
25  343.10  C
26  231.87  C

Please help and advise me on this.

python pandas data-science normalization scientific-notation
1 Answer

There are several problems in your script:

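  1. With df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x)) you convert the Price column to strings, so numeric operations are not valid on that dtype. If you only want to control the displayed float precision, call pd.set_option('display.float_format', lambda x: '%.2f' % x) right after importing pandas.
  2. If I understand correctly, you want to scale the values to the 0-1 range per Customer. If so, you cannot do it in one line like this:

    df.groupby('Customer').Price.transform(lambda s: scaler.fit_transform(s.values.reshape(-1,1)).ravel())

    because the single scaler object is refitted on every group, and each fit overwrites the min/max parameters of the previous group. So when you finally call scaler.inverse_transform(), the scaler only holds the parameters of the last group.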
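Both points can be checked on a small frame. The sketch below is only an illustration: the six-row frame is a hypothetical subset of the prices in the question, and the names as_text, last_min, last_max and recovered are made up for this example. It formats the prices the way the question does, then refits one shared MinMaxScaler per customer and inspects what the scaler remembers afterwards.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical subset of the question's data: two customers on very different scales.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "C", "C", "C"],
    "Price": [65.70, 989.34, 142404534.13, 34.12, 89.34, 343.10],
})

# Point 1: '{:.2f}'.format returns text, so the column no longer holds floats.
as_text = df["Price"].apply(lambda x: "{:.2f}".format(x))
print(type(as_text.iloc[0]))

# Point 2: one shared scaler, refitted once per customer (groups are visited A, then C).
scaler = MinMaxScaler()
for name, group in df.groupby("Customer"):
    scaler.fit(group["Price"].to_numpy().reshape(-1, 1))

# Only the parameters of the last fit (customer C) are still stored.
last_min = float(scaler.data_min_[0])
last_max = float(scaler.data_max_[0])
print(last_min, last_max)

# Inverting customer A's scaled maximum (1.0) therefore yields C's maximum, not A's.
recovered = float(scaler.inverse_transform([[1.0]])[0, 0])
print(recovered)
```

This matches the dd output in the question, where every restored price falls between 34.12 and 343.10, the minimum and maximum of customer C.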

Solution:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

pd.set_option('display.float_format', lambda x: '%.7f' % x)

scalers_dict = {}  # Holds one fitted scaler per Customer

df = pd.read_csv('stack_data.csv', parse_dates=['Date'])

# Create the result columns as floats (filled with NaN until assigned)
df['Price_scaled'] = float('nan')
df['Price_inverse_scaled'] = float('nan')

# Loop over Customers, scale Price and save the scaler for later use
for customer in pd.unique(df['Customer']):
    mask = df['Customer'] == customer
    scalers_dict[customer] = MinMaxScaler()
    scaled_price = scalers_dict[customer].fit_transform(df.loc[mask, 'Price'].values.reshape(-1, 1))
    df.loc[mask, 'Price_scaled'] = scaled_price.ravel()

# Loop over Customers and invert the scaling with each Customer's own scaler
for customer in pd.unique(df['Customer']):
    mask = df['Customer'] == customer
    inverse_scale = scalers_dict[customer].inverse_transform(df.loc[mask, 'Price_scaled'].values.reshape(-1, 1))
    df.loc[mask, 'Price_inverse_scaled'] = inverse_scale.ravel()

Output (I set the precision to .7f because the maximum values are so large that most scaled values would show as 0.00 with two decimal places):

        Date Customer                Price Price_scaled Price_inverse_scaled
0  2019-01-06        A    142404534.1300000    1.0000000    142404534.1300000
1  2019-01-07        A          989.3400000    0.0000065          989.3400000
2  2019-01-08        A        45444.5700000    0.0003187        45444.5700000
3  2019-01-09        A       574343.1000000    0.0040327       574343.1000000
4  2019-01-10        A        23232.3400000    0.0001627        23232.3400000
5  2019-01-01        A       923423.0000000    0.0064840       923423.0000000
6  2019-01-02        A          332.0000000    0.0000019          332.0000000
7  2019-01-03        A      2342323.2400000    0.0164479      2342323.2400000
8  2019-01-04        A          232.0000000    0.0000012          232.0000000
9  2019-01-05        A           65.7000000    0.0000000           65.7000000
10 2019-01-06        B          875.4600000    0.0000000          875.4600000
11 2019-01-07        B 142466027340.0299988    1.0000000 142466027340.0299988
12 2019-01-08        B           25.1700000    0.0000000           25.1700000
13 2019-01-09        B            1.0100000    0.0000000            1.0100000
14 2019-01-10        B            1.0000000    0.0000000            1.0000000
15 2019-01-10        B           57.6100000    0.0000000           57.6100000
16 2019-01-06        B       232232.7800000    0.0000016       232232.7800000
17 2019-01-07        B           15.2000000    0.0000000           15.2000000
18 2019-01-08        B           44.5600000    0.0000000           44.5600000
19 2019-01-09        B      2323254.4500000    0.0000163      2323254.4500000
20 2019-01-10        B          395.4500000    0.0000000          395.4500000
21 2019-01-10        B     23423454.9200000    0.0001644     23423454.9200000
22 2019-01-06        C           34.1200000    0.0000000           34.1200000
23 2019-01-07        C           89.3400000    0.1787171           89.3400000
24 2019-01-08        C           44.5700000    0.0338210           44.5700000
25 2019-01-09        C          343.1000000    1.0000000          343.1000000
26 2019-01-10        C          232.3400000    0.6415302          232.3400000
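If keeping scaler objects around is not needed, the same per-Customer min-max scaling can probably be done with pandas alone. This is a different technique from the scalers_dict approach above, not a part of it: the sketch keeps each group's minimum and range as aligned Series and reuses them for the inverse step. The small frame and the names span, Price_restored and round_trip_ok are assumptions made for illustration, and the code assumes every customer has at least two distinct prices (otherwise the range is zero).

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "C", "C", "C"],
    "Price": [65.70, 989.34, 142404534.13, 34.12, 89.34, 343.10],
})

# Per-row copies of each customer's min and max, aligned with df's index.
grouped = df.groupby("Customer")["Price"]
group_min = grouped.transform("min")
group_max = grouped.transform("max")
span = group_max - group_min  # assumes span > 0 for every customer

# Forward scaling to [0, 1] per customer, then the inverse using the same parameters.
df["Price_scaled"] = (df["Price"] - group_min) / span
df["Price_restored"] = df["Price_scaled"] * span + group_min

# The round trip should reproduce the original prices up to floating-point error.
round_trip_ok = bool(np.allclose(df["Price_restored"], df["Price"]))
print(df)
print(round_trip_ok)
```

The trade-off is that the parameters live only in group_min and span here, so scaling new data later would need them stored separately, which is what the per-Customer scaler dictionary provides.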