Trouble identifying the cause: ValueError (infinity or value too large) after normalization


I have a dataset that I first normalize and remove NaNs from. Now, when I try df[col] = preprocessing.scale(df[col].values), I get the error: ValueError: Input contains infinity or a value too large for dtype('float64').

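Here are the steps I took:

1- Make sure the DataFrame (pandas) has no NaN by dropping NaNs
2- Normalize the values using pct_change, then drop the NaNs right after calling pct_change

Then I try the scale function and get the error.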
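Step 2 is a likely source of the infinities. `pct_change()` computes `value[i] / value[i-1] - 1`, so a zero followed by a non-zero value (common in a Volume column) becomes `inf`, and `dropna()` does not remove it because `inf` is not NaN. A minimal illustration with a made-up series (the values are hypothetical, not from the post):

```python
import numpy as np
import pandas as pd

# Made-up volume values; the zero stands in for a candle with no trades
volume = pd.Series([0.0, 5.0, 10.0])

# pct_change divides each value by the previous one, so 5.0 / 0.0 becomes inf
changes = volume.pct_change()
print(changes.tolist())  # [nan, inf, 1.0]

# dropna() removes the leading NaN but keeps the inf
cleaned = changes.dropna()
print(bool(np.isinf(cleaned).any()))  # True
```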

Here is the code snippet:

From the main call:

import numpy as np
import pandas as pd

dataset = f"./Data/Original/{RATIO_TO_PREDICT}.csv"
df = pd.read_csv(dataset)
df.set_index("Timestamp", inplace = True)

#calculate volume candle type 1

#calculate volume candle type 2

#df['VC1_Future'] = df["VC1"].shift(-FUTURE_PERIOD_PREDICT)
#df['VC1_Target'] = list(map(classify,df["VC1"], df["VC1_Future"]))

#df['VC2_Future'] = df["VC2"].shift(-FUTURE_PERIOD_PREDICT)
#df['VC2_Target'] = list(map(classify,df["VC2"], df["VC2_Future"]))

df.fillna(method="ffill", inplace = True)
df.dropna(inplace=True)

df['Price_Future'] = df["Close"].shift(-FUTURE_PERIOD_PREDICT) # Look N periods into the future, take that value and store it as this row's future price
df['Price_Target'] = list(map(classify,df["Close"], df["Price_Future"])) 
# Now we compare the current price with that future price to see if it went up, down or stayed flat; here we use a 0.015 (1.5%) spread to make sure the move covers the commission

# Now we want to separate part of the data for training and another part for testing
times = sorted(df.index.values)
last_5pct = times[-int(0.1 * len(times))]


# We get the final columns we want, making sure we are not including any of the High, Low, and Open values. Remember that Price Target is last. That is OUR GOAL !!!
#dfs = df[["Close", "Volume", "Price_Future", "Price_Target"]]#, "VC1", "VC2", "VC1_Future", "VC2_Future", "VC1_Target", "VC2_Target", "Price_Future", "Price_Target"]]


# We finally separate the data into two different lists
validation_df = df[(df.index >= last_5pct)]
training_df = df[(df.index < last_5pct)]

# We save each list into a file so that we don't need to walk through this process again unless A) we get new data or B) we lose the previous data on the hard drive
Message(name)
print(len(df), len(training_df), len(validation_df))
Message(len(df))
#training_df.dropna(inplace=True)
print(np.isfinite(training_df).all())

print('')

#validation_df.dropna(inplace=True)
print(np.isfinite(validation_df).all())


Train_X, Train_Y = preprocess(training_df)
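`np.isfinite(...).all()` reports `False` for NaN and for ±inf alike, so it cannot tell which of the two a column holds. A small helper (hypothetical, not part of the original code) that counts them separately per numeric column, shown on a toy frame:

```python
import numpy as np
import pandas as pd

def nan_inf_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count NaN and +/-inf values per numeric column (hypothetical helper)."""
    numeric = frame.select_dtypes(include="number")  # skip non-numeric columns
    return pd.DataFrame({
        "nan": numeric.isna().sum(),
        "inf": np.isinf(numeric).sum(),
    })

# Toy data: one inf in Volume, one NaN in Price_Future
toy = pd.DataFrame({
    "Volume": [1.0, np.inf, 3.0],
    "Price_Future": [2.0, 3.0, np.nan],
})
report = nan_inf_report(toy)
print(report)
```

Running it on `training_df` right before `preprocess` would show whether a column carries NaNs, infinities, or both.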

Now, as for the function, here is the beginning of it:

def preprocess(df) :
    df.drop('Price_Future', 1)
    #df.drop('VC1_Future', 1)
    #df.drop('VC2_Future', 1)
    for col in df.columns:
        if col != "Price_Target" and col != "VC1_Target" and col != "VC2_Target":
            df[col] = df[col].pct_change() # gets the percent change, other than the volume, the data now should sit between -1 and 1, the formula : (value[i] / value[i-1]) - 1
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)
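A sketch of how the loop could be restructured, assuming the goal is percent changes followed by standardization. It differs from the posted function in three ways: it keeps the result of `drop()` and works on a copy, it turns ±inf into NaN before a single `dropna()`, and it scales only after the set of rows is final. `preprocess_sketch` and the toy `frame` are hypothetical; it returns the cleaned frame rather than the X/Y pair the full function presumably builds.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

TARGET_COLS = {"Price_Target", "VC1_Target", "VC2_Target"}

def preprocess_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # drop() returns a new frame; keep it, and copy so the caller's data is untouched
    df = df.drop(columns=["Price_Future"], errors="ignore").copy()
    feature_cols = [c for c in df.columns if c not in TARGET_COLS]

    # Percent change per feature column; a zero predecessor yields +/-inf
    df[feature_cols] = df[feature_cols].pct_change()

    # Treat inf like missing data, then drop incomplete rows once
    df = df.replace([np.inf, -np.inf], np.nan).dropna().copy()

    # Standardize each feature column (scale works column-wise on 2-D input)
    df[feature_cols] = preprocessing.scale(df[feature_cols].values)
    return df

frame = pd.DataFrame({
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Volume": [0.0, 10.0, 20.0, 10.0, 40.0],  # the 0.0 produces an inf
    "Price_Future": [2.0, 3.0, 4.0, 5.0, np.nan],
    "Price_Target": [1, 1, 1, 1, 0],
})
out = preprocess_sketch(frame)
print(len(out))  # 3
```

Scaling all feature columns in one call gives the same per-column result as the original loop, but it avoids scaling a column and then dropping more rows afterwards.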

When I run the main code, as you may notice, I check for NaN, and this is the result:

Open             True
High             True
Low              True
Close            True
Volume           True
Price_Future    False
Price_Target     True
dtype: bool
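The `False` for `Price_Future` follows from the shift: `shift(-FUTURE_PERIOD_PREDICT)` has no future value to pull into the last `FUTURE_PERIOD_PREDICT` rows, so those rows get NaN, and the `ffill`/`dropna` in the main code runs before the shift, so they survive. A minimal illustration, with `FUTURE_PERIOD_PREDICT = 2` as an assumed value since the post does not show it:

```python
import pandas as pd

FUTURE_PERIOD_PREDICT = 2  # assumed value; the post does not show it

df = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})
df["Price_Future"] = df["Close"].shift(-FUTURE_PERIOD_PREDICT)

# The last FUTURE_PERIOD_PREDICT rows have no future price to pull in
print(df["Price_Future"].tolist())  # [12.0, 13.0, 14.0, nan, nan]

# Drop those rows after the shift, not before it
labelled = df.dropna(subset=["Price_Future"])
print(len(labelled))  # 3
```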

And at the start of the function I drop the Price_Future column, so why do I get this error on the scaling line?
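On why dropping `Price_Future` did not take effect: `DataFrame.drop` returns a new DataFrame and leaves the original unchanged unless `inplace=True` is passed, so `df.drop('Price_Future', 1)` on a line of its own has no effect. Recent pandas releases also require the axis to be given by keyword (for example `columns=...`). A minimal comparison:

```python
import pandas as pd

df = pd.DataFrame({"Close": [1.0, 2.0], "Price_Future": [2.0, None]})

df.drop(columns=["Price_Future"])       # returns a new frame, which is thrown away
print(list(df.columns))                 # ['Close', 'Price_Future']

df = df.drop(columns=["Price_Future"])  # keep the returned frame instead
print(list(df.columns))                 # ['Close']
```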

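Also, the code above produces a lot of warnings:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

But I am new to Python and all of this, so I don't know how to fix the function's code.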
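The warning most likely comes from the split in the main code: boolean indexing such as `df[(df.index < last_5pct)]` returns a frame that pandas flags as a possible slice of `df`, and `preprocess` then assigns into it. Taking an explicit `.copy()` when splitting makes each part independent. A sketch with made-up data (also note that `int(0.1 * len(times))` selects the last 10% of the timestamps, not 5% as the name `last_5pct` suggests):

```python
import pandas as pd

# Made-up data standing in for the CSV; the index plays the role of Timestamp
df = pd.DataFrame({"Close": [float(i) for i in range(1, 21)]},
                  index=pd.RangeIndex(1000, 1020, name="Timestamp"))

times = sorted(df.index.values)
cutoff = times[-int(0.1 * len(times))]  # first timestamp of the last 10%

# .copy() detaches each split from df, so assigning into it is unambiguous
validation_df = df[df.index >= cutoff].copy()
training_df = df[df.index < cutoff].copy()

training_df["Close"] = training_df["Close"].pct_change()  # no SettingWithCopyWarning
print(len(training_df), len(validation_df))  # 18 2
```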

Can someone please help?

Thanks

Tags: python scikit-learn scale infinite

1 Answer

OUCH, found the main problem:

df[col] = preprocessing.scale(df[col].values)

is wrong; it should be

df[col] = preprocessing.scale(df[col])

Note the missing .values in the scale call!!!
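For what it's worth, `df[col]` and `df[col].values` hand `preprocessing.scale` the same numbers, and `scale` rejects ±inf (it tolerates NaN) with either input type, so removing `.values` by itself should not make this error disappear; if it did, the data reaching `scale` had probably changed as well. A quick check, assuming a reasonably recent scikit-learn (`scale_raises` is a hypothetical helper):

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

col = pd.Series([1.0, np.inf, 3.0], name="Volume")

def scale_raises(x) -> bool:
    """Return True if preprocessing.scale rejects x with a ValueError."""
    try:
        preprocessing.scale(x)
    except ValueError:
        return True
    return False

print(scale_raises(col))         # True: pandas Series input
print(scale_raises(col.values))  # True: NumPy array input
```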

But could someone please help me get rid of those warning messages?

© www.soinside.com 2019 - 2024. All rights reserved.