MemoryError when fitting scikit-learn decision tree and random forest classifiers

Question (votes: 0, answers: 2)

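I have a pandas DataFrame with 86k rows, 5 features, and 1 target column. I am trying to train a DecisionTreeClassifier on 70% of the DataFrame, and the fit method raises a MemoryError. I have tried changing a few parameters, but I don't know what is causing the error, so I don't know how to deal with it. I am on Windows 10 with 8 GB of RAM.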

Code

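from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# data is the DataFrame described above
train, test = train_test_split(data, test_size=0.3)
X_train = train.iloc[:, 1:-1]  # first column is not a feature
y_train = train.iloc[:, -1]    # last column is the target
X_test = test.iloc[:, 1:-1]
y_test = test.iloc[:, -1]

DT = DecisionTreeClassifier()
DT.fit(X_train, y_train)  # MemoryError is raised here
dt_predictions = DT.predict(X_test)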

Error

File (...), line 97, in <module>
DT.fit(X_train, y_train)
File "(...)\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\tree\tree.py", line 790, in fit
X_idx_sorted=X_idx_sorted)
File "(...)\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\tree\tree.py", line 362, in fit
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
File "sklearn\tree\_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn\tree\_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn\tree\_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
File "sklearn\tree\_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
File "sklearn\tree\_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 671612928 bytes

When I try RandomForestClassifier, the same error occurs, always on the line that does the fitting. How can I solve this?
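One thing that may be worth checking: the traceback paths contain `Python36-32`, which is the default folder name the Windows installer uses for 32-bit Python 3.6. If the interpreter really is 32-bit (an assumption based only on that path), the process can address only roughly 2 to 4 GB no matter how much RAM is installed, so an allocation of several hundred megabytes can fail. A minimal sketch for confirming the interpreter's bitness, using only the standard library (the function names are illustrative):

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits."""
    return struct.calcsize("P") * 8


def is_64bit():
    """True when the interpreter can address more than 4 GiB."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}, {pointer_bits()}-bit")
    if not is_64bit():
        # On a 32-bit build, large allocations may fail even with free RAM.
        print("32-bit interpreter: consider installing a 64-bit build.")
```

If this reports 32-bit, installing a 64-bit Python build and reinstalling the packages is one possible way forward; whether it resolves this particular error is not confirmed by the question.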

python machine-learning scikit-learn decision-tree
2 Answers
2
votes

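I ran into the same problem. Make sure you are actually dealing with a classification problem and not a regression problem. If your target column is continuous, you probably need to use http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html instead of RandomForestClassifier.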
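A quick way to act on this advice is to look at how many distinct values the target has, since a classifier treats every distinct value as its own class. The sketch below is a rough heuristic only: `looks_continuous` and its `max_class_ratio` threshold are hypothetical names and values chosen for illustration, not part of scikit-learn.

```python
def count_classes(targets):
    """Number of distinct target values; a classifier sees each as a class."""
    return len(set(targets))


def looks_continuous(targets, max_class_ratio=0.05):
    """Heuristic (an assumption, not a scikit-learn rule).

    Flags the target as continuous if any value is a non-integer float,
    or if the distinct values exceed max_class_ratio of the row count.
    """
    values = list(targets)
    has_fraction = any(
        isinstance(v, float) and not v.is_integer() for v in values
    )
    too_many_classes = count_classes(values) > max_class_ratio * len(values)
    return has_fraction or too_many_classes


if __name__ == "__main__":
    labels = [0, 1, 1, 0] * 50        # 200 rows, 2 classes
    prices = [0.13, 2.5, 3.75, 9.01]  # fractional values
    print(looks_continuous(labels))   # False
    print(looks_continuous(prices))   # True
```

With the question's data this could be called as `looks_continuous(y_train.tolist())`. If it returns True, a regressor such as RandomForestRegressor or DecisionTreeRegressor is probably the better fit.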


0
votes

Hi, is there any solution to this? My C: drive has 31 GB of free space, but I get this message about 20 GB of data:

Unable to allocate 20.4 GiB for an array with shape (72123, 75923) and data type float32
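The figure in that message can be reproduced by hand: a dense array needs rows × columns × bytes per element, and float32 uses 4 bytes. Note that the allocation is made in memory (RAM plus any page file), so free space on the C: drive is not what the message is measuring. A small sketch of the arithmetic (the helper names are illustrative):

```python
import math


def dense_array_bytes(shape, itemsize):
    """Bytes needed for a dense array with the given shape and element size."""
    return math.prod(shape) * itemsize


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    needed = dense_array_bytes((72123, 75923), 4)  # float32 = 4 bytes
    print(f"{needed} bytes = {to_gib(needed):.1f} GiB")
    # prints: 21903178116 bytes = 20.4 GiB
```

The same arithmetic applied to the question's traceback gives 671612928 bytes, or about 640.5 MiB, for the single allocation that failed.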
