My code uses more than 25 GB of memory and crashes

Problem description

I am using an Extra Trees Classifier to find the feature importances in my dataset, which consists of 13 columns and about 10 million rows. I ran an Elliptic Envelope and an Isolation Forest on it and everything went fine; they did not even use 10 GB. I ran the code in a Jupyter notebook and it gives me a memory error even with low_memory=True set. I tried Google Colab, which has about 25 GB of RAM, but it still crashed, and now I am very confused.
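As a side note, low_memory=True in pandas.read_csv only controls chunked parsing during dtype inference; it does not reduce the memory the loaded DataFrame occupies. Below is a minimal sketch for checking the actual footprint and shrinking it by downcasting (it assumes the 13 columns are numeric and reuses the file name from the code further down):

import pandas as pd

# Same CSV as in the code below
df = pd.read_csv('Final After Simple Filtering.csv', low_memory=True)

# low_memory=True does not shrink the DataFrame; check its real size
df.info(memory_usage='deep')

# Assuming the columns are numeric, float64 -> float32 roughly halves the footprint
float_cols = df.select_dtypes(include='float64').columns
df[float_cols] = df[float_cols].astype('float32')
print(df.memory_usage(deep=True).sum() / 1e9, 'GB after downcasting')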

Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.ensemble import ExtraTreesClassifier

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials


# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)



# Loading the first dataframe from Google Drive
link = '...'
fluff, id = link.split('=')
print(id)  # Verify that you have everything after '='
downloaded = drive.CreateFile({'id': id})
downloaded.GetContentFile('Final After Simple Filtering.csv')
df = pd.read_csv('Final After Simple Filtering.csv', index_col=None, low_memory=True)
#df = df.astype(float)


# Extra Trees classifier used only to obtain feature importances
ExtraT = ExtraTreesClassifier(n_estimators=100, bootstrap=False, n_jobs=1)

y = df['Power_kW']
del df['Power_kW']  # drop the target column in place to avoid copying the frame
X = df

ExtraT.fit(X, y)

feature_importance = ExtraT.feature_importances_

# Per-feature standard deviation of importances across the trees
# (axis=0 gives one value per column rather than one per tree)
feature_importance_normalized = np.std([tree.feature_importances_ for tree in ExtraT.estimators_], axis=0)

plt.bar(X.columns, feature_importance)
plt.xlabel('Label')
plt.ylabel('Feature Importance')
plt.title('Parameters Importance')
plt.show()

Thanks

python pandas dataframe machine-learning jupyter-notebook
1 Answer

I ran into the same error before and have solved it.

Change the runtime type. A GPU is much faster than a CPU, so it will help. How do you do that? Follow these steps:

[Screenshots: Runtime > Change runtime type in the Colab menu]

Make sure you are using the 25 GB RAM runtime rather than the 12 GB one. Don't forget that Colab is free and limited. If you still have problems, let me know and I will help you as soon as I can.
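A quick way to confirm which runtime you actually got is to check the RAM visible from inside the notebook, for example with psutil (a small sketch; psutil is normally preinstalled on Colab):

from psutil import virtual_memory

ram_gb = virtual_memory().total / 1e9  # total RAM visible to this runtime, in GB
print(f'This runtime has {ram_gb:.1f} GB of RAM')
# ~12-13 GB means the standard runtime; ~25 GB means the high-RAM runtime took effect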
