GridSearchCV on a working pipeline returns ValueError


I am using GridSearchCV to find the best parameters for my pipeline.

My pipeline seems to work fine, because I can run:

pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)

and I get decent results.

But GridSearchCV apparently doesn't like something, and I can't figure out what.

My pipeline:

feats = FeatureUnion([('age', age),
                      ('education_num', education_num),
                      ('is_education_favo', is_education_favo),
                      ('is_marital_status_favo', is_marital_status_favo),
                      ('hours_per_week', hours_per_week),
                      ('capital_diff', capital_diff),
                      ('sex', sex),
                      ('race', race),
                      ('native_country', native_country)
                     ])

pipeline = Pipeline([
        ('adhocFC',AdHocFeaturesCreation()),
        ('imputers', KnnImputer(target = 'native-country', n_neighbors = 5)),
        ('features',feats),('clf',LogisticRegression())])

My grid search:

hyperparameters = {'imputers__n_neighbors' : [5,21,41], 'clf__C' : [1.0, 2.0]}

GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring = 'roc_auc' , refit = False) #change n_jobs = 2, refit = False

GSCV.fit(X_train, y_train)

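I get 11 warnings like this one:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Here is the error message: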

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:14: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>()
      3 GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring = 'roc_auc' , refit = False) #change n_jobs = 2, refit = False
      4
----> 5 GSCV.fit(X_train, y_train)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups)
    943             train/test set.
    944         """
--> 945         return self._fit(X, y, groups, ParameterGrid(self.param_grid))
    946
    947

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in _fit(self, X, y, groups, parameter_iterable)
    562                                   return_times=True, return_parameters=True,
    563                                   error_score=self.error_score)
--> 564           for parameters in parameter_iterable
    565           for train, test in cv_iter)
    566

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
    609                 return True
    610

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
    572         self._jobs.append(job)
    573

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
    110         if callback:
    111             callback(result)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
    327
    328     def get(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    236             estimator.fit(X_train, **fit_params)
    237         else:
--> 238             estimator.fit(X_train, y_train, **fit_params)
    239
    240     except Exception as e:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    266             This estimator
    267         """
--> 268         Xt, fit_params = self._fit(X, y, **fit_params)
    269         if self._final_estimator is not None:
    270             self._final_estimator.fit(Xt, y, **fit_params)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    232                 pass
    233             elif hasattr(transform, "fit_transform"):
--> 234                 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
    235             else:
    236                 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    495         else:
    496             # fit method of arity 2 (supervised transformation)
--> 497             return self.fit(X, y, **fit_params).transform(X)
    498
    499

<ipython-input-...> in fit(self, X, y)
     16         self.ohe.fit(X_full)
     17         # Create a dataframe that does not contain any nulls; categorical variables are OHE, with all rows
---> 18         X_ohe_full = self.ohe.transform(X_full[~X[self.col].isnull()].drop(self.col, axis=1))
     19
     20         # Fit the classifier on rows where col is null

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060
   2061     def _getitem_column(self, key):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067
   2068         # duplicate columns & possible reduce dimensionality

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3550                         loc = indexer.item()
   3551                     else:
-> 3552                         raise ValueError("cannot label index with a null key")
   3553
   3554         return self.iget(loc, fastpath=fastpath)

ValueError: cannot label index with a null key

python pandas scikit-learn pipeline
1 Answer

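Without further information, I believe this is because your X_train and y_train variables are pandas DataFrames, and the core scikit-learn library is not compatible with these: for example, a classifier's .fit method expects an array-like object.

By passing in pandas DataFrames, you are inadvertently indexing them as if they were numpy arrays, which is not reliable in pandas.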
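To illustrate the indexing difference described above, here is a minimal sketch. The toy data and column names are hypothetical and only serve to show that the same bracket syntax means positional access in numpy but a column-label lookup in pandas.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({"age": [25, 40, 31], "hours_per_week": [40, 50, 38]})
arr = df.to_numpy()

# numpy: arr[0] is positional and returns the first row.
first_row = arr[0]

# pandas: df[0] is a column *label* lookup, so it fails here because
# no column is named 0.
try:
    df[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access on a DataFrame goes through .iloc instead.
first_row_df = df.iloc[0].to_numpy()
```

Code written for one container therefore cannot be assumed to behave the same way on the other.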

Try converting your training data to numpy arrays:

X_train_arr = X_train.to_numpy()
y_train_arr = y_train.to_numpy()
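
As a rough, self-contained sketch of that conversion, the snippet below builds hypothetical stand-in data (the real X_train and y_train come from the question) and fits a plain LogisticRegression on the converted arrays. It assumes a pandas version that provides .to_numpy(); on older versions the .values attribute gives the same arrays.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data, not the dataset from the question.
rng = np.random.RandomState(0)
X_train = pd.DataFrame(rng.normal(size=(60, 3)), columns=["a", "b", "c"])
y_train = pd.Series((X_train["a"] + X_train["b"] > 0).astype(int))

# Convert both containers to plain numpy arrays.
X_train_arr = X_train.to_numpy()
y_train_arr = y_train.to_numpy()

# Fit and predict on the arrays only.
clf = LogisticRegression()
clf.fit(X_train_arr, y_train_arr)
preds = clf.predict(X_train_arr)
```

Note that the column names are lost after conversion, so any pipeline step that selects columns by name (as the custom transformers in the question appear to do) would need to use positional indices instead.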