GridSearchCV on a working pipeline returns ValueError

Question

I am using GridSearchCV to find the best parameters for my pipeline.

My pipeline seems to work well, since I can run:

pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)

And I get decent results.

But GridSearchCV obviously doesn't like something, and I can't figure out what.

My pipeline:

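from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# age, education_num, etc. are my own per-feature transformers (definitions not shown)
feats = FeatureUnion([('age', age),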
                      ('education_num', education_num),
                      ('is_education_favo', is_education_favo),
                      ('is_marital_status_favo', is_marital_status_favo),
                      ('hours_per_week', hours_per_week),
                      ('capital_diff', capital_diff),
                      ('sex', sex),
                      ('race', race),
                      ('native_country', native_country)
                     ])

pipeline = Pipeline([
    ('adhocFC', AdHocFeaturesCreation()),
    ('imputers', KnnImputer(target='native-country', n_neighbors=5)),
    ('features', feats),
    ('clf', LogisticRegression()),
])

My grid search:

from sklearn.model_selection import GridSearchCV

hyperparameters = {'imputers__n_neighbors': [5, 21, 41], 'clf__C': [1.0, 2.0]}

GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring='roc_auc', refit=False)  # change n_jobs = 2, refit = False

GSCV.fit(X_train, y_train)
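
The grid keys above follow the `step__parameter` convention, where the prefix is the step name given to Pipeline. As a minimal sketch, the names a pipeline accepts can be listed with `get_params()` before running the search. The pipeline below is a stand-in built from stock scikit-learn steps (`scale`, `clf`), not the custom steps from this question.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline; the step names "scale" and "clf" are illustrative only.
demo_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Every key usable in a parameter grid appears in get_params().
valid_names = sorted(demo_pipeline.get_params().keys())

grid = {"scale__with_mean": [True, False], "clf__C": [1.0, 2.0]}

# Collect any grid key the pipeline would not recognise.
unknown = [name for name in grid if name not in valid_names]
print("unknown grid keys:", unknown)
```

An empty list means every key in the grid maps to a real parameter of some step.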

I get 11 warnings like this one:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
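
For reference, this warning usually appears when a column is assigned on a DataFrame that was itself obtained by filtering or slicing another one. The code that triggers it here is not shown, so the sketch below uses made-up data and a hypothetical `long_hours` column. It takes an explicit copy first and then assigns through `.loc`, which is the pattern the warning text suggests.

```python
import pandas as pd

# Made-up data; the column names only echo the ones used in the question.
df = pd.DataFrame({"age": [25, 40, 61], "hours_per_week": [40, 50, 20]})

# Take an explicit copy of the filtered rows so that later assignments
# target an independent DataFrame rather than a slice of df.
older = df[df["age"] > 30].copy()

# Assign the new (hypothetical) column through .loc on the copy.
older.loc[:, "long_hours"] = older["hours_per_week"] > 45

print(older)
```

The original `df` is left unchanged, which is usually what a transformer's `transform` method should guarantee for its input.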

Here is the error message:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:12: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
      3 GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring = 'roc_auc' , refit = False) #change n_jobs = 2, refit = False
      4
----> 5 GSCV.fit(X_train, y_train)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups)
    943             train/test set.
    944         """
--> 945         return self._fit(X, y, groups, ParameterGrid(self.param_grid))
    946
    947

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in _fit(self, X, y, groups, parameter_iterable)
    562                                   return_times=True, return_parameters=True,
    563                                   error_score=self.error_score)
--> 564           for parameters in parameter_iterable
    565           for train, test in cv_iter)
    566

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
    609                 return True
    610

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
    572         self._jobs.append(job)
    573

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
    110         if callback:
    111             callback(result)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
    327
    328     def get(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    236             estimator.fit(X_train, **fit_params)
    237         else:
--> 238             estimator.fit(X_train, y_train, **fit_params)
    239
    240     except Exception as e:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    266             This estimator
    267         """
--> 268         Xt, fit_params = self._fit(X, y, **fit_params)
    269         if self._final_estimator is not None:
    270             self._final_estimator.fit(Xt, y, **fit_params)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    232                 pass
    233             elif hasattr(transform, "fit_transform"):
--> 234                 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
    235             else:
    236                 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    495         else:
    496             # fit method of arity 2 (supervised transformation)
--> 497             return self.fit(X, y, **fit_params).transform(X)
    498
    499

in fit(self, X, y)
     16         self.ohe.fit(X_full)
     17         # Create a dataframe that does not contain any nulls, categ variables are OHE, with all rows
---> 18         X_ohe_full = self.ohe.transform(X_full[~X[self.col].isnull()].drop(self.col, axis=1))
     19
     20         # Fit the classifier on the rows where col is null

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060
   2061     def _getitem_column(self, key):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067
   2068         # duplicate columns & possible reduce dimensionality

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3550                         loc = indexer.item()
   3551                     else:
-> 3552                         raise ValueError("cannot label index with a null key")
   3553
   3554         return self.iget(loc, fastpath=fastpath)

ValueError: cannot label index with a null key

Answer 1

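Without more information, I believe this is because your X_train and y_train variables are pandas DataFrames, and the underlying scikit-learn library is not compatible with them: for example, a classifier's .fit method expects an array-like object.

By passing in pandas DataFrames, you are inadvertently indexing them as if they were NumPy arrays, which is not reliable in pandas.

Try converting your training data to NumPy arrays: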

X_train_arr = X_train.to_numpy()
y_train_arr = y_train.to_numpy()
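
One further check that may be worth making, offered as an assumption because the source of KnnImputer is not shown: GridSearchCV works on copies of the estimator made with `sklearn.base.clone`, which rebuilds each step from its `get_params()`. A custom transformer therefore needs to store every constructor argument under an attribute of the same name. The sketch below uses a hypothetical `ColumnDropper` class (not the asker's code) to show a transformer that survives cloning and still finds its column afterwards.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer used only for illustration."""

    def __init__(self, target=None):
        # Store the argument under the same name as the parameter so that
        # get_params(), and therefore clone(), can recover it.
        self.target = target

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Return a new frame without the target column; X is not modified.
        return X.drop(columns=[self.target])


original = ColumnDropper(target="native-country")
cloned = clone(original)

# The clone is a new object that carries the same constructor parameters.
print(cloned.get_params())

df = pd.DataFrame({"age": [39, 50], "native-country": ["Cuba", None]})
out = cloned.fit_transform(df)
print(list(out.columns))
```

Running the same `clone(...)` and `get_params()` check on the real imputer would show whether `target` and `n_neighbors` are still set on the copy that the grid search fits.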
