错误:在 StratifiedK-fold 期间出现“条件预期的布尔数组,而不是 float64”

问题描述 投票:0回答:1

我正在尝试使用 stratifid k-fold 对我的数据集进行交叉验证,但出现错误“条件应为布尔数组,而不是 float64”(在下面的标题代码中)。有谁知道原因吗?

这是代码:

import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from imblearn.over_sampling import SMOTE
cleanedDataset = `pd.read_csv("train_numeric_shuffled_50000_cleaned_90.csv")`

#providing input and output features
x=cleanedDataset.drop(['Id','Response'], axis=1)
y=cleanedDataset['Response']

#applico Stratified K-fold con K=4
skf = StratifiedKFold(n_splits=4)

#stampo risultati dei 4 fold
for i, (train_index, test_index) in enumerate(skf.split(x, y)):
        print(f"Fold {i}:")
        print(f"  Train: index={train_index}")
        print(f"  Test:  index={test_index}")

#uso la colonna response come Target
target = cleanedDataset.loc[:,'Response']

#definizione train model
model = LogisticRegression()
def train_model(train, test, fold_no):

    x_train = train[x]
    y_train = train[y]
    x_test = test[x]
    x_test = test[y]
    model.fit(X_train,y_train)
    predictions = model.predict(X_test)
    print('Fold',str(fold_no),'Accuracy:',accuracy_score(y_test,predictions))

#stampo valori accuratezza algoritmo
fold_no =1
for train_index, test_index in skf.split(cleanedDataset, target):
    train = cleanedDataset.loc[train_index,:]
    test = cleanedDataset.loc[test_index,:]
    train_model(train,test,fold_no)
    fold_no += 1

这是最后几行的错误回溯:

 ValueError  Traceback (most recent call last)
    ~\AppData\Local\Temp\ipykernel_8004\1316530102.py in <module>
          4     train = cleanedDataset.loc[train_index,:]
          5     test = cleanedDataset.loc[test_index,:]
    ----> 6     train_model(train,test,fold_no)
          7     fold_no += 1
    
    ~\AppData\Local\Temp\ipykernel_8004\3643313375.py in train_model(train, test, fold_no)
          3 def train_model(train, test, fold_no):
          4 
    ----> 5     X_train = train[x]
          6     y_train = train[y]
          7     X_test = test[x]

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3490         # Do we have a (boolean) DataFrame?
   3491         if isinstance(key, DataFrame):
-> 3492             return self.where(key)
   3493 
   3494         # Do we have a (boolean) 1d indexer?

~\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~\anaconda3\lib\site-packages\pandas\core\frame.py in where(self, cond, other, inplace, axis, level, errors, try_cast)
  10962         try_cast=lib.no_default,
  10963     ):
> 10964         return super().where(cond, other, inplace, axis, level, errors, try_cast)
  10965 
  10966     @deprecate_nonkeyword_arguments(

~\anaconda3\lib\site-packages\pandas\core\generic.py in where(self, cond, other, inplace, axis, level, errors, try_cast)
   9313             )
   9314 
-> 9315         return self._where(cond, other, inplace, axis, level, errors=errors)
   9316 
   9317     @doc(

~\anaconda3\lib\site-packages\pandas\core\generic.py in _where(self, cond, other, inplace, axis, level, errors)
   9074                 for dt in cond.dtypes:
   9075                     if not is_bool_dtype(dt):
-> 9076                         raise ValueError(msg.format(dtype=dt))
   9077         else:
   9078             # GH#21947 we have an empty DataFrame/Series, could be object-dtype

ValueError: Boolean array expected for the condition, not float64

我想修改什么?

python cross-validation
1个回答
0
投票

显然你的

x
y
在这里

#providing input and output features
x=cleanedDataset.drop(['Id','Response'], axis=1)
y=cleanedDataset['Response']

应该只包含列名,而不是列内容。需要改成

x = [c for c in cleanedDataset.columns if c not in {'Id','Response'}]
y = 'Response'

然后

train[x]
train[y]
等可以在数据框中采用正确的列
cleanedDataset
.

© www.soinside.com 2019 - 2024. All rights reserved.