Stratified GroupShuffleSplit in scikit-learn

Question (votes: 2, answers: 1)

Is it possible to do a "stratified GroupShuffleSplit" in scikit-learn, in other words a combination of GroupShuffleSplit and StratifiedShuffleSplit?

Here is an example of the code I am using:

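from sklearn.model_selection import GridSearchCV, GroupShuffleSplit
from sklearn.svm import SVC

# Group-aware shuffle split: all samples of a group stay on the same side
cv = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
                       train_size=train_size, random_state=random_state
                       ).split(allr_sets_nor[:, :2], allr_labels, groups=allr_groups)
opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol),
                   param_grid=param_grid, scoring=scoring, n_jobs=n_jobs,
                   cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)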

I have applied GroupShuffleSplit here, but I would still like to add stratification based on allr_labels.


I solved the problem by applying StratifiedShuffleSplit to the groups and then manually deriving the training and test set indices, since they are tied to the group indices (in my case, each group contains 6 consecutive sets, from 6*index to 6*index+5).
The code is as follows:

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Stratified splitting over groups only
# (all_groups and all_labels hold one entry per group)
sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                             train_size=train_size, random_state=random_state
                             ).split(all_groups, all_labels)

train_is = [np.array([], dtype=int)] * n_splits
test_is = [np.array([], dtype=int)] * n_splits
for i, (train_index, test_index) in enumerate(sss):
    # Find the indices of the corresponding training and test sets:
    # group g covers the set indices 6*g, 6*g+1, ..., 6*g+5
    train_is[i] = np.hstack((train_is[i], np.concatenate([train_index * 6 + k for k in range(6)])))
    test_is[i] = np.hstack((test_is[i], np.concatenate([test_index * 6 + k for k in range(6)])))

# Final cross-validation iterable: a list of n_splits tuples, each holding
# two NumPy arrays with the training and test indices respectively
cv = [(train_is[i], test_is[i]) for i in range(n_splits)]

opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol), param_grid=param_grid,
                   scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)
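
The same idea can be sketched without relying on the "6 consecutive sets per group" layout. The helper below is only an illustration: the name stratified_group_shuffle_split and the toy data are hypothetical and not part of scikit-learn, and it assumes that every sample in a group carries the same label, so that one label per group is enough for stratification. It stratifies over groups with StratifiedShuffleSplit and then expands the chosen groups back to sample indices.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_group_shuffle_split(labels, groups, n_splits=5, test_size=0.2,
                                   random_state=None):
    """Yield (train_idx, test_idx) sample indices, keeping groups intact.

    Hypothetical helper, not a scikit-learn API. Assumes all samples of a
    group share one label, which is then used to stratify the groups.
    """
    labels = np.asarray(labels).ravel()
    groups = np.asarray(groups).ravel()
    # For each distinct group: position of its first sample, plus the
    # group number (0..n_groups-1) that every sample belongs to.
    unique_groups, first_pos, sample_to_group = np.unique(
        groups, return_index=True, return_inverse=True)
    group_labels = labels[first_pos]
    if not np.array_equal(labels, group_labels[sample_to_group]):
        raise ValueError("every sample in a group must have the same label")

    sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                 random_state=random_state)
    # X is only a placeholder; the split is driven by the group labels.
    placeholder = np.zeros(len(unique_groups))
    for _, test_groups in sss.split(placeholder, group_labels):
        test_mask = np.isin(sample_to_group, test_groups)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy data (made up for illustration): 20 groups of 6 samples, two classes.
groups = np.repeat(np.arange(20), 6)
labels = np.repeat(np.arange(20) % 2, 6)
cv = list(stratified_group_shuffle_split(labels, groups, n_splits=3,
                                         test_size=0.25, random_state=0))
```

The resulting list of (train, test) index pairs can be passed as cv to GridSearchCV in the same way as the hand-built list above. If a k-fold scheme is acceptable instead of a shuffle split, scikit-learn 1.0 and later also provide StratifiedGroupKFold.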
python scikit-learn dataset shuffle cross-validation
1 answer
4 votes