I have to use the code below for gradient boosting classification on a binary classification problem.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
# Creating the training and test datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)
# Count of goods in the training set
# This count is 50000
y0 = len(y_train[y_train['bad_flag'] == 0])
# Count of bads in the training set
# This count is 100
y1 = len(y_train[y_train['bad_flag'] == 1])
# Creating the sample_weights array. Include all bad customers and
# twice the number of goods as bads
w0 = (y1 / y0) * 2
w1 = 1
sample_weights = np.zeros(len(y_train))
sample_weights[y_train['bad_flag'] == 0] = w0
sample_weights[y_train['bad_flag'] == 1] = w1
model = GradientBoostingClassifier(
    n_estimators=100, max_features=0.5, random_state=1)
model.fit(X_train, y_train.values.ravel(), sample_weight=sample_weights)
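To sanity-check the weighting scheme, here is a small standalone sketch (using made-up arrays with the counts stated above, not the actual training data) showing that with w0 = (y1/y0)*2 the total weight on goods is exactly twice the total weight on bads:

```python
import numpy as np

# Hypothetical labels matching the post: 50000 goods (0), 100 bads (1)
y_train = np.array([0] * 50000 + [1] * 100)

y0 = np.sum(y_train == 0)  # number of goods
y1 = np.sum(y_train == 1)  # number of bads

w0 = (y1 / y0) * 2         # weight per good
w1 = 1                     # weight per bad

sample_weights = np.where(y_train == 0, w0, w1)

# Weighted mass of goods (~200) is twice the weighted mass of bads (100)
print(sample_weights[y_train == 0].sum())
print(sample_weights[y_train == 1].sum())
```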
My reasoning behind writing this code is as follows:
GradientBoostingClassifier has subsample=1.0 by default, which means the sample size used at each stage (for each of the n_estimators) will be the same as the original dataset. The weights do not change anything about the size of the subsample. If you want to force 300 observations per stage, you need to set subsample = 300/(50000+100) in addition to redefining the weights. Note that the subsample is drawn randomly. You can read more about it here: https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting. It says:
At each iteration the base classifier is trained on a fraction subsample of the available training data.
So, as a result, a certain amount of bootstrap-style sampling is combined with the boosting algorithm.
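Putting the two knobs together, here is a minimal runnable sketch (synthetic imbalanced data standing in for the post's 50000 goods / 100 bads; the parameter values are illustrative) that combines the sample weights with a subsample setting that draws roughly 300 rows per boosting stage:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic imbalanced data: ~98% class 0 (goods), ~2% class 1 (bads)
X, y = make_classification(n_samples=5010, weights=[0.98], random_state=1)

# Same weighting scheme as in the question
w0 = (np.sum(y == 1) / np.sum(y == 0)) * 2
sample_weights = np.where(y == 0, w0, 1.0)

# subsample < 1.0 turns on stochastic gradient boosting: each stage is
# fit on a random fraction of the training rows (~300 here)
model = GradientBoostingClassifier(
    n_estimators=100,
    max_features=0.5,
    subsample=300 / len(y),
    random_state=1,
)
model.fit(X, y, sample_weight=sample_weights)
print(model.score(X, y))
```

The subsample is redrawn at every stage, so across 100 estimators the model still sees most of the data while each individual tree is fit on a small random slice.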