Is test data being leaked during the modeling process?


During modeling:

for i in range(30,90,10):
    if i==40: # Since there is no maximum speed limit of 40 in the test data, it is omitted.
        continue
    else:
        globals()[f'train_{i}'] = train.loc[train['maximum_speed_limit']==i]

Later, in the prediction step:

for i in range(30,90,10):
    if i==40:
        continue
    elif i==30:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=9).fit(n_train, n_y_train)
    elif i==50:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=12).fit(n_train, n_y_train)
    elif i==60:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=12).fit(n_train, n_y_train)
    elif i==70:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=11).fit(n_train, n_y_train)
    elif i==80:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=10).fit(n_train, n_y_train)
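  1. Someone did this during modeling: the training data is split by maximum speed limit, and 40 is dropped because it does not appear in the test data. Isn't that a test data leak? I'm curious whether this is allowed.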

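  2. (In the prediction step, is it also acceptable to skip i==40 with continue and to set max_depth to a different value for each remaining case, such as i==30, i==50, and so on?)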
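For reference, the two loops in the question can be sketched more compactly by keeping the per-limit models in a dictionary instead of using globals() and eval. This is only a minimal sketch under stated assumptions: `MeanRegressor` is a hypothetical stand-in for `XGBRegressor` so the example runs without xgboost, `fit_per_limit` and `MAX_DEPTH_BY_LIMIT` are illustrative names, and the toy data is invented. The max_depth values are copied from the question, and 40 is left out for the same reason given there (it is absent from the test data).

```python
import pandas as pd


class MeanRegressor:
    """Hypothetical stand-in for XGBRegressor: predicts the training mean."""

    def __init__(self, max_depth=None):
        self.max_depth = max_depth  # stored only to mirror the question's parameter

    def fit(self, X, y):
        self.mean_ = float(y.mean())
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


# max_depth per speed limit, copied from the question; 40 is deliberately absent
MAX_DEPTH_BY_LIMIT = {30: 9, 50: 12, 60: 12, 70: 11, 80: 10}


def fit_per_limit(X_train, y_train, make_model=MeanRegressor):
    """Fit one model per maximum_speed_limit value and return them in a dict."""
    models = {}
    for limit, depth in MAX_DEPTH_BY_LIMIT.items():
        subset = X_train.loc[X_train["maximum_speed_limit"] == limit]
        if subset.empty:
            continue  # no training rows for this limit
        # Select the targets by index label, as the question does
        models[limit] = make_model(max_depth=depth).fit(subset, y_train.loc[subset.index])
    return models


# Toy data, invented purely for illustration
X_train = pd.DataFrame({"maximum_speed_limit": [30, 30, 40, 50, 50, 60]})
y_train = pd.Series([10.0, 20.0, 99.0, 40.0, 60.0, 70.0])
models = fit_per_limit(X_train, y_train)
```

To train the models from the question instead of the stand-in, one could presumably pass `make_model=lambda max_depth: XGBRegressor(random_state=random_state, max_depth=max_depth)`. Written this way, the only place where knowledge of the test set enters is the choice of keys in `MAX_DEPTH_BY_LIMIT`.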

python modeling