In the modeling step:
for i in range(30, 90, 10):
    if i == 40:  # Since there is no maximum speed limit of 40 in the test data, it is omitted.
        continue
    else:
        globals()[f'train_{i}'] = train.loc[train['maximum_speed_limit'] == i]
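For comparison, here is a minimal sketch of the same split that lets the training data alone decide the groups, storing them in a dictionary instead of creating `train_30`, `train_50`, ... through `globals()`. The function name `split_by_speed_limit` is hypothetical. If some speed limit (such as 40) never appears in the test set, its group is still built here and simply goes unused at prediction time, so the test set is never consulted.

```python
import pandas as pd


def split_by_speed_limit(train: pd.DataFrame, col: str = "maximum_speed_limit") -> dict:
    """Return {speed_limit: sub-DataFrame}, built from the training data only (hypothetical helper)."""
    # groupby derives the group keys from the values present in `train`;
    # nothing here depends on which values occur in the test set
    return {limit: group for limit, group in train.groupby(col)}
```

With this, `groups = split_by_speed_limit(train)` and `groups[30]` plays the role of `train_30`.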
Then, later, in the prediction step:
from xgboost import XGBRegressor

# Fit one XGBoost model per speed-limit group, each with its own max_depth
for i in range(30, 90, 10):
    if i == 40:
        continue
    elif i == 30:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=9).fit(n_train, n_y_train)
    elif i == 50:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=12).fit(n_train, n_y_train)
    elif i == 60:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=12).fit(n_train, n_y_train)
    elif i == 70:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=11).fit(n_train, n_y_train)
    elif i == 80:
        n_train = eval(f'X_train_{i}')
        n_y_train = y_train[eval(f'X_train_{i}').index]
        globals()[f'xgb_{i}'] = XGBRegressor(random_state=random_state, max_depth=10).fit(n_train, n_y_train)
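The five `elif` branches differ only in `max_depth`, so the same logic can be expressed as a lookup table. The sketch below assumes the per-group feature frames are passed in as a dictionary (the hypothetical `X_groups`, playing the role of `X_train_30`, `X_train_50`, ...) and takes a `make_model` factory so the model class is not hard-coded; the depth values are copied from the snippet above.

```python
from typing import Any, Callable, Dict

import pandas as pd

# max_depth per speed limit, copied from the snippet above (40 has no entry, as in the original)
MAX_DEPTH = {30: 9, 50: 12, 60: 12, 70: 11, 80: 10}


def fit_per_group(
    X_groups: Dict[int, pd.DataFrame],
    y_train: pd.Series,
    make_model: Callable[[int], Any],
    max_depth: Dict[int, int] = MAX_DEPTH,
) -> Dict[int, Any]:
    """Fit one model per speed-limit group (hypothetical helper)."""
    models = {}
    for limit, X in X_groups.items():
        if limit not in max_depth:
            continue  # no depth configured for this group, so skip it
        y = y_train.loc[X.index]  # align the target rows with this group's feature rows
        models[limit] = make_model(max_depth[limit]).fit(X, y)
    return models
```

A possible call is `fit_per_group(X_groups, y_train, lambda d: XGBRegressor(random_state=random_state, max_depth=d))`, which works because `XGBRegressor.fit` returns the fitted estimator.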
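Someone did this when building their model. Isn't this leaking information from the test data? I'm curious whether this is allowed.
(Also, when making predictions, is it valid to skip i==40 with continue and set a different max_depth for each case (e.g. i==30, i==50, ...)?)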
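On the max_depth part of the question, one way to make per-group depths independent of the test set is to pick each group's value by cross-validation on that group's training rows only. The sketch below is an illustration under stated assumptions, not the original author's method: the helper name `choose_max_depth`, the candidate range 6 to 12, five folds, and mean absolute error as the score are all hypothetical choices, and the competition's own metric should be substituted.

```python
from typing import Any, Callable, Iterable

import numpy as np
import pandas as pd


def choose_max_depth(
    X: pd.DataFrame,
    y: pd.Series,
    make_model: Callable[[int], Any],
    candidates: Iterable[int] = range(6, 13),
    n_splits: int = 5,
    seed: int = 0,
) -> int:
    """Pick max_depth for one group via k-fold CV on its training rows (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Shuffle row positions once, then cut them into n_splits folds
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    best_depth, best_mae = None, np.inf
    for depth in candidates:
        errors = []
        for k in range(n_splits):
            val_idx = folds[k]
            trn_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            model = make_model(depth).fit(X.iloc[trn_idx], y.iloc[trn_idx])
            pred = np.asarray(model.predict(X.iloc[val_idx]))
            # Mean absolute error on the held-out fold (assumed metric)
            errors.append(np.mean(np.abs(pred - y.iloc[val_idx].to_numpy())))
        mae = float(np.mean(errors))
        if mae < best_mae:  # strict "<" keeps the smallest depth on ties
            best_depth, best_mae = depth, mae
    return best_depth
```

Running this once per group, for example `choose_max_depth(X_group, y_group, lambda d: XGBRegressor(random_state=random_state, max_depth=d))`, yields a depth table like the one in the snippet above without ever looking at the test data.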