I have a vectorized function:
def selection_update_weights(df):
    # Define the selections for 'Win'
    selections_win = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", "W & O 2.5",
                      "W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)", "W & O 1.5",
                      "W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)", "W & U 4.5",
                      "W (untested)", "W"]
    # Create a boolean mask for the condition for 'Win'
    mask_win = ((df['selection_match'] == "no match") &
                (df['selection'].isin(selections_win)) &
                (df['result_match'] == "no match") &
                (df['result'] != 'draw'))
    # Apply the condition and update the 'Win' column
    df.loc[mask_win, 'Win'] = df.loc[mask_win, 'predicted_score_difference'] + 0.02

    # Define the selections for 'DNB'
    selections_DNB = ["DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5", "DNB or O 2.5 (untested)",
                      "DNB or O 2.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5",
                      "DNB or O 1.5 (untested)", "DNB or O 1.5", "DNB (untested)", "DNB"]
    # Create a boolean mask for the condition for 'DNB'
    mask_DNB = ((df['selection_match'] == 'no match') &
                (df['selection'].isin(selections_DNB)) &
                (df['result_match'] == 'no match') &
                (df['result'] != 'draw'))
    # Apply the condition and update the 'DNB' column
    df.loc[mask_DNB, 'DNB'] = df.loc[mask_DNB, 'predicted_score_difference'] + 0.02

    # Define the selections for 'O 1.5'
    selections_O_1_5 = ["W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)",
                        "W & O 1.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5",
                        "DNB or O 1.5 (untested)", "DNB or O 1.5", "O 1.5 (untested)", "O 1.5"]
    # Create a boolean mask for the condition for 'O 1.5'
    mask_O_1_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_O_1_5)) &
                  (df['total_score'] < 2))
    # Apply the condition and update the 'O_1_5' column
    df.loc[mask_O_1_5, 'O_1_5'] = df.loc[mask_O_1_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'O 2.5'
    selections_O_2_5 = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)",
                        "W & O 2.5", "DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5",
                        "DNB or O 2.5 (untested)", "DNB or O 2.5", "O 2.5 (untested)", "O 2.5"]
    # Create a boolean mask for the condition for 'O 2.5'
    mask_O_2_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_O_2_5)) &
                  (df['total_score'] < 3))
    # Apply the condition and update the 'O_2_5' column
    df.loc[mask_O_2_5, 'O_2_5'] = df.loc[mask_O_2_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'U 4.5'
    selections_U_4_5 = ["W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)",
                        "W & U 4.5", "U 4.5 (untested)", "U 4.5"]
    # Create a boolean mask for the condition for 'U 4.5'
    mask_U_4_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_U_4_5)) &
                  (df['total_score'] > 4))
    # Apply the condition and update the 'U_4_5' column
    df.loc[mask_U_4_5, 'U_4_5'] = df.loc[mask_U_4_5, 'predicted_total_score'] - 0.02

    return df
I apply it like this:
df = selection_update_weights(df)
The first run works:
    home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection  selection_match
 3           2           0            2                 2              12.370528                   12.090888    home              home         match       1.1  0.7         2         3      4  W & O 2.5 (both untested)         no match
 9           2           0            2                 2              11.439416                   10.291339    home              home         match       1.1  0.7         2         3      4  W & O 2.5 (both untested)         no match
10           2           0            2                 2              11.226599                   10.228954    home              home         match       1.1  0.7         2         3      4  W & O 2.5 (both untested)         no match
11           1           5            6                 4              12.069979                   10.194557    away              home      no match       1.1  0.7         2         3      4  W & O 2.5 (both untested)         no match
20           2           0            2                 2               9.808659                    9.049657    home              home         match       1.1  0.7         2         3      4  W & O 2.5 (both untested)         no match
When I run the function:
    home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection  selection_match
44           3           3            6                 0               8.748172                    8.135116    draw              home      no match  8.155116  0.7  2.000000  3.000000    4.0  W & O 2.5 (both untested)         no match
50           1           0            1                 1               8.605350                    7.932909    home              home         match  1.100000  0.7  8.625350  8.625350    4.0  W & O 1.5 (both untested)         no match
57           1           1            2                 0               7.510030                    7.750101    draw              home      no match  7.770101  0.7  2.000000  7.530030    4.0  W & O 1.5 (both untested)         no match
62           0           1            1                 1               8.895045                    7.710740    away              away         match  1.100000  0.7  8.915045  8.915045    4.0  W & O 1.5 (both untested)         no match
85           1           0            1                 1               8.099853                    7.444815    home              home         match  1.100000  0.7  8.119853  8.119853    4.0  W & O 1.5 (both untested)         no match
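But subsequent runs of the loop produce no further changes.
My dataframe is very large, but this slice is where the weights are not being updated. The weights are only partially updated, and I don't know why.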
df.head():
    home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection  selection_match
44           3           3            6                 0               8.748172                    8.135116    draw              home      no match       1.1  0.7       2.0  3.000000    4.0  W & O 2.5 (both untested)         no match
50           1           0            1                 1               8.605350                    7.932909    home              home         match       1.1  0.7       2.0  8.625350    4.0  W & O 1.5 (both untested)         no match
57           1           1            2                 0               7.510030                    7.750101    draw              home      no match       1.1  0.7       2.0  7.530030    4.0  W & O 1.5 (both untested)         no match
62           0           1            1                 1               8.895045                    7.710740    away              away         match       1.1  0.7       2.0  8.915045    4.0  W & O 1.5 (both untested)         no match
85           1           0            1                 1               8.099853                    7.444815    home              home         match       1.1  0.7       2.0  8.119853    4.0  W & O 1.5 (both untested)         no match
So when I apply it:
df = selection_update_weights(df)
ideally I should get:
home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection  selection_match
         3           3            6                 0               8.748172                    8.135116    draw              home      no match  8.155116  0.7       2.0         3    4.0  W & O 2.5 (both untested)         no match
         1           0            1                 1               8.605350                    7.932909    home              home         match  1.100000  0.7  8.625350  8.625350    4.0  W & O 1.5 (both untested)         no match
         1           1            2                 0               7.510030                    7.750101    draw              home      no match  7.770101  0.7       2.0  7.530030    4.0  W & O 1.5 (both untested)         no match
         0           1            1                 1               8.895045                    7.710740    away              away         match  1.100000  0.7  8.915045  8.915045    4.0  W & O 1.5 (both untested)         no match
         1           0            1                 1               8.099853                    7.444815    home              home         match  1.100000  0.7  8.119853  8.119853    4.0  W & O 1.5 (both untested)         no match
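However, that does not happen, and the original dataframe is unaffected.
Would it help to break this down into separate if-else checks per row? The dataframe is very large, though, and a row-by-row computation takes 20 minutes.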
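In the second loop, only the Win and O_2_5 columns are updated. Win is set as a function of predicted_score_difference, and O_2_5 is set as a function of predicted_total_score.
predicted_score_difference and predicted_total_score never change inside selection_update_weights, so within that function they are effectively constants. Because you call the function repeatedly and each call assigns the same constant-derived value to the same rows, those columns can never end up with any other value after the first call.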
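To make that concrete, here is a minimal sketch on a two-row toy frame. The helper name update_win and the hand-built mask are assumptions for illustration only; the assignment line is the same pattern as in selection_update_weights, and the numbers are taken from row 44 of the question.

```python
import pandas as pd

# Toy frame (assumption): two rows, only the columns needed for the 'Win' update.
df = pd.DataFrame({
    "predicted_score_difference": [8.135116, 7.932909],
    "Win": [1.1, 1.1],
})
# Hand-built mask (assumption): only the first row meets the condition.
mask = pd.Series([True, False])


def update_win(frame, mask):
    # Same pattern as the question: the right-hand side reads only the
    # prediction column, never the current value of 'Win'.
    frame.loc[mask, "Win"] = frame.loc[mask, "predicted_score_difference"] + 0.02
    return frame


first = update_win(df.copy(), mask)
second = update_win(first.copy(), mask)

print(first["Win"].tolist())
print(second.equals(first))  # True: the second call changes nothing
```

The first call moves Win away from 1.1 on the masked row; the second call recomputes exactly the same number from the same input, so the frame is unchanged.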
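I'm not sure why you are calling selection_update_weights multiple times, but perhaps you should either update Win, O_2_5 (and the other weight columns) from their own current values, or update predicted_score_difference and predicted_total_score inside selection_update_weights.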
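If the intent is for each pass to keep nudging the weight, one option is to read the column's current value on the right-hand side. This is only a sketch of the first suggestion: bump_win and its step parameter are hypothetical names, and whether accumulating 0.02 per pass is the behaviour you actually want is an assumption on my part.

```python
import pandas as pd


def bump_win(frame, mask, step=0.02):
    # Hypothetical variant: the new value depends on the current 'Win',
    # so every call moves the masked rows a little further.
    frame.loc[mask, "Win"] = frame.loc[mask, "Win"] + step
    return frame


df = pd.DataFrame({"Win": [1.1, 1.1]})
# Hand-built mask (assumption): only the first row meets the condition.
mask = pd.Series([True, False])

df = bump_win(df, mask)
df = bump_win(df, mask)

print(df["Win"].round(2).tolist())  # [1.14, 1.1]
```

The same change would apply to the DNB, O_1_5, O_2_5 and U_4_5 assignments; the masks themselves can stay as they are.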