I have the following DataFrame:
Person | Tries | Studying Time | Math | English | Science |
---|---|---|---|---|---|
Steve | [1,2,3,4,5] | [400,478,512,517,810] | [95,93,94,89,92] | [96,97,92,82,83] | [92,94,97,93,80] |
George | [1,2] | [379,500] | [89,91] | [92,87] | [75,82] |
Charles | [1] | [545] | [87] | [89] | [92] |
Andy | [1,2,3] | [510,560,801] | [92,94,97] | [89,89,82] | [79,78,91] |
To explain the table: George took the test twice. He studied 379 minutes for his first try and 500 minutes for his second. He scored 89 in Math on his first try and 91 on his second.
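For anyone who wants to reproduce this, here is a minimal sketch that builds the same table with pandas. Each cell holds a plain Python list, so the columns end up with object dtype. The variable name `df` matches the code further down; the column names `Person`, `Math`, `English` and `Science` are taken from the table header.

```python
import pandas as pd

# One row per person; every cell is a list with one entry per try.
df = pd.DataFrame({
    "Person": ["Steve", "George", "Charles", "Andy"],
    "Tries": [[1, 2, 3, 4, 5], [1, 2], [1], [1, 2, 3]],
    "Studying Time": [[400, 478, 512, 517, 810], [379, 500], [545], [510, 560, 801]],
    "Math": [[95, 93, 94, 89, 92], [89, 91], [87], [92, 94, 97]],
    "English": [[96, 97, 92, 82, 83], [92, 87], [89], [89, 89, 82]],
    "Science": [[92, 94, 97, 93, 80], [75, 82], [92], [79, 78, 91]],
})
```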
I am trying to calculate a weight for each try. The formula is as follows:
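Written out in the form the code below implements, for a person with n tries, where t_k is the studying time for try k. The symbols w_k, c_k, t_k and n are just notation chosen here for illustration:

```latex
c_k = 0.7^{\,n-k} \cdot \frac{t_k}{\max_{1 \le j \le n} t_j},
\qquad
w_k = \frac{c_k}{\sum_{j=1}^{n} c_j},
\qquad k = 1, \dots, n
```

So the most recent try (k = n) has a decay factor of 1, and each earlier try is scaled down by a further factor of 0.7.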
I was able to write working code for this formula:
# Assumes df already has an object-dtype 'weights' column, so .at can store a list per row.
for i, row in df.iterrows():
    weights = []
    max_mins = max(row['Studying Time'])
    if row['Tries'][-1] == 5:
        # Decay: the latest try counts fully, each earlier try is scaled by another 0.7.
        s1_decay = 0.7 ** 4
        s2_decay = 0.7 ** 3
        s3_decay = 0.7 ** 2
        s4_decay = 0.7
        s5_decay = 1
        # Studying time as a fraction of the longest studying time.
        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)
        s3_percentmins = (row['Studying Time'][2]/max_mins)
        s4_percentmins = (row['Studying Time'][3]/max_mins)
        s5_percentmins = (row['Studying Time'][4]/max_mins)
        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins
        s3_cumulative = s3_decay * s3_percentmins
        s4_cumulative = s4_decay * s4_percentmins
        s5_cumulative = s5_decay * s5_percentmins
        sum_cumulative = s1_cumulative + s2_cumulative + s3_cumulative + s4_cumulative + s5_cumulative
        # Normalize so the weights sum to 1.
        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        s3_weight = s3_cumulative/sum_cumulative
        s4_weight = s4_cumulative/sum_cumulative
        s5_weight = s5_cumulative/sum_cumulative
        weights.extend((s1_weight, s2_weight, s3_weight, s4_weight, s5_weight))
        df.at[i, 'weights'] = weights
    if row['Tries'][-1] == 4:
        s1_decay = 0.7 ** 3
        s2_decay = 0.7 ** 2
        s3_decay = 0.7
        s4_decay = 1
        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)
        s3_percentmins = (row['Studying Time'][2]/max_mins)
        s4_percentmins = (row['Studying Time'][3]/max_mins)
        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins
        s3_cumulative = s3_decay * s3_percentmins
        s4_cumulative = s4_decay * s4_percentmins
        sum_cumulative = s1_cumulative + s2_cumulative + s3_cumulative + s4_cumulative
        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        s3_weight = s3_cumulative/sum_cumulative
        s4_weight = s4_cumulative/sum_cumulative
        weights.extend((s1_weight, s2_weight, s3_weight, s4_weight))
        df.at[i, 'weights'] = weights
    if row['Tries'][-1] == 3:
        s1_decay = 0.7 ** 2
        s2_decay = 0.7
        s3_decay = 1
        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)
        s3_percentmins = (row['Studying Time'][2]/max_mins)
        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins
        s3_cumulative = s3_decay * s3_percentmins
        sum_cumulative = s1_cumulative + s2_cumulative + s3_cumulative
        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        s3_weight = s3_cumulative/sum_cumulative
        weights.extend((s1_weight, s2_weight, s3_weight))
        df.at[i, 'weights'] = weights
    if row['Tries'][-1] == 2:
        s1_decay = 0.7
        s2_decay = 1
        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)
        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins
        sum_cumulative = s1_cumulative + s2_cumulative
        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        weights.extend((s1_weight, s2_weight))
        df.at[i, 'weights'] = weights
    if row['Tries'][-1] == 1:
        s1_decay = 1
        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s1_cumulative = s1_decay * s1_percentmins
        sum_cumulative = s1_cumulative
        s1_weight = s1_cumulative/sum_cumulative
        weights.append(s1_weight)
        df.at[i, 'weights'] = weights
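As you can see, this solution is not very Pythonic. I would like to write one general function instead of spelling out the logic for every possible number of tries.
I have started trying to compute the decay dynamically. So far it only creates as many decay variables as there are tries; it does not calculate the weights correctly.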
for i, row in df.iterrows():
    # Builds one entry per try, but stores the try number rather than a decay factor.
    vars = {f's{i+1}_decay': row['Tries'][i] for i in range(len(row['Tries']))}
    print(vars)
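To make the target concrete, here is a rough sketch of the shape I am after: one function that takes a list of studying times of any length and returns the weights. It assumes the tries are always numbered 1..n in order, so the length of the list stands in for `row['Tries'][-1]`. The function name `attempt_weights` and the `decay` parameter are hypothetical, and I have only checked it by hand against a couple of rows.

```python
def attempt_weights(study_minutes, decay=0.7):
    """Return one normalized weight per try, oldest try first."""
    n = len(study_minutes)
    max_mins = max(study_minutes)
    # The latest try gets decay**0 == 1; each earlier try is scaled by decay once more.
    cumulative = [
        decay ** (n - 1 - k) * (mins / max_mins)
        for k, mins in enumerate(study_minutes)
    ]
    total = sum(cumulative)
    # Normalize so the weights sum to 1.
    return [c / total for c in cumulative]
```

If this is on the right track, it could presumably be applied per row with something like `df['weights'] = df['Studying Time'].apply(attempt_weights)` instead of the `iterrows` loop.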
Any ideas on how to turn my code into a more general function that handles every possible number of tries?
Thanks.