Dynamically creating variables from an equation with a generic function

I have the following dataframe:

         Tries        Studying Time          Math              English           Science
Steve    [1,2,3,4,5]  [400,478,512,517,810]  [95,93,94,89,92]  [96,97,92,82,83]  [92,94,97,93,80]
George   [1,2]        [379,500]              [89,91]           [92,87]           [75,82]
Charles  [1]          [545]                  [87]              [89]              [92]
Andy     [1,2,3]      [510,560,801]          [92,94,97]        [89,89,82]        [79,78,91]

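To explain this table: George took the exam twice. He studied 379 minutes for his first attempt and 500 minutes for his second. He scored 89 in Math on his first attempt and 91 on his second.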

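I am trying to calculate a weight for each attempt. The formula is as follows:

  • Assign a decay to each attempt. Your most recent attempt is worth 1, the one before it 0.7, the one before that (0.7)^2, then (0.7)^3, and then (0.7)^4. A student can take the exam once or up to 5 times.
  • Calculate the percentage of maximum studying time by dividing the attempt's studying time by the studying time of the attempt with the most studying time
  • Multiply the attempt's decay by its percentage of maximum studying time to get the attempt's cumulative variable
  • Divide an attempt's cumulative variable by the sum of the cumulative variables across all attempts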
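
The four steps above can be sketched as a single function. This is only a sketch, assuming the studying times are listed oldest attempt first, as in the table; `attempt_weights` and `decay_base` are illustrative names that do not appear in the original code.

```python
def attempt_weights(study_times, decay_base=0.7):
    """Return one weight per attempt, oldest attempt first."""
    n = len(study_times)
    max_time = max(study_times)
    # Step 1: the most recent attempt gets decay 1, the one before it 0.7,
    # then 0.7**2, and so on.
    decays = [decay_base ** (n - 1 - k) for k in range(n)]
    # Step 2: each attempt's share of the longest studying time.
    shares = [t / max_time for t in study_times]
    # Step 3: cumulative variable per attempt.
    cumulative = [d * s for d, s in zip(decays, shares)]
    # Step 4: normalise so the weights sum to 1.
    total = sum(cumulative)
    return [c / total for c in cumulative]


# George's two attempts from the table: 379 and 500 minutes.
print(attempt_weights([379, 500]))
```

For George this gives weights of roughly 0.35 for the first attempt and 0.65 for the second. Because the exponent is derived from the list length, the same function covers anywhere from one to five attempts.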

I was able to produce working code for this formula:

for i,row in df.iterrows():
    weights = []
    max_mins = max(row['Studying Time'])
    if row['Tries'][-1] == 5:
        s1_decay = 0.7 ** 4
        s2_decay = 0.7 ** 3
        s3_decay = 0.7 ** 2
        s4_decay = 0.7
        s5_decay = 1

        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)
        s3_percentmins = (row['Studying Time'][2]/max_mins)
        s4_percentmins = (row['Studying Time'][3]/max_mins)
        s5_percentmins = (row['Studying Time'][4]/max_mins)

        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins
        s3_cumulative = s3_decay * s3_percentmins
        s4_cumulative = s4_decay * s4_percentmins
        s5_cumulative = s5_decay * s5_percentmins

        sum_cumulative = s1_cumulative + s2_cumulative + s3_cumulative + s4_cumulative + s5_cumulative

        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        s3_weight = s3_cumulative/sum_cumulative
        s4_weight = s4_cumulative/sum_cumulative
        s5_weight = s5_cumulative/sum_cumulative

        weights.extend((s1_weight, s2_weight, s3_weight, s4_weight, s5_weight))
        df.at[i, 'weights'] = weights

    if row['Tries'][-1] == 4:
        s1_decay = 0.7 ** 3
        s2_decay = 0.7 ** 2
        s3_decay = 0.7
        s4_decay = 1

        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)
        s3_percentmins = (row['Studying Time'][2]/max_mins)
        s4_percentmins = (row['Studying Time'][3]/max_mins)

        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins
        s3_cumulative = s3_decay * s3_percentmins
        s4_cumulative = s4_decay * s4_percentmins

        sum_cumulative = s1_cumulative + s2_cumulative + s3_cumulative +s4_cumulative

        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        s3_weight = s3_cumulative/sum_cumulative
        s4_weight = s4_cumulative/sum_cumulative

        weights.extend((s1_weight, s2_weight, s3_weight, s4_weight))
        df.at[i, 'weights'] = weights


    if row['Tries'][-1] == 3:
        s1_decay = 0.7 ** 2
        s2_decay = 0.7
        s3_decay = 1

        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)
        s3_percentmins = (row['Studying Time'][2]/max_mins)

        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins
        s3_cumulative = s3_decay * s3_percentmins

        sum_cumulative = s1_cumulative + s2_cumulative + s3_cumulative

        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        s3_weight = s3_cumulative/sum_cumulative

        weights.extend((s1_weight, s2_weight, s3_weight))
        df.at[i, 'weights'] = weights

    if row['Tries'][-1] == 2:
        s1_decay = 0.7
        s2_decay = 1

        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s2_percentmins = (row['Studying Time'][1]/max_mins)

        s1_cumulative = s1_decay * s1_percentmins
        s2_cumulative = s2_decay * s2_percentmins

        sum_cumulative = s1_cumulative + s2_cumulative

        s1_weight = s1_cumulative/sum_cumulative
        s2_weight = s2_cumulative/sum_cumulative
        weights.extend((s1_weight, s2_weight))

        df.at[i, 'weights'] = weights

    if row['Tries'][-1] == 1:
        s1_decay = 1
        s1_percentmins = (row['Studying Time'][0]/max_mins)
        s1_cumulative = s1_decay * s1_percentmins
        sum_cumulative = s1_cumulative
        s1_weight = s1_cumulative/sum_cumulative
        weights.append(s1_weight)
        df.at[i, 'weights'] = weights

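As you can tell, this solution is not very Pythonic. I would like a solution where I can write one generic function instead of writing out the logic for each possible number of attempts.

I have started trying to calculate the decay dynamically. It only creates as many decay variables as there are attempts, and it does not calculate the weights correctly.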

for i,row in df.iterrows():
    vars = {f's{i+1}_decay': row['Tries'][i] for i in range(len(row['Tries']))}
    print(vars)
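
The dictionary above stores the attempt number itself under each key, since `row['Tries'][i]` is just 1, 2, 3, and so on, so no decay is actually computed. One possible alternative to named variables is a list indexed by position, with the exponent counted back from the most recent attempt. A minimal sketch, where `decay_factors` and `base` are hypothetical names and the attempts are assumed to be numbered 1..n in order:

```python
def decay_factors(tries, base=0.7):
    """Map a list of attempt numbers such as [1, 2, 3] to decay factors."""
    last = tries[-1]  # assumption: attempts are numbered 1..n in order
    # The most recent attempt gets exponent 0, the one before it exponent 1, ...
    return [base ** (last - t) for t in tries]


# Andy's three attempts from the table.
print(decay_factors([1, 2, 3]))
```

The last element is always 1 because its exponent is 0, which matches the rule that the most recent attempt carries full value.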

Any ideas on how to turn my code into a more generic function that accounts for every possible scenario?

Thank you.

python data-manipulation