I'm trying to assign probabilities to dictionary keys. The dictionary is as follows:
my_dict = {0: 21, 1: 36, 2: 13, 3: 344, 4: 171, 5: 10, 6: 7, 7: 24, 8: 15, 9: 14, 10: 77, 11: 7, 12: 434, 13: 6, 14: 38, 15: 328, 16: 149, 17: 12, 18: 67, 19: 85, 20: 33, 21: 19, 22: 13, 23: 477, 24: 9, 25: 206, 26: 226, 27: 48, 28: 135, 29: 42, 30: 273, 31: 11, 32: 61, 33: 11, 34: 378, 35: 32, 36: 10, 37: 237, 38: 248, 39: 64, 40: 7, 41: 74, 42: 17, 43: 30, 44: 12, 45: 44, 46: 197, 47: 314, 48: 118, 49: 40, 50: 89, 51: 6, 52: 260, 53: 18, 54: 5, 55: 5, 56: 5, 57: 455, 58: 25, 59: 23, 60: 70, 61: 179, 62: 98, 63: 9, 64: 163, 65: 102, 66: 8, 67: 188, 68: 5, 69: 500, 70: 8, 71: 142, 72: 216, 73: 6, 74: 299, 75: 286, 76: 56, 77: 156, 78: 123, 79: 58, 80: 27, 81: 20, 82: 93, 83: 29, 84: 361, 85: 26, 86: 15, 87: 396, 88: 112, 89: 415, 90: 46, 91: 53, 92: 16, 93: 6, 94: 81, 95: 22, 96: 129, 97: 51, 98: 35, 99: 107}
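In this dictionary, each key is a label and each value is that label's number of occurrences. How should I assign probabilities to the keys so that the least frequent key gets the highest probability and the most frequent key gets the lowest?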
My code:
# Step 1: Sort the dictionary items by occurrence count, ascending
sorted_dict = {key: value for key, value in sorted(my_dict.items(), key=lambda item: item[1])}
# Step 2: Calculate the probability of each key based on the sorted order
total_count = sum(sorted_dict.values())
probabilities = {key: (len(sorted_dict) - index)/total_count for index, (key, value) in enumerate(sorted_dict.items())}
# Print the probabilities
for key, probability in probabilities.items():
    print(f"Key: {key}, Probability: {probability}")
print(sum(probabilities.values()))
The problem is that the probabilities don't add up to 1. The sum isn't even close: I get about 0.45.
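The sum comes out low because the numerators in the code above are ranks, not counts. For n keys the weights n, n - 1, ..., 1 add up to n(n + 1)/2 (5050 for the 100 keys here), while the denominator is the sum of the occurrence counts. A minimal sketch, assuming you want to keep the rank-based weighting, divides by the rank sum instead (the toy dict is hypothetical; substitute my_dict):

```python
# Rank-based weights normalized by the sum of the ranks, not the sum of the counts.
counts = {"a": 5, "b": 1, "c": 20, "d": 3}  # hypothetical toy data

# Sort keys by count, ascending: the rarest key comes first.
ranked = sorted(counts, key=counts.get)
n = len(ranked)

# Rarest key gets weight n, the next n - 1, ..., the most frequent gets 1.
# These weights sum to n * (n + 1) / 2, so dividing by that normalizes them.
rank_sum = n * (n + 1) // 2
probabilities = {key: (n - index) / rank_sum for index, key in enumerate(ranked)}

print(probabilities)
print(sum(probabilities.values()))
```

With this scheme only the order of the counts matters, not their size, and keys with equal counts get different probabilities depending on where the stable sort leaves them (their original insertion order).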
Try changing
probabilities = {key: (len(sorted_dict) - index)/total_count for index, (key, value) in enumerate(sorted_dict.items())}
to
probabilities = {key: value/total_count for key, value in sorted_dict.items()}
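With that change each probability is count / total_count, which does sum to 1, but it keeps the original ordering: the most frequent key gets the highest probability. A small check on hypothetical toy data:

```python
counts = {"a": 5, "b": 1, "c": 20, "d": 3}  # hypothetical toy data
total_count = sum(counts.values())  # 29

# Proportional weighting: each probability is the key's share of all occurrences.
probabilities = {key: value / total_count for key, value in counts.items()}

print(sum(probabilities.values()))                # close to 1.0 (floating point)
print(max(probabilities, key=probabilities.get))  # "c", the most frequent key
```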
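Weight each key by the complement of its share (1 - count/total), so frequent keys get smaller weights, and then normalize: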
total_count = sum(my_dict.values())
probabilities = {key: 1 - val/total_count for key, val in my_dict.items()}
prob_sum = sum(probabilities.values())
probabilities = {key: val/prob_sum for key, val in probabilities.items()}
print(sum(probabilities.values()))
# 0.9999999999999994
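To see what the normalization does, here is a small worked example on hypothetical toy counts (not the question's data). By hand: total = 10, the complements are 0.9, 0.8 and 0.3, they sum to 2 (which is n - 1 for n = 3), and normalizing gives 0.45, 0.40 and 0.15.

```python
counts = {"a": 1, "b": 2, "c": 7}  # hypothetical toy counts
total_count = sum(counts.values())  # 10

# Complement of each key's share: 1 - 1/10, 1 - 2/10, 1 - 7/10.
weights = {key: 1 - val / total_count for key, val in counts.items()}

# The complements sum to n - 1 (here 2), so divide by their sum to normalize.
weight_sum = sum(weights.values())
probabilities = {key: w / weight_sum for key, w in weights.items()}

print(weight_sum)
print(probabilities)
```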
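Alternatively, invert the occurrence counts (subtract each from the total count) and compute proportionally weighted probabilities from the inverted counts: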
total_count = sum(my_dict.values())
inverted_count = sum(total_count - val for val in my_dict.values())
probabilities = {key: (total_count-val)/inverted_count for key, val in my_dict.items()}
print(sum(probabilities.values()))
# 1.0
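The two approaches above give the same numbers. Summing total_count - val over n keys gives n * total_count - total_count = (n - 1) * total_count, so (total_count - val) / inverted_count equals (1 - val/total_count) / (n - 1). A quick check on hypothetical toy counts:

```python
counts = {"a": 1, "b": 2, "c": 7}  # hypothetical toy counts, n >= 2
n = len(counts)
total_count = sum(counts.values())

# Summing (total - v) over n keys gives n*total - total = (n - 1) * total.
inverted_count = sum(total_count - val for val in counts.values())
print(inverted_count == (n - 1) * total_count)  # True: 20 == 2 * 10

# Hence (total - v) / inverted_count equals (1 - v/total) / (n - 1).
from_inverted = {k: (total_count - v) / inverted_count for k, v in counts.items()}
from_complement = {k: (1 - v / total_count) / (n - 1) for k, v in counts.items()}
print(all(abs(from_inverted[k] - from_complement[k]) < 1e-12 for k in counts))  # True
```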
Let total = sum(my_dict.values()). The probability of key k occurring is v/total, where v == my_dict[k].
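To make the more likely events less likely, you can subtract each probability from 1. Each result is still a valid probability on its own, but together they no longer form a valid distribution, because the new probabilities don't sum to 1. However, letting n be the number of keys: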
n = len(my_dict.values())
sum(1 - v/total for v in my_dict.values())
== n - sum(v/total for v in my_dict.values())
== n - 1
So if you divide 1 - v/total by n - 1, the new probabilities sum to 1, and each one stays between 0 and 1 (as long as there are at least two keys).
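The same argument in notation, writing v_k for my_dict[k], T for total and n for the number of keys:

```latex
\sum_{k=1}^{n}\left(1 - \frac{v_k}{T}\right)
  = n - \frac{1}{T}\sum_{k=1}^{n} v_k
  = n - 1,
\qquad
p_k = \frac{1 - v_k/T}{n - 1},
\qquad
\sum_{k=1}^{n} p_k = 1.
```

Since 0 ≤ v_k ≤ T, each numerator lies in [0, 1], and for n ≥ 2 dividing by n - 1 ≥ 1 keeps every p_k in [0, 1].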
total = sum(my_dict.values())
n = len(my_dict)
probabilities = {k: (1 - v/total)/(n - 1) for k, v in my_dict.items()}
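A minimal sketch wrapping this formula in a reusable function. The name inverse_frequency_probabilities is hypothetical, and the guards for fewer than two keys or a non-positive total are an added assumption, since the formula is undefined in those cases:

```python
def inverse_frequency_probabilities(counts):
    """Map each key to (1 - v/total) / (n - 1), so rarer keys get higher probability.

    Hypothetical helper; raises ValueError when the formula is undefined.
    """
    n = len(counts)
    total = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two keys: n - 1 would be zero")
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {k: (1 - v / total) / (n - 1) for k, v in counts.items()}


# Usage on a few entries taken from the question's dictionary.
sample = {0: 21, 3: 344, 54: 5, 69: 500}
probabilities = inverse_frequency_probabilities(sample)
print(max(probabilities, key=probabilities.get))  # 54, the rarest label
print(min(probabilities, key=probabilities.get))  # 69, the most frequent label
```

When no single label dominates, v/total is small for every key, so all probabilities end up close to 1/(n - 1) and the reversal is mild.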