The script below builds a transition probability matrix from a given list:
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

def rank(c):
    return ord(c) - ord('A')

T = [rank(c) for c in transitions]

# create matrix of zeros
M = [[0] * 4 for _ in range(4)]

for (i, j) in zip(T, T[1:]):
    M[i][j] += 1

# now convert to probabilities:
for row in M:
    n = sum(row)
    if n > 0:
        row[:] = [f / n for f in row]

# print M:
for row in M:
    print(row)
Output:
[0.0, 0.5, 0.0, 0.5]
[0.5, 0.25, 0.25, 0.0]
[0.0, 1.0, 0.0, 0.0]
[0.5, 0.0, 0.0, 0.5]
I now want to do the reverse: starting from the probability matrix, generate a new list of A B C D transitions that follows it. How can I do that?
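It looks to me like you are trying to build a Markov model. As a bioinformatics student I happen to have experience with (hidden) Markov models, so I will use nested dictionaries to make the matrix easier to work with. Note that I have imported numpy.random.
Hope this helps!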
import numpy.random as rnd

alphabet = ['A', 'B', 'C', 'D']
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

# Create probability matrix filled with zeroes
# Matrix consists of nested dictionaries
prob_matrix = {}
for i in alphabet:
    prob_matrix[i] = {}
    for j in alphabet:
        prob_matrix[i][j] = 0.0

def rank(c):
    return ord(c) - ord('A')

# fill matrix with counts based on transitions list
T = [rank(c) for c in transitions]
for (i, j) in zip(T, T[1:]):
    prob_matrix[alphabet[i]][alphabet[j]] += 1

# convert counts to probabilities
for row in prob_matrix:
    total = sum(prob_matrix[row][column] for column in prob_matrix[row])
    if total > 0:
        for column in prob_matrix[row]:
            prob_matrix[row][column] /= total

# generate first random sequence letter (uniform over the alphabet)
outputseq = str(rnd.choice(alphabet))

# generate rest of string based on probability matrix
for i in range(11):
    probabilities = [prob_matrix[outputseq[-1]][j] for j in alphabet]
    outputseq += str(rnd.choice(alphabet, p=probabilities))

# output generated sequence
print(outputseq)
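For readers wondering what a weighted draw such as `rnd.choice(alphabet, p=probabilities)` does conceptually, here is a minimal standard-library sketch of inverse-CDF sampling. The helper name `weighted_pick` and its `rng` parameter are made up for illustration and are not part of numpy or of the answer above; it only shows one common way to pick an item according to a row of the matrix.

```python
import bisect
import itertools
import random

alphabet = ['A', 'B', 'C', 'D']

def weighted_pick(items, probabilities, rng=random):
    """Hypothetical helper: pick one item by inverse-CDF sampling."""
    # Running totals, e.g. [0.0, 0.5, 0.0, 0.5] -> [0.0, 0.5, 0.5, 1.0]
    cumulative = list(itertools.accumulate(probabilities))
    # Scale by the total so that unnormalised weights also work.
    r = rng.random() * cumulative[-1]
    # Index of the first running total strictly greater than r;
    # items with zero probability are skipped automatically.
    index = bisect.bisect_right(cumulative, r)
    # Guard against floating-point rounding at the upper edge.
    return items[min(index, len(items) - 1)]

# Draw the successor of 'A' using the first row of the matrix in the question.
random.seed(0)
print(weighted_pick(alphabet, [0.0, 0.5, 0.0, 0.5]))
```

Calling this once per step, each time with the row belonging to the previous letter, would give the same kind of random walk as the loop above.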
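The choices function from the random library may help. Since the question does not say how the first letter should be chosen, it is picked here with the same probabilities as the letters in the original list.
Since Python 3.6, random.choices accepts a weights parameter. Strictly speaking, the weights do not need to be normalized.
import random

# reuses transitions, rank and M from the question's script
letter = random.choice(transitions)  # take a starting letter with the same weights as the original list
new_list = [letter]
for _ in range(len(transitions) - 1):
    letter = chr(random.choices(range(4), weights=M[rank(letter)])[0] + ord('A'))
    new_list.append(letter)
print(new_list)
The complete code can be generalized somewhat to work with any type of node, not only consecutive letters:
from collections import defaultdict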
import random

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
nodes = sorted(set(transitions))  # a list of all letters used

M = defaultdict(int)  # dictionary counting the occurrences of each transition (i, j)
for (i, j) in zip(transitions, transitions[1:]):
    M[(i, j)] += 1

# dictionary holding, for each node, a list of frequencies for the transition to each next node
T = {i: [M[(i, j)] for j in nodes] for i in nodes}

# node = random.choice(transitions)  # choose the first node randomly with the same probability as the original list
node = random.choice(nodes)  # choose the first node randomly, each node with equal probability
new_list = [node]
for _ in range(9):
    node = random.choices(nodes, T[node])[0]
    new_list.append(node)
print(new_list)
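One way to sanity-check any of these generators is to produce a long sequence and re-estimate the matrix from it; the estimates should land close to the original probabilities. The sketch below assumes the matrix printed in the question, and the helper names `generate` and `estimate`, the seed, and the sequence length are arbitrary choices made for illustration.

```python
import random
from collections import Counter

nodes = ['A', 'B', 'C', 'D']
# Transition probabilities copied from the output in the question.
P = {
    'A': [0.0, 0.5, 0.0, 0.5],
    'B': [0.5, 0.25, 0.25, 0.0],
    'C': [0.0, 1.0, 0.0, 0.0],
    'D': [0.5, 0.0, 0.0, 0.5],
}

def generate(P, nodes, length, rng):
    """Hypothetical helper: random walk of `length` nodes over P."""
    node = rng.choice(nodes)  # uniform choice of the first node
    walk = [node]
    for _ in range(length - 1):
        node = rng.choices(nodes, weights=P[node])[0]
        walk.append(node)
    return walk

def estimate(walk, nodes):
    """Hypothetical helper: re-estimate probabilities from consecutive pairs."""
    counts = Counter(zip(walk, walk[1:]))
    estimated = {}
    for i in nodes:
        total = sum(counts[(i, j)] for j in nodes)
        estimated[i] = [counts[(i, j)] / total if total else 0.0 for j in nodes]
    return estimated

rng = random.Random(42)  # fixed seed so the check is repeatable
walk = generate(P, nodes, 20000, rng)
estimated = estimate(walk, nodes)
for i in nodes:
    print(i, [round(p, 2) for p in estimated[i]])
```

Note that this matrix has no row of all zeros. If a node never had an outgoing transition in the original list, its weights would all be zero and the generator would need to stop or restart at that point.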