I want to know the relative frequency of each base in a sequence. The result should be returned as a list. Here is my attempt:
def get_freqs(Sequ):
    rel_Anz=[]
    laenge = len(Sequ)
    A_freq = (Sequ.count('A')/ laenge)
    T_freq = (Sequ.count('T')/ laenge)
    C_freq = (Sequ.count('C')/ laenge)
    G_freq = (Sequ.count('G')/ laenge)
    rel_Anz= [A_freq, T_freq, C_freq, G_freq]
    return rel_Anz
    print("The frequency of each base (A,T,C,G) is ", rel_Anz)
get_freqs (ATTAAACC)
I don't know how to pass in the sequence I want to analyze. Should I define it beforehand?
I strongly recommend using Counter()!
from collections import Counter
sequence = 'ATCGACTAGCATCGACTACATCACTAC'
c = Counter(sequence)
print(c)
l = len(sequence)
for k, v in c.items():
    print(f'{k} frequency is: {v/l}')
Output:
Counter({'A': 9, 'C': 9, 'T': 6, 'G': 3})
A frequency is: 0.3333333333333333
T frequency is: 0.2222222222222222
C frequency is: 0.3333333333333333
G frequency is: 0.1111111111111111
Wrapped in a function:
def get_freq(sequence):
    c = Counter(sequence.upper())
    l = len(sequence)
    result = {}
    for k, v in c.items():
        result.update({k: round(v/l, 2)})
    return result
get_freq('ATTAAACC')
{'A': 0.5, 'T': 0.25, 'C': 0.25}
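Note that the dictionary has no entry for G, because Counter only records characters that actually occur. If you need a list in a fixed order, as in the question, something like the following sketch might work. The name get_freq_list and the bases parameter are my own illustrative choices, and the function assumes a non-empty sequence.

```python
from collections import Counter

def get_freq_list(sequence, bases="ATCG"):
    """Return relative frequencies in the order given by `bases` (hypothetical helper)."""
    counts = Counter(sequence.upper())
    length = len(sequence)  # assumes the sequence is not empty
    # A Counter returns 0 for keys it has not seen, so absent bases yield 0.0
    return [counts[base] / length for base in bases]

print(get_freq_list('ATTAAACC'))  # [0.5, 0.25, 0.25, 0.0]
```

Because the order is fixed by bases, the positions in the list always mean A, T, C, G, even when a base is missing from the input.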
First, if you want to pass a nucleotide sequence to the function, you probably want to pass it as a string, so the call looks like this:
get_freqs('ATTAAACC')
or like this:
get_freqs("ATTAAACC")
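As for defining the sequence beforehand: that works as well. Without quotes, ATTAAACC is read as a variable name, so Python raises a NameError unless a variable with that name exists. Here is a small sketch of both cases; my_sequence is just an illustrative name, and get_freqs is a compact stand-in for the function from the question.

```python
def get_freqs(seq):
    # Compact stand-in for the function from the question
    return [seq.count(base) / len(seq) for base in 'ATCG']

# Option: store the sequence in a variable first, then pass the variable
my_sequence = 'ATTAAACC'
freqs = get_freqs(my_sequence)
print(freqs)  # [0.5, 0.25, 0.25, 0.0]

# Without quotes, Python looks for a variable called ATTAAACC and fails
try:
    get_freqs(ATTAAACC)
except NameError as err:
    print("NameError:", err)
```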
Second, you return before printing the result:
    return rel_Anz
    print("The frequency of each base (A,T,C,G) is ", rel_Anz)
No statement after return is executed, so the order should be reversed:
    print("The frequency of each base (A,T,C,G) is ", rel_Anz)
    return rel_Anz
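If you want to convince yourself of this, here is a tiny sketch. The two function names and the log list are made up for the demonstration; the list simply records which statements actually ran.

```python
def print_after_return(log):
    return "result"
    log.append("after return")   # unreachable: never runs

def print_before_return(log):
    log.append("before return")  # runs, then the function returns
    return "result"

log = []
print_after_return(log)
print_before_return(log)
print(log)  # ['before return']
```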
Putting it together, something like this should work:
def get_freqs(Sequ):
    rel_Anz=[]
    laenge = len(Sequ)
    A_freq = (Sequ.count('A')/ laenge)
    T_freq = (Sequ.count('T')/ laenge)
    C_freq = (Sequ.count('C')/ laenge)
    G_freq = (Sequ.count('G')/ laenge)
    rel_Anz= [A_freq, T_freq, C_freq, G_freq]
    print("The frequency of each base (A,T,C,G) is ", rel_Anz)
    return rel_Anz
get_freqs('ATTAAACC')
If you want it cleaner and more Pythonic, let the function only compute the values and do the printing outside:
def get_freqs(seq):
    length = len(seq)
    a_freq = seq.count('A') / length
    t_freq = seq.count('T') / length
    c_freq = seq.count('C') / length
    g_freq = seq.count('G') / length
    return [a_freq, t_freq, c_freq, g_freq]
relative_frequencies = get_freqs('ATTAAACC')
print("The frequency of each base (A,T,C,G) is ", relative_frequencies)
Or, more compactly:
def get_freqs(seq):
    return [seq.count(nucl)/len(seq) for nucl in 'ATCG']
relative_frequencies = get_freqs('ATTAAACC')
print("The frequency of each base (A,T,C,G) is ", relative_frequencies)
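One caveat: all of these versions divide by the sequence length, so an empty string raises a ZeroDivisionError. Below is a possible variant that rejects empty input explicitly. The name get_freqs_checked and the choice of ValueError are my assumptions; returning zeros instead would be just as reasonable.

```python
def get_freqs_checked(seq, bases='ATCG'):
    """Hypothetical variant that rejects empty input instead of dividing by zero."""
    if not seq:
        raise ValueError("sequence must not be empty")
    length = len(seq)  # computed once instead of once per base
    return [seq.count(base) / length for base in bases]

print(get_freqs_checked('ATTAAACC'))  # [0.5, 0.25, 0.25, 0.0]
```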