Problem reversing DNA/mRNA sequences in Python


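I can't get my program to work in every case; anyone with biology and coding expertise should be able to tell me what is going wrong. I am trying to write a program that asks a few questions about a biomolecule. First it asks the user whether the DNA/mRNA strand is given in the 5' to 3' direction. Then it asks whether the molecule is DNA or RNA. If it is DNA, it asks whether we are reading the template or the coding strand in order to find the resulting mRNA. The program then reads the mRNA in the 5' to 3' direction and determines the amino acid sequence. The problem is that the program seems to work for mRNA regardless of direction, but breaks when reading DNA in certain orientations. I have attached an image showing some examples so you can see where it fails. I am trying to get the correct MetPheIle amino acid sequence from all 6 conditions. I will also attach a rough schematic as an overview in case the code is confusing.

Here is my code: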

#rules for converting any DNA strand to its complementary RNA strand

def complement_base(base):
    if base == 'A':
        return 'U'
    elif base == 'T':
        return 'A'
    elif base == 'C':
        return 'G'
    elif base == 'G':
        return 'C'
    else:
        return ''


#converts dna strands to mRNA so they can be transcribed

def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    if is_template_strand:
        mrna_strand = ''.join(complement_base(base) for base in dna_strand)
    else:
        mrna_strand = dna_strand.replace('T', 'U')
    return mrna_strand


#takes the mRNA sequence and sets rules for start and stop points and how to read the strand 

def translate_mrna_to_amino_acid(mrna_strand, codon_table):
    start_codon = "AUG"
    stop_codons = {"UAA", "UAG", "UGA"}

    amino_acid_sequence = ""
    translating = False

    index = 0
    while index < len(mrna_strand):
        codon = mrna_strand[index:index + 3]

        if codon == start_codon:
            translating = True

        if translating:
            if codon in stop_codons:
                break

            amino_acid = codon_table.get(codon, "-")
            amino_acid_sequence += amino_acid

        index += 3

    return amino_acid_sequence


#function for getting the mRNA sequence by input type, and reversing it if it is initially in the 3' to 5' direction because the RNA is translated in the 5' to 3' direction

def get_mrna_sequence(reverse_sequence=False):
    valid_bases = {'A', 'U', 'C', 'G'}

    while True:
        option = input("Enter '1' to input mRNA sequence manually, '2' to upload a file: ")

        if option == "1":
            mrna_sequence = input("Enter the mRNA sequence (only A, U, C, G): ").upper()
            if all(base in valid_bases for base in mrna_sequence):
                if reverse_sequence:
                    mrna_sequence = mrna_sequence[::-1]  # Reverse the sequence
                return mrna_sequence
            else:
                print("Invalid sequence! Please use only A, U, C, and G.")

        elif option == "2":
            file_name = input("Enter the file name with the mRNA sequence: ")
            try:
                with open(file_name, "r") as file:
                    mrna_sequence = file.read().replace("\n", "").upper()
                    if all(base in valid_bases for base in mrna_sequence):
                        if reverse_sequence:
                            mrna_sequence = mrna_sequence[::-1]  # Reverse the sequence
                        return mrna_sequence
                    else:
                        print("Invalid sequence in file! Please use only A, U, C, and G.")
            except FileNotFoundError:
                print("File not found!")

        else:
            print("Invalid option!")

    return mrna_sequence


#similar to the code above, but for the DNA sequence, which is later converted to mRNA (thymine replaced with uracil)

def get_dna_sequence():
    valid_bases = {'A', 'T', 'C', 'G'}

    while True:
        option = input("Enter '1' to input DNA sequence manually, '2' to upload a file: ")

        if option == "1":
            dna_sequence = input("Enter the DNA sequence (only A, T, C, G): ").upper()
            if all(base in valid_bases for base in dna_sequence):
                return dna_sequence
            else:
                print("Invalid sequence! Please use only A, T, C, and G.")

        elif option == "2":
            file_name = input("Enter the file name with the DNA sequence: ")
            try:
                with open(file_name, "r") as file:
                    dna_sequence = file.read().replace("\n", "").upper()
                    if all(base in valid_bases for base in dna_sequence):
                        return dna_sequence
                    else:
                        print("Invalid sequence in file! Please use only A, T, C, and G.")
            except FileNotFoundError:
                print("File not found!")

        else:
            print("Invalid option!")


#the main function and codon table to translate sequence. It asks a few questions, 1)Is the molecule on the 5' to 3' direction? 2) is it RNA or DNA? 3) If DNA, is it the coding or template strand? 4) will the sequence be entered manually or a text file? 5) enter the dna sequence. And then is attempting to give the amino acid sequence from this. 

def main():
    direction = input("Is the molecule in the 5' to 3' direction? (yes/no): ").lower()
    molecule_type = input("Is the molecule DNA or RNA? ").lower()

    if molecule_type == 'dna':
        sequence_type = input("Is it the template strand or the coding strand? ").lower()
        if sequence_type == 'template strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=True, reverse_sequence=(direction == 'no'))
        elif sequence_type == 'coding strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=False)
        else:
            print("Invalid input!")

    elif molecule_type == 'rna':
        mrna_sequence = get_mrna_sequence(reverse_sequence=(direction == 'no'))

    else:
        print("Invalid input!")

    # Example codon table mapping
    codon_table = {
        "UUU": "Phe", "UUC": "Phe", "UUA": "Leu", "UUG": "Leu",
        "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
        "AUU": "Ile", "AUC": "Ile", "AUA": "Ile", "AUG": "Met",
        # ... other codons and their respective amino acids
    }

    if mrna_sequence:
        resulting_amino_acids = translate_mrna_to_amino_acid(mrna_sequence, codon_table)
        print("Resulting amino acid sequence:", resulting_amino_acids)

if __name__ == "__main__":
    main()


[Images: schematic overview; test results]

Tests:


5'3' template AAATCAGATAAACAT -> MetPheIle FAIL
3'5' template TACAAATAGACTAAA -> MetPheIle PASS

5'3' coding ATGTTTATCTGATTT -> MetPheIle PASS
3'5' coding TTTAGTCTATTTGTA -> MetPheIle FAIL

5'3' mRNA AUGUUUAUCUGAUUU -> MetPheIle PASS
3'5' mRNA UUUAGUCUAUUUGUA -> MetPheIle PASS

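The mRNA tests work, so there must be some problem with reversing the DNA sequence. The 5' to 3' coding strand and the 5' to 3' mRNA strand should be identical except that T is replaced with U. The 3' to 5' coding strand should be the reverse of that, with T replaced by U, but something is off. Either I am not reversing the strand correctly in my code, or I am calling the wrong function at the wrong time. I am new to this, so I may be struggling with how to reverse and translate. The 5' to 3' template gives the final mRNA in the 3' to 5' direction, so I should have to reverse the resulting mRNA strand, and you can see that this case fails too. The 3' to 5' template should give a 5' to 3' mRNA strand, and that one passes, so I conclude the problem is with the reversal, but I am not sure where to put it. I tried reversing inside the get_mrna_sequence function, but that failed. I know this is a lot, but any help would be appreciated. If there is anything wrong with my understanding of DNA or RNA, I would appreciate hearing about that as well. Thanks!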

python bioinformatics reverse dna-sequence
1 Answer

The problem lies in both the convert_to_mrna function and the main function.

In the main function, these two branches

        if sequence_type == 'template strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=True, reverse_sequence=(direction == 'no'))
        elif sequence_type == 'coding strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=False)

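cannot tell a coding strand in the 5' -> 3' direction from a coding strand in the 3' -> 5' direction, because the coding branch never passes the direction along. To distinguish the two, you can rewrite these two branches as one: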

        if sequence_type in ("template strand", "coding strand"):
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(
                dna_sequence,
                is_template_strand=sequence_type == "template strand",
                reverse_sequence=(direction == "no"),
            )

In the convert_to_mrna function, you currently do not use the reverse_sequence parameter at all:

def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    if is_template_strand:
        mrna_strand = ''.join(complement_base(base) for base in dna_strand)
    else:
        mrna_strand = dna_strand.replace('T', 'U')
    return mrna_strand

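For a 3' -> 5' coding strand or a 5' -> 3' template strand, the sequence has to be reversed during transcription (for the template strand this amounts to a reverse-complement) to obtain a 5' -> 3' mRNA sequence that is ready for translation into amino acids. This means you must reverse the sequence either before or after performing the base substitution. The simplest way is an exclusive-or (XOR) check on what you know about the given DNA sequence:

[whether to reverse the sequence] = [DNA is a template strand] XOR [DNA is in the 3' -> 5' direction]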

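XOR is available in Python through the ^ operator when both operands are booleans, so you only need to add two lines. Below, the sequence is reversed before the conversion to mRNA (you could also reverse it after the conversion):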

def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    # `[whether to reverse the sequence] = [DNA is a template strand] XOR [DNA is in the 3' -> 5' direction]`
    if is_template_strand ^ reverse_sequence:
        dna_strand = dna_strand[::-1]
    if is_template_strand:
        mrna_strand = "".join(complement_base(base) for base in dna_strand)
    else:
        mrna_strand = dna_strand.replace("T", "U")
    return mrna_strand
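
As a quick check, the following sketch runs the four DNA cases from the question through the corrected function. It assumes the interactive input is bypassed: complement_base is condensed into a dictionary lookup, translate is a shortened stand-in for translate_mrna_to_amino_acid, and CODON_TABLE, STOP_CODONS, DNA_CASES and run_dna_cases are names introduced only for this illustration.

```python
# Trimmed to the codons the test sequences need (an assumption for brevity).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "AUC": "Ile"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def complement_base(base):
    # DNA base -> complementary RNA base; unknown bases are dropped,
    # matching the behaviour of the function in the question.
    return {"A": "U", "T": "A", "C": "G", "G": "C"}.get(base, "")


def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    # Reverse when exactly one of the two flags is set (XOR).
    if is_template_strand ^ reverse_sequence:
        dna_strand = dna_strand[::-1]
    if is_template_strand:
        return "".join(complement_base(base) for base in dna_strand)
    return dna_strand.replace("T", "U")


def translate(mrna_strand):
    # Same fixed-frame scan as in the question: start at AUG, stop at a stop codon.
    amino_acids = ""
    translating = False
    for index in range(0, len(mrna_strand), 3):
        codon = mrna_strand[index:index + 3]
        if codon == "AUG":
            translating = True
        if translating:
            if codon in STOP_CODONS:
                break
            amino_acids += CODON_TABLE.get(codon, "-")
    return amino_acids


# (sequence, is_template_strand, strand is written 3' -> 5')
DNA_CASES = [
    ("AAATCAGATAAACAT", True, False),   # 5'3' template
    ("TACAAATAGACTAAA", True, True),    # 3'5' template
    ("ATGTTTATCTGATTT", False, False),  # 5'3' coding
    ("TTTAGTCTATTTGTA", False, True),   # 3'5' coding
]


def run_dna_cases():
    return [
        translate(convert_to_mrna(seq, is_template, is_3_to_5))
        for seq, is_template, is_3_to_5 in DNA_CASES
    ]


for (seq, _, _), result in zip(DNA_CASES, run_dna_cases()):
    print(seq, "->", result)
```

Every case should print MetPheIle, including the two that failed in the question.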

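In any case, I would reconsider how some of these parameters and variables are named, since well-chosen names help a lot when you want to focus on getting the algorithm right.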

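For example, I would not name the convert_to_mrna parameter reverse_sequence; whether the sequence is actually reversed also depends on whether the DNA sequence is the template or the coding strand. A name like is_3_to_5 would describe it better.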
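
One possible shape for that rename is sketched below. The function name dna_to_mrna and the two translation tables are hypothetical, and the base substitution uses str.maketrans and str.translate in place of complement_base. One behavioural difference to be aware of: str.translate leaves unknown characters unchanged rather than dropping them, which should not matter if the input has already been validated as in get_dna_sequence.

```python
# Hypothetical translation tables for this sketch.
TEMPLATE_TO_MRNA = str.maketrans("ATCG", "UAGC")  # complement DNA -> RNA
CODING_TO_MRNA = str.maketrans("T", "U")          # coding strand: only T -> U


def dna_to_mrna(dna_strand, is_template_strand, is_3_to_5=False):
    """Return the mRNA, written 5' -> 3', for the given DNA strand."""
    # The parameters describe the input; whether to reverse is derived here.
    if is_template_strand ^ is_3_to_5:
        dna_strand = dna_strand[::-1]
    table = TEMPLATE_TO_MRNA if is_template_strand else CODING_TO_MRNA
    return dna_strand.translate(table)


print(dna_to_mrna("AAATCAGATAAACAT", is_template_strand=True, is_3_to_5=False))
# -> AUGUUUAUCUGAUUU
```

With this signature the call site in main would pass is_3_to_5=(direction == "no"), so the caller only reports what the user said and no longer decides whether a reversal is needed.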
