Object type handling problem in Julia when solving the Rosalind problem "Open Reading Frames"

Question

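Code context: I am trying to solve the Rosalind problem "Open Reading Frames" (https://rosalind.info/problems/orf/). My approach is to store each ORF result in a variable called "orf" (a vector of Char) and, at the end, add it to a final result vector called Proteins (a vector of String).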

Error output:

ERROR: LoadError: MethodError: Cannot `convert` an object of type String to an object of type Char

Closest candidates are:
  convert(::Type{T}, ::Number) where T<:AbstractChar
   @ Base char.jl:184
  convert(::Type{T}, ::T) where T<:AbstractChar
   @ Base char.jl:187
  convert(::Type{T}, ::AbstractChar) where T<:AbstractChar
   @ Base char.jl:186
  ...

Stacktrace:
 [1] push!(a::Vector{Char}, item::String)
   @ Base ./array.jl:1060
 [2] find_orfs(sequence::String)
   @ Main ~/Documents/Codigos/RSLD_Open_Reading_Frames.jl:66
 [3] top-level scope
   @ ~/Documents/Codigos/RSLD_Open_Reading_Frames.jl:111

Code:


#=
The following code is proposed to complete the Rosalind activity "Open Reading Frames".
Given any DNA sequence, the algorithm should be capable to identify protein sequences that starts in an start codon and finishes at stop codon.
It should be achieved by navigating through the original and complementary sequences and accessing the 3 reading frames in each.
If the start codon is identified, the code should insert a 'M' into the sequence and consequently add the amino acids that corresponds to each codon.
Finally, the Protein vector of Strings should contain all possibilities of proteins.
=#

#Function to get the complementary strand given the DNA sequence
function reverse_complement(x::String)
    comp = Vector{Char}()
    for i in x
        if i == 'A'
            push!(comp, 'T')
        elseif i == 'T'
            push!(comp, 'A')
        elseif i == 'C'
            push!(comp, 'G')
        elseif i == 'G'
            push!(comp, 'C')
        end
    end
    return join(reverse(comp))
end

function find_orfs(sequence::String)

    #Creation of Codon dictionary
    codon_table = Dict(
        "TTT" => "F", "TTC" => "F",
        "TTA" => "L", "TTG" => "L", "CTT" => "L", "CTC" => "L", "CTA" => "L", "CTG" => "L",
        "ATT" => "I", "ATC" => "I", "ATA" => "I",
        "ATG" => "M",
        "GTT" => "V", "GTC" => "V", "GTA" => "V", "GTG" => "V",
        "TCT" => "S", "TCC" => "S", "TCA" => "S", "TCG" => "S", "AGT" => "S", "AGC" => "S",
        "CCT" => "P", "CCC" => "P", "CCA" => "P", "CCG" => "P",
        "ACT" => "T", "ACC" => "T", "ACA" => "T", "ACG" => "T",
        "GCT" => "A", "GCC" => "A", "GCA" => "A", "GCG" => "A",
        "TAT" => "Y", "TAC" => "Y",
        "TAA" => "STOP", "TAG" => "STOP", "TGA" => "STOP",
        "CAT" => "H", "CAC" => "H",
        "CAA" => "Q", "CAG" => "Q",
        "AAT" => "N", "AAC" => "N",
        "AAA" => "K", "AAG" => "K",
        "GAT" => "D", "GAC" => "D",
        "GAA" => "E", "GAG" => "E",
        "TGT" => "C", "TGC" => "C",
        "TGG" => "W",
        "CGT" => "R", "CGC" => "R", "CGA" => "R", "CGG" => "R", "AGA" => "R", "AGG" => "R",
        "GGT" => "G", "GGC" => "G", "GGA" => "G", "GGG" => "G"
    )  
   
    Proteins = Vector{String}()


    # Consider all three forward reading frames
        frames = [1, 2, 3]
    for j in frames
            for i in j:3:length(sequence) - 2
                orf = Vector{Char}()    # Vector
                codon = sequence[i:i+2]

                if haskey(codon_table, codon)
                    amino_acid = codon_table[codon]
                    if amino_acid == "M" 
                        push!(orf, amino_acid)
                        continue  
                    elseif amino_acid == "STOP"
                        push!(Proteins, orf)
                        break  
                    else
                        push!(orf, amino_acid)
                    end
                else
                    error("Invalid codon: $codon")
                end
        end
    end


    comp_seq = reverse_complement(sequence)

    # Consider all three reverse reading frames
    for j in frames
            for i in j:3:length(comp_seq) - 2
                orf = Vector{Char}()
                codon = comp_seq[i:i+2]

                if haskey(codon_table, codon)
                    amino_acid = codon_table[codon]
                    if amino_acid == "M"
                        push!(orf, amino_acid)
                        continue  
                    elseif amino_acid == "STOP"
                        push!(Proteins, orf)
                        break  
                    else
                        push!(orf, amino_acid)
                    end
                else
                    error("Invalid codon: $codon")
                end
        end
    end

    return Proteins
end


sequence = "ATGGCCATGGCGCCCAGAACTGAGATCAATAGTACCCGTATAACGGGTGA"
result = find_orfs(sequence)
println(result)

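I have already tried changing the type of "orf" to String or AbstractString, but that did not work either. I even tried adapting the code to store each amino acid key in the vector "orf" and then pass its contents as a string argument to the push! function, but without success.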

julia bioinformatics rosalind
1 Answer

As the stack trace shows:

Stacktrace:
 [1] push!(a::Vector{Char}, item::String)
   @ Base ./array.jl:1060
 [2] find_orfs(sequence::String)
   @ Main ~/Documents/Codigos/RSLD_Open_Reading_Frames.jl:66

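The problem is that you are trying to `push!` a String (`item::String`) into a vector of Chars. In Julia, double quotes create Strings, while single quotes create `Char` values. So the amino acids on the value side of `codon_table` are all Strings, not Chars.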
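To make the distinction concrete, here is a minimal sketch that reproduces the failure in isolation. The variable names `c`, `s`, `chars`, and `failed` are illustrative only and do not appear in the original code.

```julia
# Single quotes build a Char, double quotes build a String.
c = 'M'
s = "M"
println(typeof(c))  # Char
println(typeof(s))  # String

chars = Vector{Char}()
push!(chars, c)     # fine: a Char goes into a Vector{Char}

# Pushing a String into a Vector{Char} fails, because Julia has no
# conversion from String to Char. This is the MethodError in the question.
failed = try
    push!(chars, s)
    false
catch err
    err isa MethodError
end
println(failed)
```

Even a one-character String such as `"M"` is still a String, so the length of the text does not matter here, only the kind of quotes.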

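Since your goal is to get the proteins as an array of Strings, the simplest option here is to change `orf` to a `String` and concatenate the amino acids onto it (as Dan suggested). So initialize `orf` as `orf = ""` (an empty String), and then replace the `push!(orf, amino_acid)` lines with `orf *= amino_acid`, which is shorthand for `orf = orf * amino_acid` (the `*` operator concatenates the existing `orf` with the new `amino_acid` String).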
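As a rough sketch of that change, the snippet below builds one protein String with `*=` and pushes it into a `Vector{String}`. The helper `build_protein` and its input list are hypothetical and only stand in for the values looked up from `codon_table`; the loop is wrapped in a function so that `orf` is an ordinary local variable.

```julia
# Hypothetical helper: joins single-letter amino-acid Strings into one protein String.
function build_protein(amino_acids::Vector{String})
    orf = ""                    # start from an empty String
    for amino_acid in amino_acids
        orf *= amino_acid       # shorthand for orf = orf * amino_acid
    end
    return orf
end

proteins = Vector{String}()
push!(proteins, build_protein(["M", "A", "K"]))  # String into Vector{String}: fine
println(proteins)  # ["MAK"]
```

Because `orf` is now a String, the final `push!` into the String vector no longer needs any conversion.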

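Another thing to note is that, as it stands, this code resets `orf` to empty on every amino acid, so the protein strings will always end up empty. The initial value of `orf` should be set outside the `for i in j:3:length(comp_seq) - 2` loop (and likewise outside the matching loop over `sequence`), not inside it, to avoid this problem.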
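The effect of where `orf` is initialized can be seen in a small sketch. The table `toy_table` and the two functions below are hypothetical, cover only four codons, and translate from the start of the string; they deliberately do not attempt the start-codon handling that the Rosalind problem requires.

```julia
# Hypothetical toy table with String values, shaped like codon_table in the question.
toy_table = Dict("ATG" => "M", "GCC" => "A", "AAA" => "K", "TAA" => "STOP")

# Buggy shape: orf is recreated on every iteration, so it is empty at STOP.
function translate_reset_inside(seq::String, table)
    for i in 1:3:length(seq) - 2
        orf = ""
        amino_acid = table[seq[i:i+2]]
        amino_acid == "STOP" && return orf
        orf *= amino_acid
    end
    return ""
end

# Fixed shape: orf is created once, before the loop, and keeps growing.
function translate_reset_outside(seq::String, table)
    orf = ""
    for i in 1:3:length(seq) - 2
        amino_acid = table[seq[i:i+2]]
        amino_acid == "STOP" && return orf
        orf *= amino_acid
    end
    return orf
end

println(translate_reset_inside("ATGGCCAAATAA", toy_table))   # prints an empty line
println(translate_reset_outside("ATGGCCAAATAA", toy_table))  # MAK
```

The only difference between the two functions is the position of the `orf = ""` line, which is the change suggested above.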

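(I believe this code also has a logic error in that it does not handle the start codon correctly, but I will leave that for you to fix, since it is part of the challenge.)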
