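Code context: I am trying to solve the Rosalind problem "Open Reading Frames" (https://rosalind.info/problems/orf/). My approach is to store each ORF result in a variable called `orf` (a vector of characters) and, at the end, push them into the final result vector called `Proteins` (a vector of strings).
Error output: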
ERROR: LoadError: MethodError: Cannot `convert` an object of type String to an object of type Char
Closest candidates are:
  convert(::Type{T}, ::Number) where T<:AbstractChar
   @ Base char.jl:184
  convert(::Type{T}, ::T) where T<:AbstractChar
   @ Base char.jl:187
  convert(::Type{T}, ::AbstractChar) where T<:AbstractChar
   @ Base char.jl:186
  ...
Stacktrace:
 [1] push!(a::Vector{Char}, item::String)
   @ Base ./array.jl:1060
 [2] find_orfs(sequence::String)
   @ Main ~/Documents/Codigos/RSLD_Open_Reading_Frames.jl:66
 [3] top-level scope
   @ ~/Documents/Codigos/RSLD_Open_Reading_Frames.jl:111
Code:
#=
The following code is proposed to complete the Rosalind activity "Open Reading Frames".
Given any DNA sequence, the algorithm should be capable of identifying protein sequences that start at a start codon and finish at a stop codon.
This should be achieved by navigating through the original and complementary sequences and reading the 3 reading frames in each.
If the start codon is identified, the code should insert an 'M' into the sequence and then add the amino acid that corresponds to each subsequent codon.
Finally, the Proteins vector of Strings should contain all possible proteins.
=#
# Function to get the complementary strand given the DNA sequence
function reverse_complement(x::String)
    comp = Vector{Char}()
    for i in x
        if i == 'A'
            push!(comp, 'T')
        elseif i == 'T'
            push!(comp, 'A')
        elseif i == 'C'
            push!(comp, 'G')
        elseif i == 'G'
            push!(comp, 'C')
        end
    end
    return join(reverse(comp))
end
function find_orfs(sequence::String)
    # Creation of codon dictionary
    codon_table = Dict(
        "TTT" => "F", "TTC" => "F",
        "TTA" => "L", "TTG" => "L", "CTT" => "L", "CTC" => "L", "CTA" => "L", "CTG" => "L",
        "ATT" => "I", "ATC" => "I", "ATA" => "I",
        "ATG" => "M",
        "GTT" => "V", "GTC" => "V", "GTA" => "V", "GTG" => "V",
        "TCT" => "S", "TCC" => "S", "TCA" => "S", "TCG" => "S", "AGT" => "S", "AGC" => "S",
        "CCT" => "P", "CCC" => "P", "CCA" => "P", "CCG" => "P",
        "ACT" => "T", "ACC" => "T", "ACA" => "T", "ACG" => "T",
        "GCT" => "A", "GCC" => "A", "GCA" => "A", "GCG" => "A",
        "TAT" => "Y", "TAC" => "Y",
        "TAA" => "STOP", "TAG" => "STOP", "TGA" => "STOP",
        "CAT" => "H", "CAC" => "H",
        "CAA" => "Q", "CAG" => "Q",
        "AAT" => "N", "AAC" => "N",
        "AAA" => "K", "AAG" => "K",
        "GAT" => "D", "GAC" => "D",
        "GAA" => "E", "GAG" => "E",
        "TGT" => "C", "TGC" => "C",
        "TGG" => "W",
        "CGT" => "R", "CGC" => "R", "CGA" => "R", "CGG" => "R", "AGA" => "R", "AGG" => "R",
        "GGT" => "G", "GGC" => "G", "GGA" => "G", "GGG" => "G"
    )
    Proteins = Vector{String}()
    # Consider all three forward reading frames
    frames = [1, 2, 3]
    for j in frames
        for i in j:3:length(sequence) - 2
            orf = Vector{Char}() # Vector
            codon = sequence[i:i+2]
            if haskey(codon_table, codon)
                amino_acid = codon_table[codon]
                if amino_acid == "M"
                    push!(orf, amino_acid)
                    continue
                elseif amino_acid == "STOP"
                    push!(Proteins, orf)
                    break
                else
                    push!(orf, amino_acid)
                end
            else
                error("Invalid codon: $codon")
            end
        end
    end
    comp_seq = reverse_complement(sequence)
    # Consider all three reverse reading frames
    for j in frames
        for i in j:3:length(comp_seq) - 2
            orf = Vector{Char}()
            codon = comp_seq[i:i+2]
            if haskey(codon_table, codon)
                amino_acid = codon_table[codon]
                if amino_acid == "M"
                    push!(orf, amino_acid)
                    continue
                elseif amino_acid == "STOP"
                    push!(Proteins, orf)
                    break
                else
                    push!(orf, amino_acid)
                end
            else
                error("Invalid codon: $codon")
            end
        end
    end
    return Proteins
end
sequence = "ATGGCCATGGCGCCCAGAACTGAGATCAATAGTACCCGTATAACGGGTGA"
result = find_orfs(sequence)
println(result)
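I have already tried changing the type of `orf` to String or AbstractString, but that did not work either. I even tried adapting the code to store each amino acid key in the vector `orf` and then use its contents as a string to pass as the argument to the `push!` function, but without success.
As the stack trace shows:
Stacktrace:
 [1] push!(a::Vector{Char}, item::String)
   @ Base ./array.jl:1060
 [2] find_orfs(sequence::String)
   @ Main ~/Documents/Codigos/RSLD_Open_Reading_Frames.jl:66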
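The problem is that you are trying to `push!` a string (`item::String`) into a vector of characters. In Julia, double quotes create a `String`, while single quotes create a `Char` value. So the amino acids on the value side of `codon_table` are all strings, not characters.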
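A minimal sketch of the String/Char distinction, runnable in a fresh Julia session; the variable names are illustrative only. It also shows a second possible fix, which is an assumption on my part and not what this answer proposes: keep `orf` as a `Vector{Char}`, push single `Char`s, and `join` the vector into a `String` before pushing it into `Proteins`.

```julia
# Double quotes build a String, single quotes build a Char.
aa_string = "M"        # a String, like the values stored in codon_table
aa_char = 'M'          # a Char
println(typeof(aa_string), " ", typeof(aa_char))

# Hypothetical stand-in for the question's `orf` vector.
chars = Vector{Char}()
push!(chars, aa_char)            # OK: a Char goes into a Vector{Char}
push!(chars, first(aa_string))   # OK: first() takes the first Char of the String
# push!(chars, aa_string)        # MethodError: Cannot `convert` an object of type String to an object of type Char

# Alternative fix (assumption): join the collected Chars into one String.
protein = join(chars)
println(protein)
```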
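Since your goal is to have the proteins as an array of strings, the simplest option here is to change `orf` to a `String` and concatenate the amino acids onto it (as Dan suggested). So initialize `orf` as `orf = ""` (an empty string), and then replace the `push!(orf, amino_acid)` lines with `orf *= amino_acid`, which is shorthand for `orf = orf * amino_acid` (the `*` operator concatenates the existing `orf` string with the new `amino_acid` string).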
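A minimal sketch of the `*=` approach, wrapped in a hypothetical helper `build_protein` that is not part of the original code. The loop sits inside a function because, in a Julia script file, updating a global such as `orf` from inside a top-level `for` loop falls under Julia's soft-scope rules and can fail with an `UndefVarError`. The sketch also illustrates why changing the type of `orf` to `String` on its own is not enough: Julia strings are immutable and `push!` has no method for a `String`.

```julia
# Hypothetical helper: concatenates amino-acid Strings into one protein String.
function build_protein(amino_acids::Vector{String})
    orf = ""                     # start from an empty String
    for amino_acid in amino_acids
        orf *= amino_acid        # shorthand for orf = orf * amino_acid
    end
    return orf
end

Proteins = Vector{String}()
push!(Proteins, build_protein(["M", "G", "M", "A"]))   # a String fits a Vector{String}
println(Proteins)

# Strings are immutable, so push!("", "M") raises a MethodError instead of appending.
```

Each `*=` allocates a new `String`. For Rosalind-sized inputs that cost is negligible; for very long sequences, collecting the pieces in a vector and calling `join` once avoids the repeated copying.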
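Another thing to note is that, as it stands, this code resets `orf` to empty on every iteration, that is, after every amino acid, so the protein strings always end up empty. To avoid this, the initial value of `orf` should be set outside the `for i in j:3:length(comp_seq) - 2` loop (and likewise outside the corresponding forward-strand loop), not inside it. Keeping it inside the `for j in frames` loop means each reading frame still starts from a fresh, empty `orf`.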
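A minimal single-frame sketch of that placement fix, mirroring the question's loop structure. `translate_frame` and `SMALL_TABLE` are hypothetical names, and the table is a reduced stand-in for `codon_table` that contains only the codons this example needs. Like the question's code, it still starts translating at the first codon of the frame rather than at an `ATG`.

```julia
# Hypothetical reduced codon table, just enough for the example below.
const SMALL_TABLE = Dict(
    "ATG" => "M", "GCC" => "A", "TGG" => "W",
    "TAA" => "STOP", "TAG" => "STOP", "TGA" => "STOP",
)

# Translate one reading frame; orf is initialised once, before the codon loop.
function translate_frame(seq::String, frame::Int, table::Dict{String,String})
    proteins = Vector{String}()
    orf = ""                                   # NOT reset on every codon
    for i in frame:3:length(seq) - 2
        codon = seq[i:i+2]
        haskey(table, codon) || error("Invalid codon: $codon")
        amino_acid = table[codon]
        if amino_acid == "STOP"
            push!(proteins, orf)               # orf holds everything read so far
            break
        end
        orf *= amino_acid
    end
    return proteins
end

println(translate_frame("ATGGCCTGGTAA", 1, SMALL_TABLE))
```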
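(I believe this code also has a logic error in how it handles the start of a protein (the start codon), but I will leave that for you to fix, since it is part of the challenge.)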