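I have been trying to solve the bioinformatics problems on Rosalind.info, and I have run into some trouble now that I want to run a few simple tests.
My project is structured as follows: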
Rosalind-problems/
├─ bioinformatics_stronghold/
│ ├─ data/
│ ├─ modules/
│ │ ├─ __init__.py
│ │ ├─ read_fasta.py
│ ├─ CONS.py
│ ├─ IEV.py
├─ tests/
│ ├─ __init__.py
│ ├─ test_CONS.py
│ ├─ test_IEV.py
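The goal here is to be able to test all the individual files in the bioinformatics_stronghold folder (CONS.py, IEV.py, etc.). However, the problem I run into is:
See all the affected files below: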
test_IEV.py
import pytest
from bioinformatics_stronghold.IEV import calculate_offspring
def test_calculate_offspring():
    assert calculate_offspring([1, 0, 0, 1, 0, 1]) == 3.5
    assert calculate_offspring([1, 1, 1, 1, 1, 1]) == 8.5
IEV.py
def calculate_offspring(input_list: list[int]) -> float:
    """This function will take an input list of non-negative integers no larger than 20,000.
    The function will then calculate the expected offspring showing the dominant phenotype.
    Args:
        input_list (list): Input a list of integers representing the number of couples
    Returns:
        float: The expected number of offspring
    """
    expected_dominant_offspring = 0
    # For all cases, it is assumed that all couples will have exactly 2 offspring
    for index, count in enumerate(input_list):
        print("Index:", index, " ", "Num couples:", count)
        # Case AA-AA, all offspring will be dominant phenotype
        if index == 0:
            expected_dominant_offspring += count * 2 * 1
        # Case AA-Aa, all offspring will be dominant phenotype
        elif index == 1:
            expected_dominant_offspring += count * 2 * 1
        # Case AA-aa, all offspring will be dominant phenotype
        elif index == 2:
            expected_dominant_offspring += count * 2 * 1
        # Case Aa-Aa, 3 out of 4 offspring will be dominant phenotype
        elif index == 3:
            expected_dominant_offspring += count * (2 * (3/4))
        # Case Aa-aa, 2 out of 4 offspring will be dominant phenotype
        elif index == 4:
            expected_dominant_offspring += count * (2 * (2/4))
        # Case aa-aa, no offspring will be dominant phenotype
        elif index == 5:
            expected_dominant_offspring += count * 2 * 0
    print(expected_dominant_offspring)
    return expected_dominant_offspring
These two work fine.
Now for the problematic files...
test_CONS.py
import pytest
from bioinformatics_stronghold.CONS import find_consensus_sequence
def test_find_consensus_sequence():
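    # The expected tuple is parenthesized; without the parentheses the second list is parsed as the assert message.
    assert find_consensus_sequence("tests\\data\\CONS_sample_data.fasta") == (
        [[5, 1, 0, 0, 5, 5, 0, 0], [0, 0, 1, 4, 2, 0, 6, 1], [1, 1, 6, 3, 0, 1, 0, 0], [1, 5, 0, 0, 0, 1, 1, 6]],
        ['A', 'T', 'G', 'C', 'A', 'A', 'C', 'T'],
    )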
Adding the line
from bioinformatics_stronghold.modules.read_fasta import read_fasta_file
only gives me an import error, ModuleNotFound. Adding . or .. results in an ImportError: attempted relative import with no known parent package.
CONS.py
from modules.read_fasta import read_fasta_file
def find_consensus_sequence(fasta_location):
    """
    This function will read a given fasta file and extract all sequences using the read_fasta.py module.
    The function will then create a profile matrix as well as a consensus sequence, both as lists.
    Args:
        fasta_location (str): The location of the fasta file as a string.
    Returns:
        profile_matrix (list[lists]): The profile matrix of all given sequences.
        consensus_sequence (list): The consensus sequences of all given sequences.
    """
    fasta_content = read_fasta_file(fasta_location, debug=False)
    # Create a matrix with all sequences
    sequence_matrix = []
    for item in fasta_content:
        sequence_matrix.append(list(item.sequence))
    # print(sequence_matrix)
    # Create the empty profile matrix
    # [A, C, G, T]
    profile_matrix = [[0]*len(sequence_matrix[0]), [0]*len(sequence_matrix[0]), [0]*len(sequence_matrix[0]), [0]*len(sequence_matrix[0])]
    # print(profile_matrix)
    # Add to the nucleotide count depending on the sequence
    for sublist in sequence_matrix:
        for index, nucleotide in enumerate(sublist):
            if nucleotide == "A":
                profile_matrix[0][index] += 1
            if nucleotide == "C":
                profile_matrix[1][index] += 1
            if nucleotide == "G":
                profile_matrix[2][index] += 1
            if nucleotide == "T":
                profile_matrix[3][index] += 1
    # print(profile_matrix)
    consensus_sequence = []
    # NOTE: Ugly solution, but it seems to work. Quite ineffective, but not sure how to improve at this time.
    # For each position in the sequence, check which "letter" is larger than all other
    for index in range(len(profile_matrix[0])):
        if profile_matrix[0][index] > profile_matrix[1][index] and profile_matrix[0][index] > profile_matrix[2][index] and profile_matrix[0][index] > profile_matrix[3][index]:
            consensus_sequence.append("A")
        elif profile_matrix[1][index] > profile_matrix[0][index] and profile_matrix[1][index] > profile_matrix[2][index] and profile_matrix[1][index] > profile_matrix[3][index]:
            consensus_sequence.append("C")
        elif profile_matrix[2][index] > profile_matrix[0][index] and profile_matrix[2][index] > profile_matrix[1][index] and profile_matrix[2][index] > profile_matrix[3][index]:
            consensus_sequence.append("G")
        elif profile_matrix[3][index] > profile_matrix[0][index] and profile_matrix[3][index] > profile_matrix[1][index] and profile_matrix[3][index] > profile_matrix[2][index]:
            consensus_sequence.append("T")
    # print(consensus_sequence)
    return profile_matrix, consensus_sequence
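On the NOTE in the code above: the chain of strict `>` comparisons appends nothing when two counts tie for the maximum, so the consensus can come out shorter than the sequences. Here is a minimal sketch of one possible alternative, assuming ties may be resolved in favor of the first symbol in A, C, G, T order. The helper name `consensus_from_profile` is hypothetical and not part of the files above.

```python
def consensus_from_profile(profile_matrix, alphabet="ACGT"):
    """Return the most frequent symbol per column of a 4-row profile matrix.

    Assumption: rows are ordered like `alphabet`, and on a tie the symbol
    that comes first in `alphabet` is chosen (max() keeps the first maximum).
    """
    consensus = []
    # zip(*profile_matrix) walks the matrix column by column.
    for column in zip(*profile_matrix):
        best_row = max(range(len(alphabet)), key=lambda row: column[row])
        consensus.append(alphabet[best_row])
    return consensus


if __name__ == "__main__":
    # Profile matrix taken from the expected value in test_CONS.py.
    profile = [
        [5, 1, 0, 0, 5, 5, 0, 0],
        [0, 0, 1, 4, 2, 0, 6, 1],
        [1, 1, 6, 3, 0, 1, 0, 0],
        [1, 5, 0, 0, 0, 1, 1, 6],
    ]
    print("".join(consensus_from_profile(profile)))  # ATGCAACT
```

Whether the first-symbol tie-break is acceptable depends on what the problem statement allows, so treat that choice as an assumption.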
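test_CONS.py does not work. The problem seems to be that the modules folder cannot be found.
Adding an __init__.py to the bioinformatics_stronghold folder does not solve this.
If I move the tests folder into the bioinformatics_stronghold folder, pytest breaks without any clear error message, and I am unable to set up the tests in VSCodium.
My question is:
I think changing this should do it: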
from .modules.read_fasta import read_fasta_file
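If that does not work, there may be some kind of import problem in read_fasta.py; I encourage you to comment here with the full error traceback you see, not just the error message.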
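To illustrate why the leading dot matters, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it both ways. The package name `demo_stronghold`, the stub `read_fasta_file`, and both helper functions are hypothetical stand-ins rather than the actual project files, and the temporary directory plays the role of the project root from which the tests are assumed to be run.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path, import_line: str) -> None:
    """Create a throwaway package that mirrors the layout in the question."""
    pkg = root / "demo_stronghold"  # hypothetical stand-in for bioinformatics_stronghold
    (pkg / "modules").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "modules" / "__init__.py").write_text("")
    (pkg / "modules" / "read_fasta.py").write_text(
        "def read_fasta_file(path, debug=False):\n    return []\n"
    )
    # CONS.py contains only the import line under test.
    (pkg / "CONS.py").write_text(import_line + "\n")


def try_import(import_line: str) -> str:
    """Import demo_stronghold.CONS from a fresh root and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp), import_line)
        sys.path.insert(0, tmp)  # the temp dir acts as the project root
        try:
            importlib.import_module("demo_stronghold.CONS")
            return "ok"
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            return type(exc).__name__
        finally:
            sys.path.remove(tmp)
            # Drop cached modules so the next call starts from a clean state.
            for name in [n for n in sys.modules if n.startswith("demo_stronghold")]:
                del sys.modules[name]
            importlib.invalidate_caches()


if __name__ == "__main__":
    # Absolute form from the question: "modules" is not a top-level package.
    print(try_import("from modules.read_fasta import read_fasta_file"))
    # Relative form: resolved against the package that contains CONS.py.
    print(try_import("from .modules.read_fasta import read_fasta_file"))
```

The first call reports ModuleNotFoundError and the second reports ok. Note that with the relative import, CONS.py can no longer be run directly as a script; from the project root it would have to be started as `python -m bioinformatics_stronghold.CONS`. If pytest still cannot find the package, running `python -m pytest` from Rosalind-problems/ may help, because `python -m` adds the current directory to sys.path.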
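Note: your naming convention does not follow the PEP 8 guidelines:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
Edit: Here is an example of how to structure the project and make it callable.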
Rosalind-problems/
├─ bioinformatics_stronghold/
│ ├─ data/
│ ├─ modules/
│ │ ├─ __init__.py
│ │ ├─ read_fasta.py
│ ├─ __main__.py
│ ├─ CONS.py
│ ├─ IEV.py
├─ tests/
│ ├─ __init__.py
│ ├─ test_CONS.py
│ ├─ test_IEV.py
__main__.py
from .modules import read_fasta
read_fasta.call_a_function()
To run this, just type
python -m bioinformatics_stronghold
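in the terminal. With a single main entry point, you can do all sorts of things, such as accepting user input, adding an argparse interface, and so on.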
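As a rough sketch of what such an argparse interface could look like: the argument names `problem` and `input_path` and the functions `build_parser` and `main` are made up for illustration, and the dispatch to the actual solvers is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a small command-line parser (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        prog="bioinformatics_stronghold",
        description="Run one Rosalind solution on an input file.",
    )
    parser.add_argument("problem", choices=["CONS", "IEV"], help="problem ID to run")
    parser.add_argument("input_path", help="path to the input file")
    return parser


def main(argv=None) -> str:
    """Parse the arguments and describe what would be run.

    With argv=None, argparse reads the arguments from sys.argv[1:].
    """
    args = build_parser().parse_args(argv)
    # A real entry point would import and call the matching solver here,
    # e.g. find_consensus_sequence(args.input_path) for CONS.
    return f"would run {args.problem} on {args.input_path}"
```

In `__main__.py` this could end with `print(main())`, so that `python -m bioinformatics_stronghold CONS data/sample.fasta` prints the message; the file name here is only a placeholder.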