How to extract headers from a FASTA file in Python

Question

I have a FASTA file containing sequences. I only want to extract the header information and display it.

I am new to Python coding.

Answer 1

For example:

# The with open will open the file using "f" as the file handle.
with open("/home/rightmire/Downloads/fastafile", "r") as f:
    for line in f:  # Creates a for loop to read the file line by line
        print(line)  # This is the first line
        # If you comment out the break, the file will continue to be read line by line
        # If you want just the first line, you can break the loop
        break

# even though the loop has ended, the last contents of the variable 'line' is remembered
print("The data retained in the variable 'line' is: ", line)

Output:

>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase

The data retained in the variable 'line' is:  >gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
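
The loop above stops after the first line, which is enough when the file holds a single record. If the file holds several records, one way to collect every header is to keep only the lines that start with ">". The following is a minimal sketch under that assumption; the function name `iter_fasta_headers` and the two-record sample data are made up for illustration and are not part of the original answer.

```python
import os
import tempfile


def iter_fasta_headers(path):
    """Yield each header line, without the leading '>' or the trailing newline."""
    with open(path, "r") as handle:
        for line in handle:  # reads one line at a time, so large files are fine
            if line.startswith(">"):
                yield line[1:].rstrip("\r\n")


# Hypothetical two-record file, written to a temporary location for the demo.
sample = (
    ">seq1 first example record\n"
    "MNSERSDVTLYQPF\n"
    "LDYAIAYMRSRLDL\n"
    ">seq2 second example record\n"
    "AHVQGGNSLQVLNF\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(sample)
    demo_path = tmp.name

headers = list(iter_fasta_headers(demo_path))
for header in headers:
    print(header)

os.remove(demo_path)  # clean up the temporary demo file
```

To use it on a real file, pass that file's path instead of `demo_path`. Because the function is a generator, it never holds more than one line in memory.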

===
You can also choose not to use a loop or 'with'.

f = open("/home/rightmire/Downloads/fastafile", "r")
line = f.readline()  # reads one line
print(line)
f.close()  # Closes the open file.
===
Finally, you can read the entire file into memory. From there you can work on the file as a whole, operate on individual lines, or even parse it character by character. However, this may not be the best idea, because the file could be very large!

# Plain open() is used here, so the file has to be closed explicitly.
f = open("/home/rightmire/Downloads/fastafile", "r")
# Read the entire file into the variable 'lines'
lines = f.read()
f.close()

# Split 'lines' by the newline character to get individual lines.
for line in lines.split("\n"):
    print("--------")
    print(line)

# or even read it out character by character, which can be handy for parsing the sequence data.
for c in lines:
    print(c)
Output:

--------
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
--------
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
--------
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
--------
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
--------
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
--------

>
g
i
|
1
(snip)
M
N
S
E
(snip)
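
Once a header line is in hand, it is often useful to separate the identifier from the free-text description. The sketch below assumes the common convention that the identifier is everything up to the first space, and that "|" separates the identifier's fields, as in the sample header shown in the output above. The helper name `split_header` is hypothetical, and other files may lay out their headers differently.

```python
def split_header(header_line):
    """Split a FASTA header into (identifier, description).

    Assumes the identifier runs up to the first space; this is a
    convention, not something every FASTA file follows.
    """
    text = header_line.strip()
    if text.startswith(">"):
        text = text[1:]
    identifier, _, description = text.partition(" ")
    return identifier, description.strip()


# Header copied from the sample output above.
header = ">gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase"
identifier, description = split_header(header)

# Break the identifier on '|' and drop the empty piece left by the trailing '|'.
id_fields = [field for field in identifier.split("|") if field]

print(identifier)
print(description)
print(id_fields)
```

For this header, `id_fields` holds the four pipe-separated pieces of the identifier, and `description` holds the protein name.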
