How to extract the headers from a FASTA file in Python


I have a FASTA file containing sequences. I only want to extract the header information and display it.

I'm new to coding in Python.

python file-io fasta
1 Answer

```python
# "with open(...)" opens the file, using "f" as the file handle,
# and closes it automatically when the block ends.
with open("/home/rightmire/Downloads/fastafile", "r") as f:
    for line in f:  # A for loop reads the file line by line
        print(line)  # This is the first line
        # If you comment out the break, the file will continue to be read line by line.
        # If you want just the first line, you can break out of the loop.
        break

# Even though the loop has ended, the last contents of the variable 'line' are remembered.
print("The data retained in the variable 'line' is: ", line)
```

Output:

```
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
The data retained in the variable 'line' is: >gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
```
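The loop above stops after the first line, so it only shows the first header. A FASTA file can hold many records, and each one starts with a header line beginning with `>`. The following is a minimal sketch that yields every header, assuming that convention. The function name `iter_headers` is hypothetical and not part of the original answer. It accepts an open file, or any iterable of lines, so the file is still read one line at a time.

```python
def iter_headers(handle):
    """Yield each FASTA header from an open file (or any iterable of lines),
    without the leading '>' and the trailing newline."""
    for line in handle:  # One line at a time, so the whole file never sits in memory
        if line.startswith(">"):  # Header lines start with '>'; sequence lines do not
            yield line[1:].rstrip("\n")


# Hypothetical usage with the file from the question:
# with open("/home/rightmire/Downloads/fastafile", "r") as f:
#     for header in iter_headers(f):
#         print(header)
```

Because it is a generator, you can also collect the headers with `list(iter_headers(f))` if you need them all at once.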

===
You can also skip the loop and the `with` statement, but then you must close the file yourself:

```python
f = open("/home/rightmire/Downloads/fastafile", "r")
line = f.readline()  # Reads one line
print(line)
f.close()  # Closes the open file.
```
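Once a header line has been read, it is often useful to split it into parts. By the common FASTA convention, the identifier is everything up to the first whitespace, and the rest is a free-text description. The NCBI-style identifier in the example (`gi|...|ref|...|`) is further divided by `|`. The following is a minimal sketch that assumes that layout. `parse_header` is a hypothetical helper name, not something from the original answer.

```python
def parse_header(line):
    """Split a FASTA header line into (identifier, description, id_fields).

    Assumes the usual layout: '>' + identifier + optional whitespace + description.
    """
    text = line.rstrip("\n")
    if not text.startswith(">"):
        raise ValueError("not a FASTA header: %r" % text)
    parts = text[1:].split(maxsplit=1)  # Split once, on the first run of whitespace
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    # NCBI-style identifiers such as 'gi|186681228|ref|YP_001864424.1|' are '|'-separated;
    # empty pieces (e.g. from the trailing '|') are dropped.
    id_fields = [field for field in identifier.split("|") if field]
    return identifier, description, id_fields


# Hypothetical usage together with readline():
# with open("/home/rightmire/Downloads/fastafile", "r") as f:
#     print(parse_header(f.readline()))
```

For the header shown in the output above, the description is `phycoerythrobilin:ferredoxin oxidoreductase` and the identifier fields are `gi`, `186681228`, `ref` and `YP_001864424.1`.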

===

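Finally, you can read the entire file into memory. You can then work with the file as a whole, operate on individual lines, or even parse it character by character. However, this may not be the best idea, because FASTA files can be very large and the whole file must fit in memory.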

```python
# Open the file, using "f" as the file handle.
f = open("/home/rightmire/Downloads/fastafile", "r")
# Read the entire file into the variable 'lines'.
lines = f.read()
f.close()  # The contents are now in memory, so the file can be closed.

# Split 'lines' by the newline character to get individual lines.
for line in lines.split("\n"):
    print("--------")
    print(line)

# Or even read it out character by character, which can be handy for parsing the genome data.
for c in lines:
    print(c)
```

Output:

```
--------
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
--------
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
--------
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
--------
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
--------
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
--------
>
g
i
|
1
(snip)
M
N
S
E
(snip)
```
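With the whole file in memory, it is also straightforward to pair each header with its full sequence. This matters because FASTA sequences may wrap over several lines, as in the output above. The following is a minimal sketch that assumes the file fits in memory and that every sequence line follows its `>` header. `read_fasta` is a hypothetical name, not part of the original answer.

```python
def read_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence string."""
    records = {}
    header = None
    for line in text.splitlines():  # splitlines() also handles '\r\n' endings
        line = line.strip()
        if not line:
            continue  # Skip blank lines, e.g. the empty piece after a trailing newline
        if line.startswith(">"):
            header = line[1:]
            records[header] = []  # Sequence lines for this record are collected here
        elif header is not None:
            records[header].append(line)  # Lines before the first header are ignored
    # Join the wrapped sequence lines into a single string per record.
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical usage with the whole-file approach from the answer:
# with open("/home/rightmire/Downloads/fastafile", "r") as f:
#     records = read_fasta(f.read())
# print(list(records))  # Just the headers
```

A dict keeps only one entry per header, so records with duplicate headers would overwrite each other. If that can happen in your data, a list of `(header, sequence)` pairs is a safer choice.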
