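I have four separate FASTA files that I want to merge into one large FASTA file. So far, I have read each file in separately using the Biostrings package.
For example, if your FASTA files are: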
folder = "http://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/chromosomes/"
files = paste0(folder,c("chrI","chrII","chrIII","chrIV"),".fa.gz")
files
[1] "http://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/chromosomes/chrI.fa.gz"
[2] "http://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/chromosomes/chrII.fa.gz"
[3] "http://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/chromosomes/chrIII.fa.gz"
[4] "http://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/chromosomes/chrIV.fa.gz"
Then we can read them all in and combine them into a single DNAStringSet:
library(Biostrings)
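# read each file into its own DNAStringSet (gives a list of four sets)
fa_seq = lapply(files,readDNAStringSet)
# concatenate the list elements into one DNAStringSet
fa_seq = do.call(c,fa_seq)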
fa_seq
  A DNAStringSet instance of length 4
      width seq                                               names
[1]  230218 CCACACCACACCCACACACCCA...GTGTGGGTGTGGTGTGTGTGGG chrI
[2]  813184 AAATAGCCCTCATGTACGTCTC...TGGGTGTGGTGTGTGGGTGTGT chrII
[3]  316620 CCCACACACCACACCCACACCA...TGTGGTGGGTGTGGTGTGTGTG chrIII
[4] 1531933 ACACCACACCCACACCACACCC...AAAGGTAGTAAGTAGCTTTTGG chrIV
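The combined object above still lives only in memory. To end up with one FASTA file on disk, it can probably be written out with Biostrings' writeXStringSet. The following is a minimal sketch, not a definitive recipe: the two tiny sequences and the names seq1, seq2, and out_file are made up for illustration, and in practice you would pass the fa_seq object built above instead.

```r
library(Biostrings)

# Hypothetical stand-ins for the per-file sets; replace with your own data
a <- DNAStringSet(c(seq1 = "ACGT"))
b <- DNAStringSet(c(seq2 = "GGCC"))

# Same combining step as above: concatenate a list of DNAStringSet objects
merged <- do.call(c, list(a, b))

# Write all records into a single FASTA file (out_file is an assumed path)
out_file <- tempfile(fileext = ".fa")
writeXStringSet(merged, out_file)

# Read the file back to check that both records were written
check <- readDNAStringSet(out_file)
```

If the merged file is large, writeXStringSet also takes compress = TRUE to write gzip-compressed output.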