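I want to concatenate the results of multiple csv files produced by the program ABRICATE (github.com/tseemann/abricate) into a single csv file.
The snakefile below runs over a folder of fasta files and generates a separate csv for each fasta. I would like to concatenate these results into a single csv file, but I am not sure how to construct the "rule all".
Any suggestions? My working snakefile is below. Thanks.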
########################### snakefile making dictionary
import os
import yaml

file_list = []
### location assumes that data is in results_fasta/ folder
for entry in os.scandir("results_fasta/"):
    if entry.is_file():
        file_list.append(entry.name)
#### split the file name on "." and use the first part as the sample key
config_dict = {"samples": {i.split(".")[0]: "results_fasta/" + i for i in file_list}}
with open("config_abricate_resfinder.yaml", "w") as handle:
    yaml.dump(config_dict, handle)
###### dictionary created using all files located in results_fasta folder
configfile: "config_abricate_resfinder.yaml"
##### not needed but used to say what is currently running
print("Starting abricate workflow")

##### rule all is a general rule that says these are the results we are looking for in the end.
rule all:
    input:
        expand("abricate_resfinder/{sample}_abricate_resfinder.csv", sample = config["samples"])

##### Abricate resfinder
rule resfinder:
    input:
        lambda wildcards: config["samples"][wildcards.sample]
    params:
        db_resfinder = "leaveblank", ### for resfinder DB is default so leave blank
        type = "csv"
    output:
        res = "abricate_resfinder/{sample}_abricate_resfinder.csv",
    ######## log file is empty. need to print stdout to log
    log:
        "logs/{sample}_resfinder.log"
    shell:
        "abricate {input} --{params.type} > {output.res}"
If the files have no header and their fields match, they can be concatenated with the cat command:
rule all:
    input:
        expand("abricate_resfinder/{sample}_abricate_resfinder.csv", sample = config["samples"])
    output:
        "out.csv"
    shell:
        "cat {input} > {output}"
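If there are headers, you first need to decide what result you actually want. The simplest approach is to drop all header lines entirely. You could try the following shell command, in which sed deletes every line that starts with #:
        "cat {input} | sed '/^#/d' > {output}"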
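If you would rather keep one header line than drop them all, a small Python helper may be easier than a shell pipeline. The following is only a sketch: the function name `concat_keep_first_header` and its `header_prefix` parameter are hypothetical, and it assumes that header lines are exactly the lines starting with `#`, the same assumption the sed command makes.

```python
def concat_keep_first_header(inputs, output, header_prefix="#"):
    """Concatenate text files, keeping only the first header line seen.

    Assumption: a header is any line starting with `header_prefix`.
    """
    header_written = False
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(header_prefix):
                        if header_written:
                            # Skip repeated headers from later files.
                            continue
                        header_written = True
                    # Guard against files that lack a trailing newline,
                    # so rows from adjacent files are not fused together.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
```

One way to use it would be to define the function at the top of the snakefile and call `concat_keep_first_header(input, output[0])` from a `run:` block in place of the `shell:` line.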