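I am running DASTool through Snakemake, and although I do get the output bins, the error below keeps the rule from completing. Since the output is there it is only a minor nuisance, but it immediately terminates my Snakemake run. The Snakefile looks like this: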
rule DAS_Tool:
    input:
        da1="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_metabat.scaffolds2bin.tsv",
        da2="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_maxbin.scaffolds2bin.tsv",
        da3="{datadir}/{sample}.fna",
        db=config["dastool_database"]
    threads: config["threads"]
    conda: "binning.yml"
    output:
        daout=directory("{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}")
    shell:
        """
        date
        DAS_Tool -i {input.da1},{input.da2} -c {input.da3} -o {output.daout} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {input.db} --create_plots 1 &&\
        2> >(tee {log}.stderr) > >(tee {log}.stdout)
        touch das_tool.done
        date
        """
The error reads:
Waiting at most 120 seconds for missing files.
MissingOutputException in line 277 of /mnt/lscratch/users/sbusi/ONT/cedric_ont_basecalling/Binning/metaspades_binning_snakefile:
Job completed successfully, but some output files are missing. Missing files after 120 seconds:
/scratch/users/sbusi/ONT/cedric_ont_basecalling/Binning/bwa_sr_metaspades/dastool_output/metaspades
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
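Which other files could be missing and causing the job to be terminated? I have already tried the --latency-wait option with values up to 900 seconds, but no luck so far.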
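To see which outputs the job actually produced, it can help to list every entry in `dastool_output` whose name starts with the sample name and to check which declared outputs are absent. A minimal Python sketch; both helper names are hypothetical and are not part of Snakemake or DAS_Tool:

```python
from pathlib import Path
from typing import List


def list_outputs_with_prefix(outdir: str, prefix: str) -> List[str]:
    """Return the names of all entries in `outdir` whose name starts with `prefix`.

    Handy for comparing what a prefix-based tool wrote with the path
    Snakemake was told to expect.
    """
    directory = Path(outdir)
    # "prefix*" matches both files and sub-directories
    return sorted(p.name for p in directory.glob(f"{prefix}*"))


def missing_outputs(expected: List[str]) -> List[str]:
    """Return the declared output paths that do not exist on disk."""
    return [path for path in expected if not Path(path).exists()]
```

For example, `list_outputs_with_prefix("/scratch/users/sbusi/ONT/cedric_ont_basecalling/Binning/bwa_sr_metaspades/dastool_output", "metaspades")` would show whether DAS_Tool wrote prefixed files next to the expected `metaspades` directory rather than inside it.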
Thanks!
Edit: Based on Gajapathy's comment, I have edited the file, and it now looks like this:
rule DAS_Tool:
    input:
        da1="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_metabat.scaffolds2bin.tsv",
        da2="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_maxbin.scaffolds2bin.tsv",
        da3="{datadir}/{sample}.fna",
        db=config["dastool_database"]
    threads: config["threads"]
    conda: "/home/users/sbusi/apps/environments/base.yml"
    params:
        basename="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}"
    output:
        daout=directory("{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_DASTool_bins"),
        dafile="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_proteins.faa",
        damfile=touch("{datadir}/{mapper}_{reads}_{sample}_das_tool.done")
    shell:
        """
        date
        DAS_Tool -i {input.da1},{input.da2} -c {input.da3} -o {params.basename} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {input.db} --create_plots 1 &&\
        2> >(tee {log}.stderr) > >(tee {log}.stdout)
        touch {output.damfile}
        date
        """
However, I still get the missing-files error.
According to DAS_Tool's docs, -o defines the basename of the output files; there is no output folder.
-o, --outputbasename Basename of output files.
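Put differently, the value passed to `-o` is a path prefix: DAS_Tool appends suffixes to it, so nothing named exactly `basename` is created, and a `directory("{sample}")` output can never be satisfied. A small Python sketch of that mapping; the function name is hypothetical, and only the two suffixes that appear in the question's edited rule are listed (the DAS_Tool documentation has the full set):

```python
import os


def das_tool_paths(basename: str) -> dict:
    """Sketch: how a -o basename expands into concrete output paths.

    The basename itself is never created; outputs are basename + suffix.
    """
    return {
        # folder in which the prefixed outputs end up
        "parent_dir": os.path.dirname(basename),
        # suffixes taken from the question's edited rule
        "proteins": basename + "_proteins.faa",
        "bins_dir": basename + "_DASTool_bins",
    }
```

With `basename = ".../dastool_output/metaspades"`, the files land in `dastool_output` as `metaspades_proteins.faa` and `metaspades_DASTool_bins`, which is why Snakemake reports `.../dastool_output/metaspades` as missing.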
So a generic, simplified rule would look like this:
rule DAS_Tool:
    output: 'path/to/outdir/basename_proteins.faa'
    params: basename='path/to/outdir/basename'
    shell: "DAS_Tool .... -o {params.basename} ...."
If you don't want to hard-code the basename in params, you can use a Python lambda to derive it from the output file inside params.
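A minimal sketch of that idea, assuming the `_proteins.faa` output from the simplified rule. The helper `basename_from_output` is hypothetical; Snakemake does allow a params function to take `output` as a second argument, as in the commented rule:

```python
PROTEINS_SUFFIX = "_proteins.faa"


def basename_from_output(output_path: str, suffix: str = PROTEINS_SUFFIX) -> str:
    """Strip a known DAS_Tool suffix from an output path to recover the -o basename."""
    if not output_path.endswith(suffix):
        raise ValueError(f"{output_path!r} does not end with {suffix!r}")
    return output_path[: -len(suffix)]


# Usage inside a Snakefile (hypothetical rule; the "...." parts stay elided):
#
# rule DAS_Tool:
#     output:
#         proteins="path/to/outdir/basename_proteins.faa"
#     params:
#         basename=lambda wildcards, output: basename_from_output(output.proteins)
#     shell:
#         "DAS_Tool .... -o {params.basename} ...."
```

Raising on a mismatched suffix makes a typo in the output pattern fail at DAG-building time instead of silently passing a wrong prefix to DAS_Tool.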