Snakemake MissingOutputException after the job completes


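I am running DASTool via Snakemake, and for some reason, even though I do get the output bins, the following error stops the workflow. Since the output is there, this would only be a minor nuisance, except that it immediately terminates my Snakemake run. The Snakefile looks like this: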

rule DAS_Tool:
    input:
            da1="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_metabat.scaffolds2bin.tsv",
            da2="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_maxbin.scaffolds2bin.tsv",
            da3="{datadir}/{sample}.fna",
            db=config["dastool_database"]
    threads:config["threads"]
    conda:"binning.yml"
    output:
            daout=directory("{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}")
    shell:
            """
            date
            DAS_Tool -i {input.da1},{input.da2} -c {input.da3} -o {output.daout} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {input.db} --create_plots 1 &&\
            2> >(tee {log}.stderr) > >(tee {log}.stdout)
            touch das_tool.done
            date
            """

The error reads:

Waiting at most 120 seconds for missing files.
MissingOutputException in line 277 of /mnt/lscratch/users/sbusi/ONT/cedric_ont_basecalling/Binning/metaspades_binning_snakefile:
Job completed successfully, but some output files are missing. Missing files after 120 seconds:
/scratch/users/sbusi/ONT/cedric_ont_basecalling/Binning/bwa_sr_metaspades/dastool_output/metaspades
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

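What other files could be missing that cause the job to be terminated? I have already tried the --latency-wait option with values up to 900 seconds, with no luck.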
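Roughly, the error means that after the shell command exited successfully, Snakemake looked for every path listed under output: and, even after waiting up to --latency-wait seconds, at least one was still absent. The Python sketch below is a simplified model of that check (an assumption for illustration, not Snakemake's actual implementation); missing_outputs is a hypothetical helper. It shows why a longer wait cannot help when the declared path is never created at all, as happens when DAS_Tool treats -o as a filename prefix instead of a directory.

```python
import os
import tempfile
import time
from typing import List


def missing_outputs(paths: List[str], latency_wait: float = 0.0,
                    poll_interval: float = 0.5) -> List[str]:
    """Return declared output paths that still do not exist after waiting.

    Simplified model (assumption) of the post-job check behind
    MissingOutputException: poll until every path exists or until
    `latency_wait` seconds have elapsed, then report what is missing.
    """
    deadline = time.monotonic() + latency_wait
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll_interval)


with tempfile.TemporaryDirectory() as outdir:
    basename = os.path.join(outdir, "metaspades")
    # A tool that uses -o as a prefix writes files such as this one ...
    open(basename + "_proteins.faa", "w").close()
    # ... but the first rule declared a directory named exactly `basename`,
    # which nothing ever creates, so waiting longer never changes the result.
    print(missing_outputs([basename], latency_wait=0.2, poll_interval=0.05))
```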

Thanks!

Edit: Based on Gajapathy's comment, I have edited the rule so it now looks like this:

rule DAS_Tool:
    input:
            da1="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_metabat.scaffolds2bin.tsv",
            da2="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_maxbin.scaffolds2bin.tsv",
            da3="{datadir}/{sample}.fna",
            db=config["dastool_database"]
    threads:config["threads"]
    conda:"/home/users/sbusi/apps/environments/base.yml"
    params:
            basename="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}"
    output:
            daout=directory("{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_DASTool_bins"),
            dafile="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_proteins.faa",
            damfile=touch("{datadir}/{mapper}_{reads}_{sample}_das_tool.done")
    shell:
            """
            date
            DAS_Tool -i {input.da1},{input.da2} -c {input.da3} -o {params.basename} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {input.db} --create_plots 1 &&\
            2> >(tee {log}.stderr) > >(tee {log}.stdout)
            touch {output.damfile}
            date
            """

However, I still get the missing-files error.

wildcard missing-data snakemake
1 answer (0 votes)

According to DAS_Tool's documentation, -o defines the basename of the output files; it does not name an output folder.

   -o, --outputbasename       Basename of output files.
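To make the prefix behaviour concrete, here is a small Python sketch. The suffixes are the two taken from the edited rule in the question (_DASTool_bins and _proteins.faa); treat that list as an assumption and check it against the documentation of your DAS_Tool version. dastool_outputs is a hypothetical helper that only does the string arithmetic.

```python
import os

# Suffixes copied from the edited rule in the question (assumption: your
# DAS_Tool version may write more or differently named files).
DASTOOL_SUFFIXES = ("_DASTool_bins", "_proteins.faa")


def dastool_outputs(basename: str) -> dict:
    """Map each suffix to the path written for `-o basename`.

    -o is a prefix: the suffix is appended to the last path component,
    so no file or directory named exactly `basename` is ever created.
    """
    return {suffix: basename + suffix for suffix in DASTOOL_SUFFIXES}


basename = "bwa_sr_metaspades/dastool_output/metaspades"
for suffix, path in dastool_outputs(basename).items():
    print(f"{suffix:15} -> {path}  (parent dir: {os.path.dirname(path)})")
```

The parent directory of every output is the directory containing the basename, which is why declaring directory(".../{sample}") in the first rule could never be satisfied.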

So a generic, simplified rule would look like this:

rule DAS_Tool:
    output: 'path/to/outdir/basename_proteins.faa'
    params: basename = 'path/to/outdir/basename'
    shell: "DAS_Tool .... -o {params.basename} ...."

If you don't want to hard-code the basename in params, you can use a Python lambda in params to derive it from the output file.
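A minimal sketch of that idea, assuming the output is named dafile and ends in _proteins.faa as in the edited rule. Snakemake passes wildcards and output by keyword to a params function that declares them, so inside the rule this would be written inline, e.g. params: basename=lambda wildcards, output: output.dafile[:-len("_proteins.faa")]. The plain-Python version below uses hypothetical SimpleNamespace stand-ins for the objects Snakemake would pass in, so the logic can be checked outside a workflow.

```python
from types import SimpleNamespace

SUFFIX = "_proteins.faa"  # assumption: matches the declared `dafile` output


def basename_from_output(wildcards, output):
    """Strip the DAS_Tool suffix from the declared protein file.

    Same signature Snakemake uses for a params function that asks for
    `wildcards` and `output`; raises if the suffix is not as expected so a
    renamed output fails loudly instead of producing a wrong -o value.
    """
    dafile = output.dafile
    if not dafile.endswith(SUFFIX):
        raise ValueError(f"{dafile!r} does not end with {SUFFIX!r}")
    return dafile[: -len(SUFFIX)]


# Hypothetical stand-ins for what Snakemake would supply for one job.
wildcards = SimpleNamespace(datadir="Binning", mapper="bwa", reads="sr", sample="metaspades")
output = SimpleNamespace(dafile="Binning/bwa_sr_metaspades/dastool_output/metaspades_proteins.faa")
print(basename_from_output(wildcards, output))
```

Deriving the basename from the output keeps the two in sync: if the output pattern changes, the -o value follows automatically.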
