Snakemake is running the subworkflow, but not the rest of the workflow (goes straight to rule all)

Question

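I'm new to Snakemake and to StackOverflow. If anything is unclear or you need more details, please let me know. I wrote a workflow that converts Illumina .BCL base-call files into demultiplexed .FASTQ files and generates QC reports (FastQC files). The workflow consists of: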

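  • Subworkflow "convert_bcl_to_fastq": it creates FASTQ files from the BCL files in a directory named Fastq. It must run before the main workflow, which is why I chose a subworkflow: my second rule depends on these FASTQ files being generated, and I don't know their names in advance. A dummy file "convert_bcl_to_fastq.done" is created as output so I know when the subworkflow has run as expected.
  • Rule "generate_fastqc": it takes the FASTQ files produced by the subworkflow and creates FastQC files in a directory named FastQC.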

The problem

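When I run my workflow I get no errors, but it does not behave as expected: the subworkflow runs, then the main workflow starts, but only rule "all" is executed. My rule "generate_fastqc" is never executed. Where might I have gone wrong? This is what I get: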

Building DAG of jobs...
Executing subworkflow convert_bcl_to_fastq.
Building DAG of jobs...
Job counts:
        count   jobs
        1       convert_bcl_to_fastq
        1
[...]
Processing completed with 0 errors and 1 warnings.
Touching output file convert_bcl_to_fastq.done.
Finished job 0.
1 of 1 steps (100%) done
Complete log: /path/to/my/working/directory/conversion/.snakemake/log/2020-03-12T171952.799414.snakemake.log
Executing main workflow.
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1

localrule all:
    input: /path/to/my/working/directory/conversion/convert_bcl_to_fastq.done
    jobid: 0

Finished job 0.
1 of 1 steps (100%) done

Once all my FASTQ files have been generated, if I run the workflow again, this time it does execute rule "generate_fastqc":

Building DAG of jobs...
Executing subworkflow convert_bcl_to_fastq.
Building DAG of jobs...
Nothing to be done.
Complete log: /path/to/my/working/directory/conversion/.snakemake/log/2020-03-12T174337.605716.snakemake.log
Executing main workflow.
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        95      generate_fastqc
        96

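I expected the workflow to run to completion in one go, executing rule "generate_fastqc" as soon as the subworkflow finished, but instead I am forced to run it twice. I thought this would work, since the subworkflow generates all the files needed by the second part of the workflow... Do you know where I might have gone wrong?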


My code

Here is the Snakefile of my main workflow:

subworkflow convert_bcl_to_fastq:
    workdir: WDIR + "conversion/"
    snakefile: WDIR + "conversion/Snakefile"

SAMPLES, = glob_wildcards(FASTQ_DIR + "{sample}_R1_001.fastq.gz")

rule all:
    input:
        convert_bcl_to_fastq("convert_bcl_to_fastq.done"),
        expand(FASTQC_DIR + "{sample}_R1_001_fastqc.html", sample=SAMPLES),
        expand(FASTQC_DIR + "{sample}_R2_001_fastqc.html", sample=SAMPLES)

rule generate_fastqc:
    output:
        FASTQC_DIR + "{sample}_R1_001_fastqc.html",
        FASTQC_DIR + "{sample}_R2_001_fastqc.html",
        temp(FASTQC_DIR + "{sample}_R1_001_fastqc.zip"),
        temp(FASTQC_DIR + "{sample}_R2_001_fastqc.zip")
    shell:
        "mkdir -p " + FASTQC_DIR + " && " # Creates the FastQC directory if it is missing
        "fastqc --outdir " + FASTQC_DIR + " " + FASTQ_DIR + "{wildcards.sample}_R1_001.fastq.gz " + FASTQ_DIR + "{wildcards.sample}_R2_001.fastq.gz" # Runs FastQC on both reads of one sample, in the foreground so Snakemake waits for it

Here is the Snakefile of my subworkflow "convert_bcl_to_fastq":

rule all:
    input:
        "convert_bcl_to_fastq.done"

rule convert_bcl_to_fastq:
    output:
        touch("convert_bcl_to_fastq.done")
    shell:
        "mkdir -p " + FASTQ_DIR + " && " # Creates the Fastq directory if it is missing
        "bcl2fastq --no-lane-splitting --runfolder-dir " + INPUT_DIR + " --output-dir " + FASTQ_DIR # Demultiplexes and converts BCL files to FASTQ files

Thanks for your help!

Answer

The documentation on subworkflows currently states:

When executing, snakemake first tries to create (or update, if necessary) 
"test.txt" (and all other possibly mentioned dependencies) by executing the subworkflow. 
Then the current workflow is executed.

In your case, the only declared dependency is "convert_bcl_to_fastq.done", which Snakemake happily produces on the first run.

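Snakemake normally parses the Snakefile in a single pass, and nothing tells the main workflow to look again for the sample files produced by the subworkflow. glob_wildcards() is evaluated at parse time, before the subworkflow has run; since the sample files do not exist yet, SAMPLES is empty and the expand() statements produce no targets. No matches, no work to do :-)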

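The second time you run the main workflow, glob_wildcards() finds the sample files, so the expand() calls in rule all: produce targets, and Snakemake generates them.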
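The parse-time behaviour described above can be mimicked with a minimal plain-Python sketch. It is an analogy, not Snakemake's actual implementation: parse_workflow() and run_subworkflow() are hypothetical stand-ins for the top level of the main Snakefile and for bcl2fastq, and the sample names are made up. The glob is evaluated before the files exist, so the first "run" gets an empty target list, while the second one sees the files left behind by the first.

```python
import glob
import os
import tempfile

SUFFIX = "_R1_001.fastq.gz"

def parse_workflow(fastq_dir):
    """Hypothetical stand-in for the main Snakefile's top level: the glob
    and the expansion both happen once, when the file is parsed."""
    # Plays the role of glob_wildcards(FASTQ_DIR + "{sample}_R1_001.fastq.gz")
    matches = glob.glob(os.path.join(fastq_dir, "*" + SUFFIX))
    samples = sorted(os.path.basename(m)[: -len(SUFFIX)] for m in matches)
    # Plays the role of expand(FASTQC_DIR + "{sample}_R1_001_fastqc.html", ...)
    return [f"FastQC/{s}_R1_001_fastqc.html" for s in samples]

def run_subworkflow(fastq_dir, names):
    """Hypothetical stand-in for bcl2fastq: the FASTQ files only appear
    once this 'job' has actually executed."""
    for name in names:
        open(os.path.join(fastq_dir, name + SUFFIX), "w").close()

with tempfile.TemporaryDirectory() as fastq_dir:
    # Run 1: parsing happens before the subworkflow executes.
    targets_run1 = parse_workflow(fastq_dir)
    run_subworkflow(fastq_dir, ["sampleA", "sampleB"])
    # Run 2: parsing now sees the files left behind by run 1.
    targets_run2 = parse_workflow(fastq_dir)

print(targets_run1)  # []
print(targets_run2)  # ['FastQC/sampleA_R1_001_fastqc.html', 'FastQC/sampleB_R1_001_fastqc.html']
```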

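Side note 1: this is good to be aware of. With your code, if you make a change that should force the subworkflow to rerun, Snakemake will find the old "convert_bcl_to_fastq.done" and will not re-execute the subworkflow.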
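A simplified model of why the flag file keeps the subworkflow from rerunning, assuming the classic timestamp rule (an existing output is rebuilt only when a declared input is newer than it). Newer Snakemake releases consider additional rerun triggers, which this sketch ignores; is_up_to_date() and the RunInfo.xml input are hypothetical illustrations, not Snakemake APIs.

```python
import os
import tempfile

def is_up_to_date(output, inputs):
    """Simplified timestamp rule: an existing output is stale only if some
    declared input is newer than it (other rerun triggers are ignored)."""
    if not os.path.exists(output):
        return False
    out_mtime = os.path.getmtime(output)
    return all(os.path.getmtime(i) <= out_mtime for i in inputs)

with tempfile.TemporaryDirectory() as d:
    done = os.path.join(d, "convert_bcl_to_fastq.done")
    before_touch = is_up_to_date(done, inputs=[])  # flag missing -> job must run
    open(done, "w").close()                        # what touch() leaves behind
    no_inputs = is_up_to_date(done, inputs=[])     # flag exists, nothing to compare -> never rerun

    # Hypothetical declared input, made newer than the flag file.
    run_info = os.path.join(d, "RunInfo.xml")
    open(run_info, "w").close()
    newer = os.path.getmtime(done) + 10
    os.utime(run_info, (newer, newer))
    with_input = is_up_to_date(done, inputs=[run_info])  # input newer -> rerun

print(before_touch, no_inputs, with_input)  # False True False
```

Because the rule in the question declares no inputs, the second case is the one that applies: once the flag exists, nothing can make it look out of date.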

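Side note 2: if you want Snakemake to be less "one-pass", it has a rule keyword, checkpoint, that can be used to re-evaluate what needs to be done as a result of a rule's execution. In your case, the checkpoint would be rule convert_bcl_to_fastq. This requires the rules to be in the same logical Snakefile (although include allows them to be split across multiple files).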
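A sketch of what side note 2 could look like, with the subworkflow folded into the main Snakefile. It assumes FASTQ_DIR, FASTQC_DIR and INPUT_DIR are defined as in the question (FASTQ_DIR ending in "/"), drops the subworkflow declaration and the .done flag, and follows the checkpoints.<name>.get() pattern from the Snakemake documentation on data-dependent conditional execution. It has not been run against a real run folder; FASTQ_OUT and fastqc_reports are hypothetical names.

```python
# Hypothetical single-Snakefile variant; FASTQ_DIR, FASTQC_DIR and INPUT_DIR
# are assumed to be defined as in the question.
FASTQ_OUT = FASTQ_DIR.rstrip("/")  # directory() output without trailing slash

checkpoint convert_bcl_to_fastq:
    output:
        directory(FASTQ_OUT)
    shell:
        "bcl2fastq --no-lane-splitting --runfolder-dir " + INPUT_DIR + " --output-dir {output}"

def fastqc_reports(wildcards):
    # .get() defers this function until the checkpoint has run; Snakemake
    # then re-evaluates it, and the glob sees the freshly written FASTQ files.
    fastq_dir = checkpoints.convert_bcl_to_fastq.get().output[0]
    samples = glob_wildcards(fastq_dir + "/{sample}_R1_001.fastq.gz").sample
    return expand(FASTQC_DIR + "{sample}_R{read}_001_fastqc.html", sample=samples, read=[1, 2])

rule all:
    input:
        fastqc_reports

rule generate_fastqc:
    input:
        r1=FASTQ_DIR + "{sample}_R1_001.fastq.gz",
        r2=FASTQ_DIR + "{sample}_R2_001.fastq.gz"
    output:
        FASTQC_DIR + "{sample}_R1_001_fastqc.html",
        FASTQC_DIR + "{sample}_R2_001_fastqc.html",
        temp(FASTQC_DIR + "{sample}_R1_001_fastqc.zip"),
        temp(FASTQC_DIR + "{sample}_R2_001_fastqc.zip")
    shell:
        "mkdir -p " + FASTQC_DIR + " && fastqc --outdir " + FASTQC_DIR + " {input.r1} {input.r2}"
```

With this layout a single snakemake invocation should run convert_bcl_to_fastq first, then re-evaluate rule all and schedule one generate_fastqc job per sample found in the Fastq directory.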
