Missing wildcards in checkpoint rule - Snakemake


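I have a checkpoint that produces an unknown number of files, so the DAG has to be re-evaluated after it runs. I want to speed it up by parallelizing the command per chromosome, and then, after the re-evaluation, merge all chromosomes back together per experiment. However, Snakemake cannot infer the chromosome.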

checkpoint GenomeAnalysisTK:
    input:
        bamlist = rules.RealignerTargetCreator.output.bamlist,
        intervals = rules.RealignerTargetCreator.output.intervals,
        fasta = fasta
    output:
        temp(directory("splits/{chromosome}"))
    conda:
        "gatk3"
    wildcard_constraints:
        chromosome='|'.join(detect_chromosomes(fai)),
    shell:
        """
        mkdir -p {output} && cd {output}
        gatk3 -Xmx24g -T IndelRealigner -I {input.bamlist} -targetIntervals {input.intervals} -L {wildcards.chromosome} -R {input.fasta} -compress 0 --nWayOut .{wildcards.chromosome}.indelrealigned.bam 
        """
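The wildcard constraint above calls a detect_chromosomes(fai) helper that the question does not show. Below is a minimal sketch, assuming it simply reads the first (sequence-name) column of a samtools faidx .fai index; both function bodies and the chromosome_constraint name are hypothetical, not the asker's code. It also runs each name through re.escape, because a name such as GL000192.1 would otherwise contain a regex "." that matches any character.

```python
import re
import tempfile
from pathlib import Path


def detect_chromosomes(fai):
    """Hypothetical helper: return the sequence names listed in a .fai index.

    Each .fai line is tab-separated and its first column is the sequence name.
    """
    return [
        line.split("\t", 1)[0]
        for line in Path(fai).read_text().splitlines()
        if line.strip()
    ]


def chromosome_constraint(fai):
    """Alternation regex for wildcard_constraints, with '.' and similar escaped."""
    return "|".join(re.escape(name) for name in detect_chromosomes(fai))


if __name__ == "__main__":
    # Tiny made-up index with two sequences.
    with tempfile.NamedTemporaryFile("w", suffix=".fai", delete=False) as fh:
        fh.write("chr1\t1000\t6\t60\t61\n")
        fh.write("GL000192.1\t500\t1030\t60\t61\n")
    print(detect_chromosomes(fh.name))     # ['chr1', 'GL000192.1']
    print(chromosome_constraint(fh.name))  # chr1|GL000192\.1
```

In the Snakefile this would be used as chromosome=chromosome_constraint(fai) inside wildcard_constraints.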


def agg(wildcards):
    output=checkpoints.GenomeAnalysisTK.get(**wildcards).output[0]
    return expand("splits/{chromosome}/{{experiment}}/{chromosome}.indelrealigned.bam")

rule merge_realigned:
    input:
        agg
    output:
        "{patient}/{sample}/{experiment}.merged.indelrealigned.bam"
    threads:
        config["other_threads"],
    params:
        compression_level = 0
    wildcard_constraints:
        chromosome='|'.join(detect_chromosomes(fai)),
    shell:
        "samtools merge -@ {threads} -l {params.compression_level} {output} {input}"

However, I get the typical "WorkflowError: Missing wildcard values for chromosome". How can I make Snakemake infer the chromosome?
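A rough way to see where the error most likely comes from: in agg, checkpoints.GenomeAnalysisTK.get(**wildcards) has to fill in the checkpoint's output pattern splits/{chromosome}, but the wildcards of merge_realigned only contain patient, sample and experiment. The sketch below imitates that with plain str.format; it is an analogy rather than Snakemake's actual code path, and the wildcard values and the checkpoint_output name are made up.

```python
CHECKPOINT_OUTPUT = "splits/{chromosome}"

# Wildcards merge_realigned gets from its output pattern
# "{patient}/{sample}/{experiment}.merged.indelrealigned.bam" (made-up values).
merge_wildcards = {"patient": "P1", "sample": "S1", "experiment": "tumor"}


def checkpoint_output(wildcards):
    """Fill the checkpoint output pattern; raise if a wildcard is missing."""
    try:
        return CHECKPOINT_OUTPUT.format(**wildcards)
    except KeyError as missing:
        raise ValueError(f"Missing wildcard values for {missing.args[0]}") from None


try:
    checkpoint_output(merge_wildcards)
except ValueError as err:
    print(err)  # Missing wildcard values for chromosome

# Supplying chromosome explicitly works; format() ignores the extra keys.
print(checkpoint_output({"chromosome": "chr1", **merge_wildcards}))  # splits/chr1
```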

1 Answer

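The problem is that the merge_realigned rule has no wildcard that matches chromosome, so you have to specify it in the input function. However, your rule depends on all chromosomes, so you first have to request the checkpoint output for every chromosome: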

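def agg(wildcards):
    # CHROMOSOME_LIST is assumed to be defined earlier in the Snakefile,
    # e.g. CHROMOSOME_LIST = detect_chromosomes(fai)
    for chrom in CHROMOSOME_LIST:
        # .get() raises until this checkpoint has run, so Snakemake schedules it first
        checkpoints.GenomeAnalysisTK.get(chromosome=chrom, **wildcards).output
    return expand("splits/{chromosome}/{experiment}/{chromosome}.indelrealigned.bam",
                  chromosome=CHROMOSOME_LIST, experiment=wildcards.experiment)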

You also have to pass the chromosome values to the expand() call.
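To see what an expand() call like the one in the question's agg produces, here is a simplified, stdlib-only stand-in for Snakemake's expand (the name expand_like is hypothetical, and the real function supports more options). Single-brace fields are filled with every combination of the given values, doubled braces such as {{experiment}} come out as a literal {experiment}, and a single-brace field with no values, which is what happens when chromosome= is left out, is an error.

```python
import itertools
import string


def expand_like(pattern, **values):
    """Simplified stand-in for Snakemake's expand() with the default product combinator.

    Single-brace fields get every combination of the given values; doubled
    braces such as {{experiment}} are emitted as a literal {experiment}.
    """
    fields = {name for _, name, _, _ in string.Formatter().parse(pattern) if name}
    missing = sorted(fields - set(values))
    if missing:
        raise KeyError(f"no values given for wildcard {missing[0]!r}")
    # Only keywords that appear in the pattern take part in the product;
    # a plain string counts as a single value.
    used = {
        k: [v] if isinstance(v, str) else list(v)
        for k, v in values.items()
        if k in fields
    }
    return [
        pattern.format(**dict(zip(used, combo)))
        for combo in itertools.product(*used.values())
    ]


PATTERN = "splits/{chromosome}/{{experiment}}/{chromosome}.indelrealigned.bam"
print(expand_like(PATTERN, chromosome=["chr1", "chr2"]))
# ['splits/chr1/{experiment}/chr1.indelrealigned.bam', 'splits/chr2/{experiment}/chr2.indelrealigned.bam']
```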

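If the checkpoint for the first chromosome has to finish before the one for the second chromosome is requested, this for loop may prevent the chromosomes from running in parallel. I am not sure whether that is the case.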
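A toy model of that concern, assuming (as the Snakemake documentation describes for checkpoints) that checkpoints.&lt;name&gt;.get() raises an exception while that checkpoint has not run yet. Because a Python loop stops at the first exception, each evaluation of agg can report only one incomplete chromosome. The classes and function names below are hypothetical stand-ins, not Snakemake internals, and the model further assumes that only the reported checkpoint is run before agg is evaluated again; whether Snakemake actually behaves that way is exactly the open question.

```python
class IncompleteCheckpoint(Exception):
    """Stand-in for the exception raised by a checkpoint that has not run yet."""

    def __init__(self, chromosome):
        super().__init__(chromosome)
        self.chromosome = chromosome


class ToyCheckpoint:
    """Hypothetical model: get() fails until that chromosome has been 'run'."""

    def __init__(self):
        self.done = set()

    def get(self, chromosome):
        if chromosome not in self.done:
            raise IncompleteCheckpoint(chromosome)
        return f"splits/{chromosome}"


def agg(checkpoint, chromosomes):
    # Same shape as the answer's loop: the first missing chromosome aborts it.
    for chrom in chromosomes:
        checkpoint.get(chrom)
    return [f"splits/{c}" for c in chromosomes]


def rounds_until_complete(chromosomes):
    """Count evaluations of agg if only the reported checkpoint runs each round."""
    checkpoint = ToyCheckpoint()
    rounds = 0
    while True:
        rounds += 1
        try:
            return agg(checkpoint, chromosomes), rounds
        except IncompleteCheckpoint as exc:
            checkpoint.done.add(exc.chromosome)  # "run" just that one job


files, rounds = rounds_until_complete(["chr1", "chr2", "chr3"])
print(rounds)  # 4: three single-chromosome rounds plus the final successful one
```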

© www.soinside.com 2019 - 2024. All rights reserved.