Snakemake: input files spread across multiple directories

My {dir} wildcard is variable and has the form nameX/tissueX/trimmed. This is the code I am using:

HISAT2_INDEX_PREFIX = "/index/genome_chromosomes"

directories, SAMPLES=glob_wildcards('/test/{dir}/{sample}_1.fastq.gz')
rule all:
    input: 
        expand("{dir}/{sample}.bam", zip, dir=directories, sample=SAMPLES)
rule hisat2:
    input:
        hisat2_index=expand("%s.{ix}.ht2l" % HISAT2_INDEX_PREFIX, ix=range(1, 9)),
        fastq1="/test/{dir}/{sample}_1.fastq.gz",
        fastq2="/test/{dir}/{sample}_2.fastq.gz"
    output:
        bam = "{dir}/{sample}.bam",
        txt = "{dir}/{sample}.txt",
    log: "{dir}/{sample}.snakemake_log.txt"
    threads: 2
    shell:
        "hisat2 -p {threads} -x {HISAT2_INDEX_PREFIX}"
        " -1 {input.fastq1} -2 {input.fastq2}  --summary-file {output.txt} |"
        "samtools sort -@ {threads} -o {output.bam}"

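How can I modify this so that every BAM file name gets the nameX prefix and all BAM files are saved in the same directory? And how can I create a single BAM file for all inputs that share the same nameX?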

snakemake
1 answer

This is not the prettiest solution, but it is how it needs to be done:

import random
from pathlib import Path


SAMPLES = ['dummy', 'dommy']
rule all:
    input:
        [f"do_all_{sample}.out" for sample in SAMPLES]


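def aggregate(wildcards):
    # Request the checkpoint output; Snakemake postpones evaluating this
    # function until the checkpoint has finished for this sample.
    checkpoints.fastq_splitter.get(sample=wildcards.sample)
    # Discover which read groups the checkpoint actually produced.
    read_groups = glob_wildcards(f"{wildcards.sample}_{{read_group}}.fastq.gz").read_group
    return [f"bam/{wildcards.sample}_{read_group}.bam" for read_group in read_groups]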


rule do_everything:
    input:
        aggregate
    output:
        touch("do_all_{sample}.out")


rule do_sth_splitted:
    input:
        "{sample}_{read_group}.fastq.gz"
    output:
        touch("bam/{sample}_{read_group}.bam")



checkpoint fastq_splitter:
    input:
        "{sample}.fastq.gz"
    output:
        touch("{sample}.done")
    run:
        for i in range(random.randint(1, 5)):
            Path(f'{wildcards.sample}_{i}.fastq.gz').touch()

Before running, make sure the example input files exist: touch d{u,o}mmy.fastq.gz

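In checkpoint fastq_splitter we generate a random number of "fastq" files. In rule do_sth_splitted we pretend to align each of them to the genome and get one bam file per read group. rule do_everything takes its input from the aggregate function, which depends on the output of checkpoint fastq_splitter, so that input is only evaluated once fastq_splitter has finished. rule all makes sure that every sample is processed.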
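The discovery step inside aggregate can be tried outside Snakemake. The following is a minimal plain-Python sketch of that idea, not Snakemake's own implementation: the helper names discover_read_groups and bam_targets are hypothetical, and a regular expression stands in for glob_wildcards.

```python
import re
import tempfile
from pathlib import Path


def discover_read_groups(workdir, sample):
    """Return read groups found as <sample>_<read_group>.fastq.gz in workdir.

    Hypothetical stand-in for glob_wildcards(...).read_group.
    """
    pattern = re.compile(rf"^{re.escape(sample)}_(?P<read_group>.+)\.fastq\.gz$")
    groups = []
    for path in sorted(Path(workdir).iterdir()):
        match = pattern.match(path.name)
        if match:
            groups.append(match.group("read_group"))
    return groups


def bam_targets(workdir, sample):
    """Build the BAM paths that the aggregating rule would request."""
    return [f"bam/{sample}_{rg}.bam" for rg in discover_read_groups(workdir, sample)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Pretend the checkpoint has written three split files for "dummy".
        for i in range(3):
            Path(tmp, f"dummy_{i}.fastq.gz").touch()
        # A file from another sample must not be picked up.
        Path(tmp, "dommy_0.fastq.gz").touch()
        print(bam_targets(tmp, "dummy"))
```

The point of the checkpoint is that this listing is only meaningful after the split files exist, which is why Snakemake defers the input function until then.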

Take a look at checkpoints for a more thorough explanation.
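To connect this back to the layout in the question, here is a rough sketch of how the target names could be derived in plain Python, assuming {dir} always looks like nameX/tissueX/trimmed. The helper names flat_bam_name and group_by_name, the bam output directory, and the example directory and sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def name_prefix(directory):
    """Return the first path component (nameX) of nameX/tissueX/trimmed."""
    return directory.strip("/").split("/")[0]


def flat_bam_name(directory, sample, outdir="bam"):
    """Map a (directory, sample) pair to <outdir>/<nameX>_<sample>.bam."""
    return f"{outdir}/{name_prefix(directory)}_{sample}.bam"


def group_by_name(directories, samples, outdir="bam"):
    """Group the flattened BAM paths by their nameX prefix."""
    groups = defaultdict(list)
    # directories and samples are parallel lists, as returned by glob_wildcards.
    for directory, sample in zip(directories, samples):
        groups[name_prefix(directory)].append(flat_bam_name(directory, sample, outdir))
    return dict(groups)


if __name__ == "__main__":
    # Made-up example values, not taken from the question.
    directories = ["nameA/liver/trimmed", "nameA/brain/trimmed", "nameB/liver/trimmed"]
    samples = ["s1", "s2", "s3"]
    for name, bams in group_by_name(directories, samples).items():
        print(name, bams)
```

Each list of paths could then serve as the input of a per-nameX merging rule, for example one that calls samtools merge. Note that two tissues containing the same sample name under one nameX would map to the same file name, so the tissue may need to be part of the name as well.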