管理 Snakemake 中的包装器

问题描述 投票:0回答:1

我正在尝试使用配置文件 config.yaml 制作 Snakemake 管道。

--- 
samples:
- Sample_1: reads/raw_reads_1
- Sample_2: reads/raw_reads_2
- Sample_3: reads/raw_reads_3
 
reference:
- "path/to/reference.fasta" 
...

遵循 Snakemake 包装器存储库中给出的良好实践,我尽可能多地使用包装器命令。 https://snakemake-wrappers.readthedocs.io/en/stable/ 但是我发现不同包装器的输入/输出不匹配。 有没有办法管理规则之间的文件名称以连接包装器而不修改它们?

import os
import snakemake.io
import glob

configfile: "config.yaml"

rule all:
        input:
                expand("mapped/{sample}.bam", sample=config["samples"])

rule fastqc:
        input:
                "reads/{sample}.fastq"
        output:
                html="qc/fastqc/{sample}.html",
                zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
        params:
                extra = "--quiet"
        log:
                "logs/fastqc/{sample}.log"
        threads: 1
        resources:
                mem_mb = 1024
        wrapper:
                "v2.6.0/bio/fastqc"

rule bwa_index:
        input:
                "{genome}.fasta", 
        output:
                idx=multiext("{genome}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
        log:
                "logs/bwa_index/{genome}.log",
        params:
                algorithm="bwtsw",
        wrapper:
                "v2.6.0/bio/bwa/index"

rule bwa_mem:
        input:
                reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
                idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
        output:
                "mapped/{sample}.bam",
        log:
                "logs/bwa_mem/{sample}.log",
        params:
                extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
                sorting="none",  # Can be 'none', 'samtools' or 'picard'.
                sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
                sort_extra="",  # Extra args for samtools/picard.
                threads: 8
        wrapper:
                "v2.6.0/bio/bwa/mem"
wrapper snakemake
1个回答
0
投票

包装器的目的只是包装如何做某事的逻辑。没有什么神奇的方法可以自动将规则的输入连接到输出。您的工作是根据您的目的定义输入和输出值。包装器中的值不匹配,因为它们只是为了测试目的而设置的。

这就是全部答案,但是我会尝试提供一些示例,但您的示例配置不正确。

def get_sample_names():
    # return list of sample names

rule all:
    input:
        bamqc=expand("mapped/{sample}.bam", sample=get_sample_names()),
        fastqc=expand("qc/fastqc/{sample}_{pair}.html", sample=get_sample_names(), pair=["1","2"]),

rule bwa_index:
    input:
        "{prefix_reference}.fasta", 
    output:
        idx=multiext("{prefix_reference}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    log:
        "{prefix_reference}/logs/bwa_index.log",
    params:
        algorithm="bwtsw",
    wrapper:
        "v2.6.0/bio/bwa/index"


rule bwa_mem:
    input:
        reads=[
            "reads/{sample}.1.fastq.gz",
            "reads/{sample}.2.fastq.gz",
        ],
        idx=multiext(
            os.path.splitext(config["reference"]),
            ".amb",
            ".ann",
            ".bwt",
            ".pac",
            ".sa",
        ),
    output:
        "mapped/{sample}.bam",
    ...


rule fastqc:
    input:
        "reads/{sample}.{pair}.fastq"
    output:
        html="qc/fastqc/{sample}_{pair}.html",
        zip="qc/fastqc/{sample}_{pair}_fastqc.zip"
    params:
        extra = "--quiet"
    log:
        "logs/fastqc/{sample}_{pair}.log"
    wrapper:
        "v2.6.0/bio/fastqc"

在示例中,我假设您的所有数据都在

reads/
中。示例名称可以由配置提供,也可以在
reads
目录中查找所有示例。另一种更复杂、更高级的管理输入读取的方法是使用 pepfiles。 我建议进一步检查现有工作流程,例如通过检查catalog

中的最新工作流程
© www.soinside.com 2019 - 2024. All rights reserved.