Snakemake通配符语法错误

问题描述 投票:0回答:1

我知道这是一个常见错误,我已经检查了其他帖子,但没有解决我的问题。我想使用与SortMeRNA相同的数据库名称来使用rRNAdb=config["rRNA_database"]规则(version=config["genome_version"])。但是显然,我不能。

SyntaxError:
Not all output, log and benchmark files of rule SortMeRNA contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file

......

import glob
import os

configfile : "config.json"


#################### GLOBAL VARIABLES #######################

OUTDIR = os.path.abspath(config["outdir"])


# ID NGS
idNGS = config["idNGS"]

# Fichiers FASTQ
DIR_FASTQ = config["dir_fastq"]

## PARAMETERS TRIMMOMATIC
w = config["w"]
Q = config["Q"]
m = config["m"]

## Bowtie2 database
BWT2_DB = config["BWT2_DB"]

## Templates multiQC
DIR_TPL = config["DIR_TPL"]


#### VERSIONS genomes and database ####
version = config["genome_version"]
rRNAdb = config["rRNA_database"]


####################### FASTQ FILES #########################

def list_samples(DIR_FASTQ): # create list with all sample names from fastq directory
    SAMPLES=[]
    for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
        base=os.path.basename(file)
        sample=(base.replace('.fastq.gz', ''))
        SAMPLES.append(sample)
    return(SAMPLES)

SAMPLES = list_samples(DIR_FASTQ)


########################## RULES ############################

rule all:
    input:
        OUTDIR+"/multiQC/multiQC-PL09_"+idNGS+".html",
        OUTDIR+"/multiQC/multiQC-PL21_"+idNGS+".html"


rule fastQC:
    input: 
        DIR_FASTQ+"/{sample}.fastq.gz"
    output: 
        "{OUTDIR}/FastQC/{sample}_fastqc.zip",
        "{OUTDIR}/FastQC/{sample}_fastqc.html"
    threads:
        16
    shell:
        """
        fastqc {input} -o {OUTDIR}/FastQC/
        """ 

rule trimmomatic:
    input:
        DIR_FASTQ+"/{sample}.fastq.gz"
    output:
        trim_out="{OUTDIR}/Trimmomatic/{sample}_SL-{w}-{Q}_Min-{m}.fastq.gz",
        trim1_out="{OUTDIR}/Trimmomatic/{sample}_SL-{w}-{Q}_Min-{m}.stdout",
        trim1_err="{OUTDIR}/Trimmomatic/{sample}_SL-{w}-{Q}_Min-{m}.stderr"
    threads:
        16
    shell:
        "trimmomatic SE -phred33 {input} {output.trim_out} SLIDINGWINDOW:{w}:{Q} MINLEN:{m} > {output.trim1_out} 2> {output.trim1_err}"


rule Bowtie2:
    input:
        fasta_trim="{OUTDIR}/Trimmomatic/{sample}_SL-"+str(w)+"-"+str(Q)+"_Min-"+str(m)+".fastq.gz"
    output:
        BWT2_ERR="{OUTDIR}/Bowtie2/{sample}_SL-{w}-{Q}_Min-{m}_bowtie2_{version}.stderr",
        BWT2_SAM=temp("{OUTDIR}/Bowtie2/{sample}_SL-{w}-{Q}_Min-{m}_bowtie2_{version}.sam"),
        BWT2_BAM=temp("{OUTDIR}/Bowtie2/{sample}_SL-{w}-{Q}_Min-{m}_bowtie2_{version}.bam"),
        BWT2_BAM_SORTED="{OUTDIR}/Bowtie2/{sample}_SL-{w}-{Q}_Min-{m}_bowtie2_{version}.sorted.bam"
    threads:
        16
    shell:
        """
        bowtie2 -x {BWT2_DB} -U {input} 2> {output.BWT2_ERR} > {output.BWT2_SAM}
        samtools view -bS -o {output.BWT2_BAM} {output.BWT2_SAM}
        samtools sort {output.BWT2_BAM} -o {output.BWT2_BAM_SORTED}
        samtools index {output.BWT2_BAM_SORTED}
        """


rule copy_to_pigz: # we just copy fastq files to gunzip them and use them in sortMeRNA, this way we don't touch the original ones
    input:
        DIR_FASTQ+"/{sample}.fastq.gz"
    output:
        "{OUTDIR}/temp/{sample}.fastq.gz" # this temp directory will be deleted at the end of the pipeline in multiQC rule
    shell:
        "cp {input} {output}"


rule SortMeRNA:
    input:
        "OUTDIR/temp/{sample}.fastq.gz"
    output:
        fasta_pigz=temp("{OUTDIR}/temp/{sample}.fastq"),
        blast="{OUTDIR}/SortMeRNA/{sample}.fastq_SortMeRNA_{rRNAdb}.blast",
        log="{OUTDIR}/SortMeRNA/{sample}.fastq_SortMeRNA_{rRNAdb}.log"
    params:
        prefixSortMeRNA="{OUTDIR}/SortMeRNA/{sample}.fastq_SortMeRNA_{rRNAdb}",
        sortmerna_DB=config["SORTMERNA_DB"]
    shell:
        """
        pigz -d -k {input}
        sortmerna --ref {params.sortmerna_DB} --reads {output.fasta_pigz} --aligned {params.prefixSortMeRNA} --blast '1' --log TRUE
        """


rule multiQC:
    input: 
        expand(OUTDIR+"/FastQC/{sample}_fastqc.zip", sample=SAMPLES),
        expand(OUTDIR+"/Trimmomatic/{sample}_SL-"+str(w)+"-"+str(Q)+"_Min-"+str(m)+".fastq.gz", sample=SAMPLES),
        expand(OUTDIR+"/Trimmomatic/{sample}_SL-"+str(w)+"-"+str(Q)+"_Min-"+str(m)+".stdout", sample=SAMPLES),
        expand(OUTDIR+"/Trimmomatic/{sample}_SL-"+str(w)+"-"+str(Q)+"_Min-"+str(m)+".stderr", sample=SAMPLES),
        expand(OUTDIR+"/Bowtie2/{sample}_SL-"+str(w)+"-"+str(Q)+"_Min-"+str(m)+"_bowtie2_"+version+".stderr", sample=SAMPLES),
        expand(OUTDIR+"/SortMeRNA/{sample}.fastq_SortMeRNA_"+rRNAdb+".blast", sample=SAMPLES),
        expand(OUTDIR+"/SortMeRNA/{sample}.fastq_SortMeRNA_"+rRNAdb+".log", sample=SAMPLES),
        tpl_plNGS=DIR_TPL+"/template-PL-NGS_multiqc_config.yaml",
        tpl_plBioinfo=DIR_TPL+"/template-PL-Bioinfo_multiqc_config.yaml"
    output:
        rap_plNGS="{OUTDIR}/multiQC/multiQC-PL09_{idNGS}.html",
        rap_plBioinfo="{OUTDIR}/multiQC/multiQC-PL21_{idNGS}.html"
    shell:
        """
        sed -e "s/ID_NGS_HERE/{idNGS}/g" < {input.tpl_plNGS} > {OUTDIR}/multiQC/PL-NGS_multiqc_config.yaml
        sed -e "s/ID_NGS_HERE/{idNGS}/g" < {input.tpl_plBioinfo} > {OUTDIR}/multiQC/PL-Bioinfo_multiqc_config.yaml
        multiqc -c {OUTDIR}/multiQC/PL-NGS_multiqc_config.yaml -n {output.rap_plNGS} {OUTDIR}/
        multiqc -c {OUTDIR}/multiQC/PL-Bioinfo_multiqc_config.yaml -n {output.rap_plBioinfo} {OUTDIR}/
        rm -r {OUTDIR}/temp
        """

onsuccess:
    shell("date '+%d/%m/%Y %H:%M:%S' > time_end_RNAseq.txt") 

是因为我在params规则的outputSortMeRNA中使用了它,而不是在此规则的inputs中使用了它?因为对于version规则中的Bowtie,在规则的outputs中而不是input中使用它似乎不是问题。

我知道这是一个常见错误,我已经检查了其他帖子,但没有解决我的问题。我想使用用于SortMeRNA规则的数据库的名称(rRNAdb = config [“ rRNA_database”])...

wildcard snakemake rna-seq
1个回答
0
投票

所有output文件需要具有相同的通配符,否则将在解决作业依赖项时引起冲突。并非output:中的所有文件都具有{rRNAdb}通配符,这导致了此问题。例如,如果您有两个{rRNAdb}值,则两个都将写入文件"{OUTDIR}/temp/{sample}.fastq",snakemake正确地不允许这样做。

© www.soinside.com 2019 - 2024. All rights reserved.