Snakemake: AttributeError: 'Wildcards' object has no attribute 'sample'

InputFunctionException in line 226 of /users/troger50/projects/Rogers_SidersViralAnalysis_XXXX_20XX/preprocess_reads.smk:
Error:
  AttributeError: 'Wildcards' object has no attribute 'sample'
Wildcards:
  organism=0
Traceback:
  File "/users/troger50/projects/Rogers_SidersViralAnalysis_XXXX_20XX/preprocess_reads.smk", line 228, in <lambda>

I'm new to Snakemake and have been banging my head against the wall trying to work around something that should be simple. Here is my pipeline:

configfile:"new_configure.yml"

rule all_run:
    input:
        multiqc = "quality_checks/multiqc/multiqc_report.html"


rule multiqc: #New
    input:
        fastqc= [expand(["quality_checks/fastqc/{organism}_r1_fastqc.html","quality_checks/fastqc/{organism}_r2_fastqc.html"], organism=organism) for sample, value in config["Samples"].items() for organism, j in value.items()],
        
    output:
        "quality_checks/multiqc/multiqc_report.html"
    params:
        in_put="quality_checks/"
    shell:
        """
           multiqc -d -dd 1 {params.in_put} -o quality_checks/multiqc/ --export
        """

rule Index:
    input:
        r1 = "data/raw/{sample}-{organism}_R1.fastq.gz"
    output:
        indexgz="data/raw/{sample}-{organism}-index1.fq.gz"
    log:
        Index_log="logs/{sample}-{organism}_Index.log"
    shell:
        """
            (zcat  {input.r1} |awk '{{if( (NR%4)==1){{ print $0; print substr($2,length($2)-20,10); print "+"; print "CCCCCCCCCC"}}}}' |gzip  > {output.indexgz}) 2> {log.Index_log}
        """

#Demultiplex Raw reads first
rule demultiplex_reads:
    input:
        r1="data/raw/{sample}-{organism}_R1.fastq.gz", 
        r2="data/raw/{sample}-{organism}_R2.fastq.gz",
        indexgz="data/raw/{sample}-{organism}-index1.fq.gz"
    output:
        r1 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"
    params:
        index="data/raw/Index.txt",
    log:
        "logs/{sample}-{organism}_{fraction}_deML.log"
    shell:
        """
            (/projects/luo_lab/deML/src/deML -o data/raw/demultiplexed/demultiplexed -i {params.index} -f  {input.r1}  -r {input.r2} -if1 {input.indexgz}) 2> {log}
        """

#Trim addapters from raw reads
rule bbduk_adp:
    input: 
        r1 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"

    output:
        r1 = temp("data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz"),
        r2 = temp("data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz")

    log:
        "logs/{sample}-{organism}_{fraction}_bbduk_adp.log"
    shell:
        """
            (/projects/luo_lab/bbmap/bbduk.sh ordered in1={input.r1} in2={input.r2} \
                out1={output.r1} \
                out2={output.r2} \
                ref=/projects/luo_lab/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo) 2> {log}
        """

# Further trimming step to increase the quality
rule bbduk_qal:
    input: 
        r1 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"

    output:
        r1 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"

    log:
        "logs/{sample}-{organism}_{fraction}_bbduk_qal.log"
    shell:
        """
            (/projects/luo_lab/bbmap/bbduk.sh ordered in1={input.r1} in2={input.r2} \
                out1={output.r1} \
                out2={output.r2} \
                ref=/projects/luo_lab/bbmap/resources/adapters.fa k=27 hdist=1 qtrim=rl trimq=17 cardinality=t mingc=0.05 maxgc=0.95) 2> {log}
        """

rule merge_clean_reads:
    input:
        r1 = lambda wildcards : [f"data/processed/clean_reads/demultiplexed_{wildcards.sample}-{wildcards.organism}_{fraction}_r1.fq.gz" for fraction in config["Samples"][wildcards.sample][wildcards.organism]],
        r2 = lambda wildcards : [f"data/processed/clean_reads/demultiplexed_{wildcards.sample}-{wildcards.organism}_{fraction}_r2.fq.gz" for fraction in config["Samples"][wildcards.sample][wildcards.organism]]

    output:
        r1="data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz",
        r2="data/processed/clean_reads/merged/merged_{organism}_r2.fq.gz"
    shell:
        """
            cat  {input.r1} > {output.r1}
            cat  {input.r1} > {output.r2}
        """

# Evalutate the quality of the trimmed reads
rule fastqc:
    output:
        ["quality_checks/fastqc/{organism}_r1_fastqc.html","quality_checks/fastqc/{organism}_r2_fastqc.html"]
    input:  
        ["data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz","data/processed/clean_reads/merged/merged_{organism}_r2.fq.gz"]
    log:
        "logs/merged_{organism}_fastqc_clean.log"
    shell:
        """
            (fastqc -o quality_checks/fastqc/ {input:q}) 2> {log}
        """

Here is my config file:

ROOT: "."
Samples:
    day7-DO-0-12C: 
        0:
            - "1-6"
            - "7"
            - "8"
            - "9"
            - "10-12"
        viral:
            - "1-6"
            - "7"
            - "8"
            - "9"
            - "10-12"

    day7-DO-0-13C: 
        0:
            - "1-5"
            - "6"
            - "7"
            - "8"
            - "9-12"
        viral: 
            - "1-6"
            - "7"
            - "8"
            - "9"
            - "10-12"

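The problem is in the rule merge_clean_reads. In its input I use a lambda function and refer to the wildcards "sample" and "organism". The error makes sense, because I never defined "sample" as a wildcard. However, I'm not sure where I should define it, since the output of merge_clean_reads should only depend on the "organism" wildcard, and there should be only four output files: merged_0_r1.fq.gz, merged_0_r2.fq.gz, merged_viral_r1.fq.gz, and merged_viral_r2.fq.gz. I know this must be a simple fix, but I haven't been able to figure it out, haha. Any help would be greatly appreciated.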
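One quick way to confirm that those four files depend only on the organism level of the config is to collect the organism keys across all samples. The sketch below uses a hand-copied dict in place of `config["Samples"]`. It assumes that YAML reads the unquoted key `0` as the integer `0`, so the key is converted with `str()` to match wildcard values, which are always strings. It also builds the paths with a plain list comprehension where a Snakefile would normally use `expand()`.

```python
from itertools import product

# Hand-copied stand-in for config["Samples"]; the unquoted YAML key 0 loads as an int
samples = {
    "day7-DO-0-12C": {0: ["1-6", "7", "8", "9", "10-12"], "viral": ["1-6", "7", "8", "9", "10-12"]},
    "day7-DO-0-13C": {0: ["1-5", "6", "7", "8", "9-12"], "viral": ["1-6", "7", "8", "9", "10-12"]},
}

# Unique organisms across all samples, as strings (wildcard values are strings)
organisms = sorted({str(organism) for per_sample in samples.values() for organism in per_sample})

# One merged file per organism and read direction
merged_targets = [
    f"data/processed/clean_reads/merged/merged_{organism}_{read}.fq.gz"
    for organism, read in product(organisms, ["r1", "r2"])
]
for target in merged_targets:
    print(target)
```

The sample names never appear in these targets, which is why the merge rule cannot receive a sample wildcard.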

Answer:

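You're right: the error occurs because the wildcards available in a rule are determined by the wildcards in its output files. This rule simply has no "sample" wildcard that your input function could use.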
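As a rough illustration, the sketch below lists the named fields in each merge_clean_reads output pattern. It uses Python's `string.Formatter`, not Snakemake's own pattern parser, and it ignores regex constraints such as `{name,regex}`. The only wildcard the rule can ever receive is `organism`. A `types.SimpleNamespace` carrying just that name then fails on `.sample` in the same way as the lambda in the question.

```python
from string import Formatter
from types import SimpleNamespace

# Output patterns of merge_clean_reads, copied from the question
outputs = [
    "data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz",
    "data/processed/clean_reads/merged/merged_{organism}_r2.fq.gz",
]

def wildcard_names(pattern):
    """Collect {name} fields from a path pattern (a rough approximation of Snakemake)."""
    return {name for _, name, _, _ in Formatter().parse(pattern) if name}

available = set().union(*(wildcard_names(p) for p in outputs))
print(sorted(available))  # ['organism']

# A stand-in object carrying only those names reproduces the AttributeError
wildcards = SimpleNamespace(**{name: "0" for name in available})
try:
    wildcards.sample
except AttributeError as err:
    print(err)
```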

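Instead, you need a map from each organism to all of the samples and fractions that belong to it. Your config contains this information, but nested in the opposite order (sample first, then organism). If you can, I would consider reorganizing your config hierarchy. Otherwise, the code below will build the map you need: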

# basically reverse the first two levels of your config['Samples']
# these names are verbose for clarity, you may have better intuition about what they mean
organisms_samples_fractions = {}
for sample, organism_fraction in config["Samples"].items():
    for organism, fractions in organism_fraction.items():
        # YAML loads the unquoted key 0 as an integer, but wildcard values are strings
        organism = str(organism)
        if organism not in organisms_samples_fractions:
            organisms_samples_fractions[organism] = {}
        organisms_samples_fractions[organism][sample] = fractions


def merge_clean_reads_input(wildcards):
    # the fractions differ between samples (e.g. "1-5" vs "1-6"),
    # otherwise I would use an expand
    return {
        'r1': [
            f"data/processed/clean_reads/demultiplexed_{sample}-{wildcards.organism}_{fraction}_r1.fq.gz"
            for sample in organisms_samples_fractions[wildcards.organism]
            for fraction in organisms_samples_fractions[wildcards.organism][sample]
        ],
        'r2': [
            f"data/processed/clean_reads/demultiplexed_{sample}-{wildcards.organism}_{fraction}_r2.fq.gz"
            for sample in organisms_samples_fractions[wildcards.organism]
            for fraction in organisms_samples_fractions[wildcards.organism][sample]
        ],
    }


rule merge_clean_reads:
    input:
        unpack(merge_clean_reads_input)

    output:
        r1="data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz",
        r2="data/processed/clean_reads/merged/merged_{organism}_r2.fq.gz"
    shell:
        """
            cat  {input.r1} > {output.r1}
            cat  {input.r2} > {output.r2}  # NOTE this was input.r1
        """
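Because the input function only reads `wildcards.organism`, the approach can be checked outside Snakemake. The following sketch makes three assumptions. First, the dict is hand-copied from the config. Second, `types.SimpleNamespace` stands in for Snakemake's `Wildcards` object, since only attribute access is needed. Third, the two per-read lists are folded into one dict comprehension, which is a stylistic variant and not something the answer requires.

```python
from types import SimpleNamespace

# Hand-copied stand-in for config["Samples"]; the unquoted YAML key 0 loads as an int
samples = {
    "day7-DO-0-12C": {0: ["1-6", "7", "8", "9", "10-12"], "viral": ["1-6", "7", "8", "9", "10-12"]},
    "day7-DO-0-13C": {0: ["1-5", "6", "7", "8", "9-12"], "viral": ["1-6", "7", "8", "9", "10-12"]},
}

# organism -> {sample: fractions}, with organism keys stored as strings
organisms_samples_fractions = {}
for sample, organism_fraction in samples.items():
    for organism, fractions in organism_fraction.items():
        organisms_samples_fractions.setdefault(str(organism), {})[sample] = fractions

def merge_clean_reads_input(wildcards):
    """Return r1/r2 lists covering every sample and fraction of wildcards.organism."""
    per_sample = organisms_samples_fractions[wildcards.organism]
    return {
        read: [
            f"data/processed/clean_reads/demultiplexed_{sample}-{wildcards.organism}_{fraction}_{read}.fq.gz"
            for sample, fractions in per_sample.items()
            for fraction in fractions
        ]
        for read in ("r1", "r2")
    }

# SimpleNamespace stands in for Snakemake's Wildcards object (attribute access only)
files = merge_clean_reads_input(SimpleNamespace(organism="0"))
print(len(files["r1"]), len(files["r2"]))  # 10 10
print(files["r1"][0])
```

When this function is passed to `unpack()` as in the rule above, the `r1` and `r2` keys become `{input.r1}` and `{input.r2}`.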