NameError in Snakemake dry-run mode


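I am new to Snakemake and am trying to develop some pipelines. I ran into problems when using wildcards to automate my bioinformatics analysis as far as possible. The trouble started once the pipeline became more complex (shown below). It looks as if Snakemake cannot resolve the wildcards correctly. During a dry run of the Snakefile, the wildcard values look correct when some rules are executed. However, the same kind of wildcard causes an error in a different step (rule) of the pipeline, and I cannot figure out why. Below are the code and the dry-run output.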

num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
num_normal=["327905-LR-41624"]
num_tumor=["327907-LR-41624"]

path="/path/to/Snakemake/"
genome="/path/to/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:
        expand("/path/to/Snakemake/AS-{num_tum}_tumor_no_dupl_sort_RG_LB.bam",num_tum=num_tumor),
        expand("/path/to/Snakemake/AS-{num_norm}_normal_no_dupl_sort_RG_LB.bam",num_norm=num_normal)
ruleorder: samtools_sort > remove_duplicates >  samtools_index #>     add_readgroup_tumor > add_readgroup_normal

rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore  --output_dir /path/to/Snakemake/  --paired {input.r1} {input.r2}  "  

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem  {genome}  {input.R1} {input.R2} | samtools view -h -b  > {output} "

rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n  -O BAM {input} > {output} "

rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input}  -O {output.outbam} -M {output.metrics}  --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index  {input} "

rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { num_normal }   -PU  { num_normal }  -SM  NORMAL  -I  { input }    -O  {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { num_tumor }   -PU  { num_tumor }  -SM  TUMOR     -I  { input }    -O  {output} "

When I test the Snakefile with the following command: .local/bin/snakemake -s Snakefile_pipeline --dryrun

I get the following:

Building DAG of jobs...


Job counts:
    count jobs
    1   add_readgroup_normal
    1   add_readgroup_tumor
    1   all
    2   bwa_mem
    2   remove_duplicates
    2   samtools_sort
    2   trim_galore
    11

[Mon Apr  8 16:14:27 2019]
rule trim_galore:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1.fastq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2.fastq
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
    jobid: 9
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule trim_galore:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1.fastq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2.fastq
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
    jobid: 10
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule bwa_mem:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
    output: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
    jobid: 8
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule bwa_mem:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
    jobid: 7
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule samtools_sort:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
    jobid: 5
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule samtools_sort:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
    jobid: 6
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule remove_duplicates:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam, /path/to/Snakemake/AS-327907-LR-41624_tumor_dupl_metrics.txt
    jobid: 3
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule remove_duplicates:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam, /path/to/Snakemake/AS-327905-LR-41624_normal_dupl_metrics.txt
    jobid: 4
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule add_readgroup_normal:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort_RG_LB.bam
    jobid: 2
    wildcards: num_normal=327905-LR-41624

RuleException in line 93 of /home/l136n/Snakefile_mapping_snv_call_pipeline2:
NameError: The name ' num_normal ' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}

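I googled the error but found nothing helpful. I also double-checked the pipeline for inconsistencies. The output I expect is what is listed in rule "all". The rules "add_readgroup_normal" and "add_readgroup_tumor" should each take a different subset of the files produced by the preceding steps, which run on all files. I wonder whether the problem arises somehow because of this split into two subsets. Again, I am very new to Snakemake, so I may be missing something silly somewhere! Any help would be much appreciated, as I am completely stuck. Thank you very much in advance!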

num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
normal=["327905-LR-41624_normal"]
num_tumor=["327907-LR-41624_tumor"]

path="/path/to/Snakemake/"
genome="/icgc/dkfzlsdf/analysis/B210/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:  
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam"
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam"


rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore  --output_dir /path/to/Snakemake/  --paired {input.r1} {input.r2}  "  

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem  {genome}  {input.R1} {input.R2} | samtools view -h -b  > {output} "

rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n  -O BAM {input} > {output} "

rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input}  -O {output.outbam} -M {output.metrics}  --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index  {input} "

rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{normal}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{normal}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { wildcards.normal }   -PU  { wildcards.normal }  -SM  NORMAL  -I  { input }    -O  {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_,'.*tumor.*'}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { wildcards.num }   -PU  { wildcards.num }  -SM  TUMOR     -I  { input }    -O  {output} "

Error:

Building DAG of jobs...
MissingInputException in line 37 of /home/l136n/Snakefile_mapping_snv_call_pipeline2b1:
Missing input files for rule trim_galore:
/path/to/Luca/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R1.fastq
/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R2.fastq
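The two paths glued together in this message are consistent with Python's implicit concatenation of adjacent string literals: in the second Snakefile, the `rule all` input list has no comma after the `..._normal_RG.bam` entry. A minimal plain-Python sketch of that effect (the list names are hypothetical and only illustrate the behaviour; this is not Snakemake code):

```python
# Same two entries as the end of the `rule all` input list, with the comma
# between them missing: Python joins adjacent string literals into one.
targets_missing_comma = [
    "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam"
    "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam"
]

# With the comma restored, the list holds two separate targets.
targets_with_comma = [
    "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam",
    "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam",
]

print(len(targets_missing_comma))
print(targets_missing_comma[0])
print(len(targets_with_comma))
```

If that is what happened here, adding the missing comma should give Snakemake two separate target files instead of one merged path.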
bioinformatics snakemake
2 Answers
0 votes

In the shell directive, wildcards are accessed as {wildcards.var}, not as {var}. You have the latter in rule add_readgroup_normal.
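As a rough illustration of the dotted lookup, here is a plain-Python sketch. It assumes the shell string is filled in with Python-style format fields; the `SimpleNamespace` object is a hypothetical stand-in for Snakemake's `wildcards`, and the command is shortened for readability:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the object Snakemake exposes as `wildcards`
# inside a shell command; the sample name comes from the question.
wildcards = SimpleNamespace(normal="327905-LR-41624_normal")

# Shortened, illustrative command template (not the full GATK call).
template = (
    "gatk AddOrReplaceReadGroups "
    "-LB {wildcards.normal} -PU {wildcards.normal} -SM NORMAL"
)

# str.format resolves `wildcards` first, then reads its attribute `normal`.
command = template.format(wildcards=wildcards)
print(command)
```

In a real rule only the text inside the shell string changes; Snakemake supplies the `wildcards` object itself.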


0 votes

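I thought I would provide the solution even though the post is a bit old now. The error is simply due to the spaces inside the braces of "{wildcards.var}" (the rules use "{ wildcards.normal }", with spaces).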
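The NameError in the question quotes the name with its surrounding spaces (' num_normal '), which matches how Python format fields behave: whitespace inside the braces becomes part of the name being looked up. A small sketch with plain `str.format`, used here as an assumed stand-in for Snakemake's own formatting (`try_format` is a hypothetical helper):

```python
def try_format(template, **names):
    """Return the formatted string, or a short description of the lookup failure."""
    try:
        return template.format(**names)
    except KeyError as err:
        return f"KeyError: {err}"

# The field name is taken literally, spaces included, so the lookup fails.
with_spaces = try_format("-LB { num_normal }", num_normal="327905-LR-41624")

# Without the spaces the same name resolves as expected.
without_spaces = try_format("-LB {num_normal}", num_normal="327905-LR-41624")

print(with_spaces)
print(without_spaces)
```

Removing the spaces inside every pair of braces in the shell strings (including `{ input }`) avoids this kind of failed lookup.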
