如何在 snakemake 的规则中运行 for 循环时转义 MissingOutputException

问题描述 投票:0回答:0

我有以下蛇文件:

import os

SAMPLES = [i.replace('sample_', '') for i in os.listdir('final_raw')]

rule all:
    input:
        expand('analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq', sample = SAMPLES)
        
        
rule trimming:
    input: 
        read1 = 'final_raw/sample_{sample}/read1_sample_{sample}.fastq.gz',
        read2 = 'final_raw/sample_{sample}/read2_sample_{sample}.fastq.gz', 
        barcodes = 'final_bc.fasta'
    output:
        'analysis/trimmed_data/{sample}_read_1.fastq.gz',
        'analysis/trimmed_data/{sample}_read_2.fastq.gz'
    threads: 6
    shell:
        'flexbar -threads {threads} '
        '-r {input.read1} '
        '-p {input.read2} '
        '-aa Nextera -br {input.barcodes} '
        '-t analysis/trimmed_data/{wildcards.sample}_read '
        '-z GZ'
        
        
rule create_hybrid_genome:
    input:
        human = 'genomes/human/GRCh38.fna',
        covid = 'genomes/cov/cov_genome.fna'
    output:
        'genomes/hybrid/hybrid.fna'
    shell:
        'cat {input.human} {input.covid} > {output}'
        
        
rule hybrid_indexing:
    input:
        rules.create_hybrid_genome.output
    output:
        'genomes/hybrid/hybrid.fna.amb'
    shell:
        'bwa index {input}'
        
        
rule map_hybrid:
    input:
        read1 = 'analysis/trimmed_data/{sample}_read_1.fastq.gz',
        read2 = 'analysis/trimmed_data/{sample}_read_2.fastq.gz',
        genome = 'genomes/hybrid/hybrid.fna'
    output:
        'analysis/results/mapped/hybrid/{sample}.sorted.bam'
    shell:
        'bwa mem -t 8 {input.genome} {input.read1} {input.read2} | '
        'samtools sort -@ 2 -o {wildcards.sample}.sorted.bam && '
        'samtools index -@ 8 {wildcards.sample}.sorted.bam && '
        'mv {wildcards.sample}.sorted.* analysis/results/mapped/hybrid'
        
        
rule unmapped_hybrid:
    input: 
        rules.map_hybrid.output
    output:
        'analysis/results/unmapped/hybrid/um_{sample}.bam'
    shell:
        'samtools view -b -f 4 {input} > um_{wildcards.sample}.bam && '
        'mv um_{wildcards.sample}.bam analysis/results/unmapped/hybrid'
        
        
rule fastq_hybrid:
    input:
        rules.unmapped_hybrid.output
    output:
        read1 = 'analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq',
        read2 = 'analysis/results/unmapped/hybrid/read2_{sample}_hy.fastq'
    shell:
        'samtools fastq '
        '-1 {output.read1} '
        '-2 {output.read2} '
        '-0 /dev/null '
        '-s /dev/null '
        '-n {input}'

我正在尝试添加一个运行以下 for 循环的规则“kraken”:

for FILE in $(ls analysis/results/unmapped/hybrid/read1_*_hy.fastq | sed 's/read1_*_hy.fastq//'); do \
        kraken2 --db {input.database} \
        --memory-mapping \
        --threads 8 \
        --use-names \
        --report analysis/results/kraken2/hybrid/{{}}${{FILE}}.kraken \
        --paired read1_${{FILE}}_hy.fastq read2_${{FILE}}_hy.fastq \
        --unclassified-out /analysis/results/unclassified/hybrid/{{}}${{FILE}}_uc#.fastq; done

for 循环负责一次运行所有.fastq 文件,以便将kraken 数据库存储在内存中,并对所有样本运行分类。 如果不参考采用通配符并为每个样本运行命令的常规 snakemake shell,如何实现这一点?

正常的方法是有以下内容:

rule kraken2:
    input:
        database = 'kraken2_db'
        read1 = 'analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq'
        read2 = 'analysis/results/unmapped/hybrid/read2_{sample}_hy.fastq'
    output:
        'analysis/results/kraken2/hybrid/{sample}.kraken'
    shell:
        'kraken2 '
            '--db {input.database} '
            '--threads 8 '
            '--paired '
            '--use-names '
            '--unclassified-out {wildcards.sample}_uc#.fq '
            '--report {output} '
            '{input.read1} {input.read2}'

和规则全部更新为以下内容:

rule all:
    input:
        expand('analysis/results/kraken2/hybrid/{sample}.kraken', sample = SAMPLES)

这将一次运行每个示例,我正在努力在 shell 中运行 for 循环,并且大多数时候都会出现错误:

MissingInputException in rule all  in line 5 of /home/stud9/NAS/snakefile: Missing input files for rule all: affected files:

有什么建议吗?

bioinformatics snakemake
© www.soinside.com 2019 - 2024. All rights reserved.