InputFunctionException in line 226 of /users/troger50/projects/Rogers_SidersViralAnalysis_XXXX_20XX/preprocess_reads.smk:
Error:
AttributeError: 'Wildcards' object has no attribute 'sample'
Wildcards:
organism=0
Traceback:
File "/users/troger50/projects/Rogers_SidersViralAnalysis_XXXX_20XX/preprocess_reads.smk", line 228, in <lambda>
I'm new to snakemake and have been banging my head against the wall trying to find a fix for something that should be simple. Here is my pipeline:
configfile:"new_configure.yml"
rule all_run:
    input:
        multiqc = "quality_checks/multiqc/multiqc_report.html"
rule multiqc: #New
    input:
        fastqc= [expand(["quality_checks/fastqc/{organism}_r1_fastqc.html","quality_checks/fastqc/{organism}_r2_fastqc.html"], organism=organism) for sample, value in config["Samples"].items() for organism, j in value.items()],
    output:
        "quality_checks/multiqc/multiqc_report.html"
    params:
        in_put="quality_checks/"
    shell:
        """
        multiqc -d -dd 1 {params.in_put} -o quality_checks/multiqc/ --export
        """
rule Index:
    input:
        r1 = "data/raw/{sample}-{organism}_R1.fastq.gz"
    output:
        indexgz="data/raw/{sample}-{organism}-index1.fq.gz"
    log:
        Index_log="logs/{sample}-{organism}_Index.log"
    shell:
        """
        (zcat {input.r1} |awk '{{if( (NR%4)==1){{ print $0; print substr($2,length($2)-20,10); print "+"; print "CCCCCCCCCC"}}}}' |gzip > {output.indexgz}) 2> {log.Index_log}
        """
# Demultiplex raw reads first
rule demultiplex_reads:
    input:
        r1="data/raw/{sample}-{organism}_R1.fastq.gz",
        r2="data/raw/{sample}-{organism}_R2.fastq.gz",
        indexgz="data/raw/{sample}-{organism}-index1.fq.gz"
    output:
        r1 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"
    params:
        index="data/raw/Index.txt",
    log:
        "logs/{sample}-{organism}_{fraction}_deML.log"
    shell:
        """
        (/projects/luo_lab/deML/src/deML -o data/raw/demultiplexed/demultiplexed -i {params.index} -f {input.r1} -r {input.r2} -if1 {input.indexgz}) 2> {log}
        """
# Trim adapters from raw reads
rule bbduk_adp:
    input:
        r1 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/raw/demultiplexed/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"
    output:
        r1 = temp("data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz"),
        r2 = temp("data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz")
    log:
        "logs/{sample}-{organism}_{fraction}_bbduk_adp.log"
    shell:
        """
        (/projects/luo_lab/bbmap/bbduk.sh ordered in1={input.r1} in2={input.r2} \
        out1={output.r1} \
        out2={output.r2} \
        ref=/projects/luo_lab/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo) 2> {log}
        """
# Further trimming step to increase the quality
rule bbduk_qal:
    input:
        r1 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"
    output:
        r1 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz",
        r2 = "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r2.fq.gz"
    log:
        # "logs/{sample}-{organism}_{fraction}_bbduk_qal.log"
    shell:
        """
        (/projects/luo_lab/bbmap/bbduk.sh ordered in1={input.r1} in2={input.r2} \
        out1={output.r1} \
        out2={output.r2} \
        ref=/projects/luo_lab/bbmap/resources/adapters.fa k=27 hdist=1 qtrim=rl trimq=17 cardinality=t mingc=0.05 maxgc=0.95) 2> {log}
        """
rule merge_clean_reads:
    input:
        r1 = lambda wildcards : [f"data/processed/clean_reads/demultiplexed_{wildcards.sample}-{wildcards.organism}_{fraction}_r1.fq.gz" for fraction in config["Samples"][wildcards.sample][wildcards.organism]],
        r2 = lambda wildcards : [f"data/processed/clean_reads/demultiplexed_{wildcards.sample}-{wildcards.organism}_{fraction}_r2.fq.gz" for fraction in config["Samples"][wildcards.sample][wildcards.organism]]
    output:
        r1="data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz",
        r2="data/processed/clean_reads/merged/merged_{organism}_r2.fq.gz"
    shell:
        """
        cat {input.r1} > {output.r1}
        cat {input.r1} > {output.r2}
        """
# Evaluate the quality of the trimmed reads
rule fastqc:
    output:
        ["quality_checks/fastqc/{organism}_r1_fastqc.html","quality_checks/fastqc/{organism}_r2_fastqc.html"]
    input:
        ["data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz","data/processed/clean_reads/merged/merged_{organism}_r2.fq.gz"]
    log:
        "logs/merged_{organism}_fastqc_clean.log"
    shell:
        """
        (fastqc -o quality_checks/fastqc/ {input:q}) 2> {log}
        """
Here is my config file:
ROOT: "."
Samples:
  day7-DO-0-12C:
    0:
      - "1-6"
      - "7"
      - "8"
      - "9"
      - "10-12"
    viral:
      - "1-6"
      - "7"
      - "8"
      - "9"
      - "10-12"
  day7-DO-0-13C:
    0:
      - "1-5"
      - "6"
      - "7"
      - "8"
      - "9-12"
    viral:
      - "1-6"
      - "7"
      - "8"
      - "9"
      - "10-12"
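The problem is in the rule merge_clean_reads. In the input I use a lambda function and refer to the wildcards "sample" and "organism". The error itself is clear: I have not defined "sample" as a wildcard. However, I'm not sure where I should define it, because the output of merge_clean_reads should depend only on the "organism" wildcard, and there should be only four output files: merged_0_r1.fq.gz, merged_0_r2.fq.gz, merged_viral_r1.fq.gz, and merged_viral_r2.fq.gz. I know this must be a simple fix, but I haven't been able to work it out, haha. Any help would be greatly appreciated.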
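You are right: the error occurs because the wildcards available in a rule are determined by the wildcards in its output files. The rule simply has no "sample" wildcard available to your input function.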
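As a rough illustration of that point, the pure-Python sketch below compares the placeholders in the rule's output pattern with the ones the input pattern needs. It is not how Snakemake resolves wildcards internally; placeholder_names is a hypothetical helper, and it ignores Snakemake features such as wildcard constraints.

```python
from string import Formatter


def placeholder_names(pattern):
    """Return the set of {name} placeholders found in a path pattern.

    Hypothetical helper for illustration only; Snakemake has its own parser.
    """
    return {field for _, field, _, _ in Formatter().parse(pattern) if field}


# Patterns copied from the merge_clean_reads rule in the question.
output_pattern = "data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz"
input_pattern = (
    "data/processed/clean_reads/demultiplexed_{sample}-{organism}_{fraction}_r1.fq.gz"
)

# Only names that appear in the output can be filled in from a requested file.
available = placeholder_names(output_pattern)
needed = placeholder_names(input_pattern)
missing = needed - available

print("available from output:", sorted(available))
print("needed but not available:", sorted(missing))
```

Anything in the "needed but not available" set has to come from somewhere other than the wildcards, for example from the config.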
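What you need instead is a mapping from each organism to all of the samples and fractions that belong to it. Your config has this information, but the levels are nested in the wrong order. If you can, I would consider reorganizing your config hierarchy. Otherwise, the code below builds the mapping you need: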
# basically reverse the first two levels of your config['Samples']
# these names are verbose for clarity, you may have better intuition about what they mean
organisms_samples_fractions = {}
for sample, organism_fraction in config["Samples"].items():
    for organism, fractions in organism_fraction.items():
        # wildcard values are always strings, but YAML loads the bare key 0 as an int,
        # so normalize the key to str or the lookup by wildcards.organism will fail
        organism = str(organism)
        if organism not in organisms_samples_fractions:
            organisms_samples_fractions[organism] = {}
        organisms_samples_fractions[organism][sample] = fractions
def merge_clean_reads_input(wildcards):
    # now you have to deal with some fractions possibly not present in all samples
    # otherwise i would use an expand
    return {
        'r1': [
            f"data/processed/clean_reads/demultiplexed_{sample}-{wildcards.organism}_{fraction}_r1.fq.gz"
            for sample in organisms_samples_fractions[wildcards.organism]
            for fraction in organisms_samples_fractions[wildcards.organism][sample]
        ],
        'r2': [
            f"data/processed/clean_reads/demultiplexed_{sample}-{wildcards.organism}_{fraction}_r2.fq.gz"
            for sample in organisms_samples_fractions[wildcards.organism]
            for fraction in organisms_samples_fractions[wildcards.organism][sample]
        ],
    }
rule merge_clean_reads:
    input:
        unpack(merge_clean_reads_input)
    output:
        r1="data/processed/clean_reads/merged/merged_{organism}_r1.fq.gz",
        r2="data/processed/clean_reads/merged/merged_{organism}_r2.fq.gz"
    shell:
        """
        cat {input.r1} > {output.r1}
        cat {input.r2} > {output.r2} # NOTE this was input.r1
        """
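If you want to check the inversion outside of Snakemake, here is a minimal standalone sketch. The samples dict is a shortened, hypothetical stand-in for config["Samples"], and invert and merged_inputs are illustrative names, not part of Snakemake. It assumes the YAML loader turns the bare key 0 into the integer 0, as PyYAML does, which is why the keys are converted with str().

```python
# Shortened stand-in for config["Samples"]; note the integer key 0,
# which is what a YAML loader such as PyYAML produces for a bare `0:`.
samples = {
    "day7-DO-0-12C": {0: ["1-6", "7"], "viral": ["1-6", "7"]},
    "day7-DO-0-13C": {0: ["1-5", "6"], "viral": ["1-6", "7"]},
}


def invert(samples):
    """Swap the first two levels: sample -> organism becomes organism -> sample."""
    by_organism = {}
    for sample, organism_fractions in samples.items():
        for organism, fractions in organism_fractions.items():
            # Wildcard values are strings, so store string keys.
            by_organism.setdefault(str(organism), {})[sample] = fractions
    return by_organism


def merged_inputs(by_organism, organism, read):
    """List every per-sample, per-fraction file that feeds one merged file."""
    pattern = (
        "data/processed/clean_reads/"
        "demultiplexed_{sample}-{organism}_{fraction}_{read}.fq.gz"
    )
    return [
        pattern.format(sample=sample, organism=organism, fraction=fraction, read=read)
        for sample, fractions in by_organism[organism].items()
        for fraction in fractions
    ]


by_organism = invert(samples)
r1_inputs = merged_inputs(by_organism, "0", "r1")
for path in r1_inputs:
    print(path)
```

Looking up "0" as a string works here only because of the str() conversion; with the raw integer key the same lookup would raise a KeyError.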