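I am using snakemake version 7.30.1.
I am trying to run my snakemake workflow with snakemake --cores 4. Snakemake seems to find the input files and starts running the first rule in the workflow, but then exits with a MissingOutputException. The error says that the output files for the second of the two samples in my sample list cannot be found. This does not appear to be a problem with the files themselves, because when I swap the order of the files, the new first sample runs and the new second sample does not. I have also tried changing the latency wait, but that did not help.
In my first rule I am trying to run fastp on two samples with two reads each. The output should be the files M31A_150k_1_final.fq, M28B_150k_1_final.fq, M31A_150k_2_final.fq and M28B_150k_2_final.fq: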
base_path = "/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/"

# Define list of sample names
samples = ["M31A_150k", "M28B_150k"]

rule all:
    input:
        expand(base_path + "bai/{sample}_all.bam.bai", sample=samples),
        expand(base_path + "bai/{sample}_forward.bam.bai", sample=samples),
        expand(base_path + "bai/{sample}_reverse.bam.bai", sample=samples),
        expand(base_path + "bigwig/{sample}.bw", sample=samples),
        expand(base_path + "bigwig/{sample}_forward.bw", sample=samples),
        expand(base_path + "bigwig/{sample}_reverse.bw", sample=samples)

rule fastp_adaptors:
    input:
        R1 = expand(base_path + "testfiles/{sample}_1.fq", sample=samples),
        R2 = expand(base_path + "testfiles/{sample}_2.fq", sample=samples)
    output:
        R1_final = expand(base_path + "trimmed/{sample}_1_final.fq", sample=samples),
        R2_final = expand(base_path + "trimmed/{sample}_2_final.fq", sample=samples)
    shell:
        """
        fastp -w 8 --dont_eval_duplication -i {input.R1} -I {input.R2} -t 10 -F 10 -o {output.R1_final} -O {output.R2_final} --detect_adapter_for_pe
        """
This is the error log I get:
valeriaaizen@Valerias-MacBook-Pro ~/D/c/n/snakemake-attempt (main)> snakemake --cores 4 (myenv_x86)
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job stats:
job                        count    min threads    max threads
all                            1              1              1
bowtie2                        1              1              1
deeptools_bigwigall            1              1              1
deeptools_bigwigforward        1              1              1
deeptools_bigwigreverse        1              1              1
fastp_adaptors                 1              1              1
merge_83163                    1              1              1
merge_99147                    1              1              1
reverse                        1              1              1
samtools_indexall              1              1              1
samtools_indexforward          1              1              1
samtools_sort                  1              4              4
samtools_sort147               1              1              1
samtools_sort163               1              1              1
samtools_sort83                1              1              1
samtools_sort99                1              1              1
total                         16              1              4
Select jobs to execute...
[Thu Sep 7 14:39:53 2023]
rule fastp_adaptors:
input: /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_1.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_1.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_2.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_2.fq
output: /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq
jobid: 4
reason: Missing output files: /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq
resources: tmpdir=/var/folders/4c/h8ky28xj143dkssjycttn5lr0000gn/T
Detecting adapter sequence for read1...
Illumina TruSeq Adapter Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Detecting adapter sequence for read2...
No adapter detected for read2
Read1 before filtering:
total reads: 150000
total bases: 22500000
Q20 bases: 21987079(97.7204%)
Q30 bases: 21372363(94.9883%)
Read2 before filtering:
total reads: 150000
total bases: 22500000
Q20 bases: 21768444(96.7486%)
Q30 bases: 21103172(93.7919%)
Read1 after filtering:
total reads: 136856
total bases: 18856683
Q20 bases: 18594358(98.6088%)
Q30 bases: 18347138(97.2978%)
Read2 after filtering:
total reads: 136856
total bases: 17587532
Q20 bases: 17259790(98.1365%)
Q30 bases: 16852551(95.821%)
Filtering result:
reads passed filter: 273712
reads failed due to low quality: 2162
reads failed due to too many N: 18
reads failed due to too short: 24108
reads with adapter trimmed: 35295
bases trimmed due to adapters: 2204956
Insert size peak (evaluated by paired-end reads): 150
JSON report: fastp.json
HTML report: fastp.html
fastp -w 8 --dont_eval_duplication -i /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_1.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_1.fq -I /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_2.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_2.fq -t 10 -F 10 -o /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq -O /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq --detect_adapter_for_pe
fastp v0.22.0, time used: 8 seconds
Waiting at most 5 seconds for missing files.
MissingOutputException in rule fastp_adaptors in file /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/Snakefile, line 35:
Job 4 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq
/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq
Removing output files of failed job fastp_adaptors since they might be corrupted:
/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-09-07T143950.741220.snakemake.log
rule fastp_adaptors:
    input:
        R1 = expand(base_path + "testfiles/{sample}_1.fq", sample=samples),
        R2 = expand(base_path + "testfiles/{sample}_2.fq", sample=samples)
    output:
        R1_final = expand(base_path + "trimmed/{sample}_1_final.fq", sample=samples),
        R2_final = expand(base_path + "trimmed/{sample}_2_final.fq", sample=samples)
    shell:
        """
        fastp -w 8 --dont_eval_duplication -i {input.R1} -I {input.R2} -t 10 -F 10 -o {output.R1_final} -O {output.R2_final} --detect_adapter_for_pe
        """
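I guess fastp_adaptors must be run once per pair of fastq files (twice in total in your case). However, since you have expand in your input and output directives, fastp_adaptors runs only once on all pairs, which causes the error: your log shows a single fastp command with two paths after each of -i, -I, -o and -O, and only the outputs for the first sample get written. So try removing expand from fastp_adaptors. (If you are new to snakemake, this is one of the things that confuses beginners.)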
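
For illustration, here is a minimal sketch of what the rule might look like without expand. It assumes base_path is defined as in your Snakefile and that any downstream rule asks for the trimmed files one sample at a time, so that {sample} becomes a wildcard Snakemake fills in for each job. I have not run it against your data, so treat it as a starting point rather than a verified fix.

```snakemake
# Sketch only: the same rule with expand() removed, so {sample} is a wildcard.
# Snakemake then schedules one fastp_adaptors job per sample that is requested
# downstream, instead of one job covering every sample.
rule fastp_adaptors:
    input:
        R1 = base_path + "testfiles/{sample}_1.fq",
        R2 = base_path + "testfiles/{sample}_2.fq"
    output:
        R1_final = base_path + "trimmed/{sample}_1_final.fq",
        R2_final = base_path + "trimmed/{sample}_2_final.fq"
    shell:
        """
        fastp -w 8 --dont_eval_duplication -i {input.R1} -I {input.R2} -t 10 -F 10 -o {output.R1_final} -O {output.R2_final} --detect_adapter_for_pe
        """
```

The expand calls in rule all can stay as they are: that is the place where the full list of per-sample targets is requested, and each target is then traced back to a separate fastp_adaptors job through the wildcard.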