I have a config.txt that lists the file names of my paired-end reads:
forward reverse
sample1_forward.fastq.gz sample1_reverse.fastq.gz
sample2_forward.fastq.gz sample2_reverse.fastq.gz
sample3_forward.fastq.gz sample3_reverse.fastq.gz
I tried this Snakefile:
import pandas as pd

samples_information = pd.read_csv("config.txt", sep=' ', index_col=False)
r1 = list(samples_information['forward'])
r2 = list(samples_information['reverse'])

rule pastp_pe:
    input:
        read1=expand('samples/{sample1}', sample1=r1),
        read2=expand('samples/{sample2}', sample2=r2)
    output:
        trim_read1="trimmed/{sample1}.fastq",
        trim_read2="trimmed/{sample2}.fastq"
    conda:
        "envs/fastp.yml"
    threads: 4
    shell:
        """
        fastp --thread {threads} -i {input.read1} -I {input.read2} -o {output.trim_read1} -O {output.trim_read2}
        """
I get this message:
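SyntaxError: Not all output, log and benchmark files of rule pastp_pe contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.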
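To see why this error occurs, it helps to know how Snakemake works out a job. It matches a requested file against a rule's output patterns to get the wildcard values. That only works if every output pattern uses the same wildcard names. Here, {sample1} and {sample2} differ, so a file matching one output would leave the other output's wildcard undefined.

The following is a simplified Python sketch of that consistency check. It is not Snakemake's actual implementation, and the helper names are hypothetical.

```python
import re

# Captures the wildcard name at the start of "{name}" or "{name,regex}".
# Simplified: escaped braces and other edge cases are not handled.
WILDCARD = re.compile(r"\{\s*([A-Za-z_]\w*)")


def wildcard_names(pattern):
    """Return the set of wildcard names used in one file pattern."""
    return set(WILDCARD.findall(pattern))


def outputs_share_wildcards(patterns):
    """True if every output pattern uses exactly the same wildcard names."""
    sets = [wildcard_names(p) for p in patterns]
    if not sets:
        return True
    return all(s == sets[0] for s in sets)


# Outputs from the Snakefile above: two different wildcards, so rejected.
question_outputs = ["trimmed/{sample1}.fastq", "trimmed/{sample2}.fastq"]
# Outputs sharing a single {sample} wildcard, so accepted.
fixed_outputs = ["trimmed/{sample}.forward.fastq", "trimmed/{sample}.reverse.fastq"]

print(outputs_share_wildcards(question_outputs))  # False
print(outputs_share_wildcards(fixed_outputs))     # True
```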
I don't know how to change the file, or how else to pass the config as input.
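The error comes from the two outputs using different wildcards ({sample1} and {sample2}). Snakemake cannot determine both of them from a single requested file. In addition, expand() in the input would hand every job all of the read files at once. I suggest adding a sample_name column to your config file: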
sample_name forward reverse
sample_1 sample1_forward.fastq.gz sample1_reverse.fastq.gz
sample_2 sample2_forward.fastq.gz sample2_reverse.fastq.gz
sample_3 sample3_forward.fastq.gz sample3_reverse.fastq.gz
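With a sample_name column, each read pair can be looked up directly by name. The minimal pandas sketch below assumes the file is space-separated as shown. It reads the table from an in-memory string instead of config.txt so that it runs on its own.

Indexing by sample_name lets .loc[name, column] return a single value. This is an alternative to the boolean mask plus .item() used in the input function below.

```python
import io

import pandas as pd

# Same content as the config.txt above, inlined so the sketch is self-contained.
CONFIG_TEXT = """sample_name forward reverse
sample_1 sample1_forward.fastq.gz sample1_reverse.fastq.gz
sample_2 sample2_forward.fastq.gz sample2_reverse.fastq.gz
sample_3 sample3_forward.fastq.gz sample3_reverse.fastq.gz
"""

# Use sample_name as the row index so each lookup is a single .loc call.
samples = pd.read_csv(io.StringIO(CONFIG_TEXT), sep=" ").set_index("sample_name")

sample_names = samples.index.tolist()
print(sample_names)                        # ['sample_1', 'sample_2', 'sample_3']
print(samples.loc["sample_2", "forward"])  # sample2_forward.fastq.gz
```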
Then you can use an input function in the fastp rule:
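import pandas as pd

# config.txt is space-separated with the header: sample_name forward reverse
samples_information = pd.read_csv("config.txt", sep=" ", index_col=False)
sample_names = samples_information["sample_name"].tolist()


def get_reads(wildcards):
    # Select the config row whose sample_name matches the {sample} wildcard
    row = samples_information.loc[samples_information["sample_name"] == wildcards.sample]
    # .item() raises an error unless exactly one row matched;
    # the raw reads are expected in samples/, as in the original Snakefile
    r1 = "samples/" + row["forward"].item()
    r2 = "samples/" + row["reverse"].item()
    # unpack() turns these keys into input.read1 and input.read2
    return {"read1": r1, "read2": r2}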

# Request one trimmed read pair per sample listed in config.txt
rule all:
    input:
        expand("trimmed/{sample}.forward.fastq", sample=sample_names),
        expand("trimmed/{sample}.reverse.fastq", sample=sample_names),


# Both outputs share the single {sample} wildcard, so each job handles one pair
rule pastp_pe:
    input:
        unpack(get_reads),
    output:
        trim_read1="trimmed/{sample}.forward.fastq",
        trim_read2="trimmed/{sample}.reverse.fastq",
    conda:
        "envs/fastp.yml"
    threads: 4
    shell:
        """
        fastp --thread {threads} -i {input.read1} -I {input.read2} -o {output.trim_read1} -O {output.trim_read2}
        """
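You can exercise the input function outside Snakemake to check which files a job would receive. The sketch below is a rough approximation, not Snakemake's own machinery. It makes a few assumptions:

- config.txt is inlined as a string.
- A hypothetical SimpleNamespace stands in for the wildcards object.
- The raw reads sit in samples/, as in the question.
- The command template is a simplified version of the rule's shell string.

```python
import io
from types import SimpleNamespace

import pandas as pd

# Inlined copy of config.txt (with the sample_name column).
CONFIG_TEXT = """sample_name forward reverse
sample_1 sample1_forward.fastq.gz sample1_reverse.fastq.gz
sample_2 sample2_forward.fastq.gz sample2_reverse.fastq.gz
sample_3 sample3_forward.fastq.gz sample3_reverse.fastq.gz
"""
samples_information = pd.read_csv(io.StringIO(CONFIG_TEXT), sep=" ", index_col=False)
sample_names = samples_information["sample_name"].tolist()


def get_reads(wildcards):
    # Same lookup as the input function above; assumes reads live in samples/.
    row = samples_information.loc[samples_information["sample_name"] == wildcards.sample]
    return {
        "read1": "samples/" + row["forward"].item(),
        "read2": "samples/" + row["reverse"].item(),
    }


# Equivalent to the two expand() calls in rule all for this single-wildcard case.
targets = [f"trimmed/{s}.forward.fastq" for s in sample_names] + [
    f"trimmed/{s}.reverse.fastq" for s in sample_names
]

# For the target trimmed/sample_2.forward.fastq, the {sample} wildcard is "sample_2".
wildcards = SimpleNamespace(sample="sample_2")  # hypothetical stand-in
reads = get_reads(wildcards)

# Simplified version of the rule's shell command, with outputs written out in full.
TEMPLATE = (
    "fastp --thread {threads} -i {read1} -I {read2} "
    "-o trimmed/{sample}.forward.fastq -O trimmed/{sample}.reverse.fastq"
)
command = TEMPLATE.format(threads=4, sample=wildcards.sample, **reads)
print(command)
```

Because both outputs share {sample}, each of the six targets maps to exactly one job. The forward and reverse files of one sample are produced by the same fastp call.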