Snakemake: handling paired-end read names read from a file


I have a config.txt that contains the file names of my paired-end reads:

forward reverse
sample1_forward.fastq.gz sample1_reverse.fastq.gz
sample2_forward.fastq.gz sample2_reverse.fastq.gz
sample3_forward.fastq.gz sample3_reverse.fastq.gz

I tried this Snakefile:

import pandas as pd

samples_information=pd.read_csv("config.txt", sep=' ', index_col=False)

r1=list(samples_information['forward'])
r2=list(samples_information['reverse'])

rule pastp_pe:
    input:
        read1=expand('samples/{sample1}', sample1=r1),
        read2=expand('samples/{sample2}', sample2=r2)
    output:
        trim_read1="trimmed/{sample1}.fastq",
        trim_read2="trimmed/{sample2}.fastq"
    conda:
        "envs/fastp.yml"
    threads: 4
    shell:
        """
         fastp --thread {threads} -i {input.read1} -I {input.read2} -o {output.trim_read1} -O {output.trim_read2}
        """

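I got this message:

SyntaxError: Not all output, log and benchmark files of rule pastp_pe contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.

I don't know how to change the Snakefile, or how else to pass the config file as input.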

bioinformatics snakemake
1 Answer
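The error appears because the two output patterns in the question use different wildcard names (sample1 in one, sample2 in the other), whereas Snakemake requires every output of a rule to carry the same set of wildcards. The sketch below illustrates that check with a simplified regular expression; it is not Snakemake's actual parser, and the helper names wildcard_names and same_wildcards are made up for illustration.

```python
import re


def wildcard_names(pattern):
    """Return the set of {name} placeholders found in an output pattern.

    Simplified: constraints such as {sample,\\d+} are not handled.
    """
    return set(re.findall(r"\{(\w+)\}", pattern))


def same_wildcards(patterns):
    """True if every pattern uses exactly the same wildcard names."""
    names = [wildcard_names(p) for p in patterns]
    return all(n == names[0] for n in names)


# Output patterns from the rule in the question.
question_outputs = ["trimmed/{sample1}.fastq", "trimmed/{sample2}.fastq"]
# Output patterns that share a single {sample} wildcard.
shared_outputs = ["trimmed/{sample}.forward.fastq", "trimmed/{sample}.reverse.fastq"]

print(same_wildcards(question_outputs))
print(same_wildcards(shared_outputs))
```

With a single shared wildcard, each job writes one forward and one reverse file for exactly one sample, so no two jobs can target the same file.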

I suggest adding a sample_name column to your config file:

sample_name forward reverse
sample_1 sample1_forward.fastq.gz sample1_reverse.fastq.gz
sample_2 sample2_forward.fastq.gz sample2_reverse.fastq.gz
sample_3 sample3_forward.fastq.gz sample3_reverse.fastq.gz
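If the file already exists with only the forward and reverse columns, the extra column could also be derived with pandas instead of typed by hand. This is only a sketch: it assumes every forward file name ends in "_forward.fastq.gz", and it reads the table from an in-memory string rather than from config.txt so that it runs on its own. Note that it yields names such as sample1, not sample_1 as in the table above; any unique label per row works.

```python
import io

import pandas as pd

# Stand-in for the two-column config.txt from the question.
config_text = """forward reverse
sample1_forward.fastq.gz sample1_reverse.fastq.gz
sample2_forward.fastq.gz sample2_reverse.fastq.gz
sample3_forward.fastq.gz sample3_reverse.fastq.gz
"""

samples_information = pd.read_csv(io.StringIO(config_text), sep=" ", index_col=False)

# Assumption: the sample name is whatever precedes "_forward.fastq.gz".
sample_name = samples_information["forward"].str.replace(
    "_forward.fastq.gz", "", regex=False
)
samples_information.insert(0, "sample_name", sample_name)

# Space-separated text in the same layout as the extended config file.
extended_config = samples_information.to_csv(sep=" ", index=False)
print(extended_config)
```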

Then you can use an input function in the fastp rule:

import pandas as pd

samples_information = pd.read_csv("config.txt", sep=" ", index_col=False)
sample_names = samples_information["sample_name"].tolist()


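def get_reads(wildcards):
    # Select the row whose sample_name matches the {sample} wildcard.
    row = samples_information.loc[samples_information["sample_name"] == wildcards.sample]
    # .item() raises ValueError unless exactly one row matched.
    r1 = row["forward"].item()
    r2 = row["reverse"].item()
    # If the reads live under samples/ as in the question, prepend that directory here.
    return {"read1": r1, "read2": r2}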


rule all:
    input:
        expand("trimmed/{sample}.forward.fastq", sample=sample_names),
        expand("trimmed/{sample}.reverse.fastq", sample=sample_names),


rule pastp_pe:
    input:
        unpack(get_reads),
    output:
        trim_read1="trimmed/{sample}.forward.fastq",
        trim_read2="trimmed/{sample}.reverse.fastq",
    conda:
        "envs/fastp.yml"
    threads: 4
    shell:
        """
        fastp --thread {threads} -i {input.read1} -I {input.read2} -o {output.trim_read1} -O {output.trim_read2}
        """
