How to embed a custom Python function and multiple wildcards in a Snakemake rule

Question

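I am new to Snakemake and am trying to understand it better. I have gone through the documentation, but some concepts still confuse me. Suppose I have a Snakemake rule that takes an input file and produces some output.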

I prepared a config file that looks like this:

config.yml:

mydir: "/usr/home/data/"

My tabular data looks like this:

sample	file
sampleA	file1.fastq
sampleB	file2.fastq
sampleC	file3.fastq

I read the tabular data into the Snakefile:

configfile: "config.yml"

import pandas as pd
table = pd.read_table("samples.tsv").set_index("sample", drop=False)

Then I have a rule at the top of my Snakefile that collects the output of my dummy rule:

rule all:
    input:
        expand("output/{sample}_output.txt", sample=table["file"])

rule dummy:
    input: config["mydir"] + "{sample}"
    output: "output/{sample}_output.txt"
    shell: "sometool {input} {output}"

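If there are two subdirectories under the data folder, how can I check whether a file listed in the TSV file (e.g. file1.fastq) is in folder A or folder B, and then use it as the rule's input depending on where it exists? Also, how can I name my output after the sample that corresponds to the file? For example, instead of file1.fastq_output.txt, I would like the output to be named after the corresponding sample, sampleA_output.txt, while still using "file1.fastq" as the input to the shell command.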

My idea was to check whether the file is in folder A or B with a custom Python function defined outside the Snakemake rule, like this:

def get_file(file):
    test = config["mydir"] + file
    if os.path.exists(test) == True:
        return test

and then use this custom function as the input of rule dummy:

rule dummy:
    input: get_file({sample})
    output: "output/{sample}_output.txt"
    shell: "sometool {input} {output}"

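To name the output file after the corresponding entry in the sample column, I thought about using the loc function, since the rows are indexed by the sample column and can be accessed as table.loc["rowname"]["column"]. However, the get_file function does not work as the input of my dummy rule. I am not sure why this happens, or how to name the output after the sample name while still using the file column in my shell command.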

Answer 1

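One problem I see in your code is that a function used as input, get_file, should accept a single argument, which is the wildcards object. So something like: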

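def get_file(wc):
    # Use table["sample"], not table.sample: the attribute resolves to the
    # DataFrame.sample method rather than the "sample" column.
    file = table.loc[table["sample"] == wc.sample, "file"].iloc[0]
    ## Possibly some more logic to return the input file.
    return file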
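
The "more logic" placeholder could, for instance, look for the file in the two subdirectories mentioned in the question. Below is a minimal sketch, not a definitive implementation: the helper name find_in_subdirs and the subdirectory names "A" and "B" are assumptions made for illustration. With it, get_file could return find_in_subdirs(file, config["mydir"]) instead of the bare file name.

```python
import os

def find_in_subdirs(filename, base_dir, subdirs=("A", "B")):
    """Return the first existing path of the form base_dir/<subdir>/filename.

    The subdirectory names "A" and "B" are assumed; adjust them to the
    actual folder names under the data directory.
    """
    for subdir in subdirs:
        candidate = os.path.join(base_dir, subdir, filename)
        if os.path.exists(candidate):
            return candidate
    # Fail loudly: silently returning None would leave the rule without
    # a usable input path.
    raise FileNotFoundError(
        f"{filename} not found in any of {subdirs} under {base_dir}"
    )
```

Raising an error when the file is missing is a design choice here; it makes a wrong path show up as a clear message rather than as an empty input.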

Then use the function:

rule dummy:   
    input: get_file,
    output: "output/{sample}_output.txt"   
    shell: "sometool {input} {output}"
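
Since the {sample} wildcard in the output now stands for the sample name, the targets requested by rule all would presumably be built from the sample column rather than the file column, i.e. expand("output/{sample}_output.txt", sample=table["sample"]). The plain-Python sketch below shows the list such a call would produce; the sample names are hard-coded here as an assumption so that it runs without pandas or Snakemake.

```python
# Sample names assumed from the question's samples.tsv.
samples = ["sampleA", "sampleB", "sampleC"]

# Plain-Python equivalent of
# expand("output/{sample}_output.txt", sample=samples) in a Snakefile.
targets = [f"output/{sample}_output.txt" for sample in samples]

for target in targets:
    print(target)
```

Snakemake then matches each target against the output pattern of rule dummy, sets wildcards.sample accordingly, and calls get_file to look up the matching input file.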
