Using the same rule in multiple places in a pipeline, with and without wildcards

Question

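Using snakemake, I am trying to apply the same rule to N independent files and also to a single file produced by merging all N of them.

I created a minimal example, which you can find below. It does the following:

I have a bunch of files as my initial input. Their paths are given in a configuration file over which I have almost no control.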

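First I extract a specific part of each of those files (rule create_list), on which I then run some processing (rule do_stuff_on_list).

This works fine. What I am trying to do, and having trouble with, is to merge all the "lists" together (rule merge_lists) and apply the exact same processing (rule do_stuff_on_list) to the merged list.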

config_file = {
    "result_files": [
        {
            "id": 0,
            "path": "/path/to/readonly/location/1.txt"
        },
        {
            "id": 8,
            "path": "/path/to/readonly/location/2.txt"
        },
        {
            "id": 4,
            "path": "/path/to/readonly/location/3.txt"
        }
    ]
}

SAMPLES = {str(x["id"]): x["path"] for x in config_file["result_files"]}

rule all:
    input:
        "AAA_finalResult.txt"

rule create_list:
    input:
        sample_path = lambda wildcards: SAMPLES[wildcards.sample]
    output:
        "{sample}_mut_list.json"
    shell:
        "touch {output}"

rule merge_lists:
    input:
        expand(rules.create_list.output, sample=SAMPLES.keys())
    output:
        "merged_mut_list.json"
    shell:
        "touch {output}"

rule do_stuff_on_list:
    input:
        rules.create_list.output
    output:
        "{sample}_stuff.json"
    shell:
        "touch {output}"

rule merge_all_results:
    input:
        expand(rules.do_stuff_on_list.output, sample=SAMPLES.keys()),
    output:
        "AAA_finalResult.txt"
    shell:
        "touch {output}"

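I know I could solve this by creating a second rule identical to do_stuff_on_list that takes the merged list as input. But I feel there should be a better way, and I can't figure it out...

Is there a way to do this kind of thing?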

Answer 1

Rule inheritance might solve the problem for you. Roughly:

use rule do_stuff_on_list as do_stuff_on_merged_list with:
    input: rules.merge_lists.output,
    output: "{sample}_merged_stuff.json",

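Note that, as shown in the documentation, rule inheritance can be used to modify any part of a rule except the actual execution step (shell in your example).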
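
To make this concrete, here is a minimal sketch of how inheritance could be wired into the Snakefile from the question. It assumes a Snakemake version that supports `use rule ... as ... with:` (6.0 or later, as far as I know) and that the inherited rule should read the output of merge_lists, which is my reading of the question. The output name "merged_stuff.json" and the constraint on `sample` are illustrative choices, not part of the original post. The constraint is there because "merged_stuff.json" would otherwise also match "{sample}_stuff.json" with sample = "merged".

```snakemake
# Sketch only. Paths are the placeholders from the question and must exist
# for the workflow to actually run.
SAMPLES = {
    "0": "/path/to/readonly/location/1.txt",
    "8": "/path/to/readonly/location/2.txt",
    "4": "/path/to/readonly/location/3.txt",
}

# Restrict {sample} to the known IDs so that files starting with "merged_"
# are never claimed by the per-sample rules.
wildcard_constraints:
    sample = "|".join(SAMPLES)

rule all:
    input:
        "AAA_finalResult.txt",
        "merged_stuff.json"  # hypothetical name for the merged result

rule create_list:
    input:
        sample_path = lambda wildcards: SAMPLES[wildcards.sample]
    output:
        "{sample}_mut_list.json"
    shell:
        "touch {output}"

rule merge_lists:
    input:
        expand("{sample}_mut_list.json", sample=list(SAMPLES))
    output:
        "merged_mut_list.json"
    shell:
        "touch {output}"

rule do_stuff_on_list:
    input:
        "{sample}_mut_list.json"
    output:
        "{sample}_stuff.json"
    shell:
        "touch {output}"

# Same shell command as do_stuff_on_list; only input and output are replaced.
use rule do_stuff_on_list as do_stuff_on_merged_list with:
    input:
        rules.merge_lists.output
    output:
        "merged_stuff.json"

rule merge_all_results:
    input:
        expand("{sample}_stuff.json", sample=list(SAMPLES))
    output:
        "AAA_finalResult.txt"
    shell:
        "touch {output}"
```

The processing command lives in one place only, so a change to the shell line of do_stuff_on_list applies to the per-sample runs and to the merged run alike.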

Answer 2

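You can use wildcards in the input: directive of rule do_stuff_on_list. If they are generic enough to allow both the sample-specific input and the merged or all (or whatever you name it) input, the same rule will be used in both places.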
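
A minimal sketch of this wildcard-based approach, under a few assumptions of mine: the wildcard in do_stuff_on_list is renamed to `name` (a hypothetical label) to signal that it is no longer tied to a sample ID, the merged result is requested as "merged_stuff.json", and the `sample` wildcard of create_list is constrained to the known IDs so that only merge_lists can produce "merged_mut_list.json".

```snakemake
# Sketch only. Paths are the placeholders from the question and must exist
# for the workflow to actually run.
SAMPLES = {
    "0": "/path/to/readonly/location/1.txt",
    "8": "/path/to/readonly/location/2.txt",
    "4": "/path/to/readonly/location/3.txt",
}

rule all:
    input:
        "AAA_finalResult.txt",
        "merged_stuff.json"  # hypothetical target: processing of the merged list

rule create_list:
    input:
        sample_path = lambda wildcards: SAMPLES[wildcards.sample]
    output:
        "{sample}_mut_list.json"
    # Only real sample IDs may match here, so "merged_mut_list.json"
    # is left to merge_lists.
    wildcard_constraints:
        sample = "|".join(SAMPLES)
    shell:
        "touch {output}"

rule merge_lists:
    input:
        expand("{sample}_mut_list.json", sample=list(SAMPLES))
    output:
        "merged_mut_list.json"
    shell:
        "touch {output}"

# Generic rule: {name} can be a sample ID ("0", "8", "4") or "merged".
rule do_stuff_on_list:
    input:
        "{name}_mut_list.json"
    output:
        "{name}_stuff.json"
    shell:
        "touch {output}"

rule merge_all_results:
    input:
        expand("{name}_stuff.json", name=list(SAMPLES))
    output:
        "AAA_finalResult.txt"
    shell:
        "touch {output}"
```

With this layout no second rule is needed: requesting "merged_stuff.json" makes do_stuff_on_list ask for "merged_mut_list.json", which in turn triggers merge_lists.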
