I'm trying to build a simple pipeline in Snakemake, but I can't work out how to use wildcards correctly.
I have a folder data containing the following files:
data/sample1_P1.txt
data/sample1_P2.txt
data/sample2_P1.txt
data/sample2_P2.txt
The wildcards in this example are sampleX and PX. As a first step, I want to move the files into the folders sample1 and sample2.
Desired output of this step:
data/sample1/sample1_P1.txt
data/sample1/sample1_P2.txt
data/sample2/sample2_P1.txt
data/sample2/sample2_P2.txt
Next, I want to concatenate the files in each folder to produce:
data/sample1/sample1_concatenated.txt
data/sample2/sample2_concatenated.txt
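To pin down what the two steps are meant to do, here is a plain-Python sketch that performs them directly, outside Snakemake. It is only an illustration under assumptions: the function name reorganise_and_concat and the temporary directory are hypothetical, and the files are concatenated in P1, P2 order, as cat does when its inputs are listed in that order.

```python
import shutil
import tempfile
from pathlib import Path


def reorganise_and_concat(data_dir: Path, samples, pairs):
    """Hypothetical plain-Python stand-in for the two steps (not Snakemake):
    1. move data/<sample>_<pair>.txt into data/<sample>/
    2. concatenate the moved files into data/<sample>/<sample>_concatenated.txt
    """
    for sample in samples:
        sample_dir = data_dir / sample
        sample_dir.mkdir(exist_ok=True)
        moved = []
        for pair in pairs:
            src = data_dir / f"{sample}_{pair}.txt"
            dst = sample_dir / src.name
            shutil.move(str(src), str(dst))  # step 1: move into the sample folder
            moved.append(dst)
        # Step 2: equivalent of `cat <P1 file> <P2 file> > <concatenated file>`.
        out = sample_dir / f"{sample}_concatenated.txt"
        out.write_text("".join(p.read_text() for p in moved))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        # Create the four input files from the question with dummy content.
        for s in ["sample1", "sample2"]:
            for p in ["P1", "P2"]:
                (data / f"{s}_{p}.txt").write_text(f"{s} {p}\n")
        reorganise_and_concat(data, ["sample1", "sample2"], ["P1", "P2"])
        print((data / "sample1" / "sample1_concatenated.txt").read_text())
```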
This is what I tried:
pairs = {"P1" : "P1", "P2" : "P2"}
samples = {
    "sample1": "sample1",
    "sample2": "sample2"
}

rule all:
    input: expand("data/{sample}/{sample}_concatenated.txt", sample = samples)

rule get_txt_files:
    output:
        "data/{sample}_{pair}.txt"
    shell:
        """
        echo 1 > {output}
        """

rule reorganise:
    input:
        expand("data/{{sample}}_{pair}.txt", \
            pair=pairs)
    output:
        "data/{sample}/{sample}_{pair}.txt"
    shell:
        "mv {input} data/{wildcards.sample}/."

rule concat:
    input:
        expand("data/{{sample}}/{{sample}}_{pair}.txt", \
            pair=pairs)
    output:
        "data/{sample}/{sample}_concatenated.txt"
    shell:
        "cat {input} > {output}"
I get an AmbiguousRuleException, but I don't know how to resolve it.
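A rough way to see what the expand() calls in the Snakefile produce is to imitate them in plain Python. The sketch below is an assumption-laden stand-in (expand_sketch is a hypothetical name, not Snakemake's implementation): it fills the pattern for every combination of the given values, and, like str.format, turns doubled braces such as {{sample}} into a literal {sample}. That is how the input lists of reorganise and concat keep sample as a wildcard, while iterating the pairs dict supplies its keys P1 and P2. It also shows that rule all requests data/sample1/sample1_concatenated.txt, the file that shows up in the error message.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Rough, hypothetical imitation of Snakemake's expand(): format the
    pattern once for every combination of the given values. As with
    str.format, doubled braces like {{sample}} come out as a literal
    {sample}, i.e. they stay a wildcard for Snakemake to fill later."""
    names = list(wildcards)
    # Iterating a dict yields its keys, so pairs gives "P1", "P2".
    value_lists = [list(wildcards[n]) for n in names]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


pairs = {"P1": "P1", "P2": "P2"}
samples = {"sample1": "sample1", "sample2": "sample2"}

# Targets requested by rule all.
print(expand_sketch("data/{sample}/{sample}_concatenated.txt", sample=samples))
# Input list of rule reorganise: {sample} is left unfilled.
print(expand_sketch("data/{{sample}}_{pair}.txt", pair=pairs))
```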
Add the following to your workflow:
wildcard_constraints:
    pair = "|".join(pairs),
    sample = "|".join(samples),
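For reference, the two constraint strings are ordinary regular expressions built by joining the dictionary keys. The sketch below just evaluates them in plain Python. The re.escape variant is an optional precaution that is not part of the fix above; it only matters if sample names ever contain regex metacharacters such as "." or "+".

```python
import re

pairs = {"P1": "P1", "P2": "P2"}
samples = {"sample1": "sample1", "sample2": "sample2"}

# str.join over a dict iterates its keys, so each constraint becomes
# a regex alternation of the allowed values.
pair_regex = "|".join(pairs)      # "P1|P2"
sample_regex = "|".join(samples)  # "sample1|sample2"

# Optional precaution: escape names in case they contain regex
# metacharacters. Plain alphanumeric names come out unchanged.
safe_sample_regex = "|".join(re.escape(s) for s in samples)

print(pair_regex)
print(sample_regex)
print(safe_sample_regex)
```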
As the error message says, Snakemake has found two different rules that can both produce the same output file:
AmbiguousRuleException:
Rules reorganise and get_txt_files are ambiguous for the file data/sample1/sample1_concatenated.txt.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
    reorganise: pair=concatenated,sample=sample1
    get_txt_files: pair=concatenated,sample=sample1/sample1
Expected input files:
    reorganise: data/sample1_P1.txt data/sample1_P2.txt
    get_txt_files:
Expected output files:
    reorganise: data/sample1/sample1_concatenated.txt
    get_txt_files: data/sample1/sample1_concatenated.txt
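Constraining the wildcards removes the ambiguity: sample can then only be sample1 or sample2 (so it can no longer match sample1/sample1), and pair can only be P1 or P2 (so it can no longer match concatenated).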
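To see why the constraints work, the sketch below imitates, in a simplified way, how an output pattern can be turned into a regular expression: every {name} becomes a named group with the default pattern .+ unless constrained, and a repeated wildcard must match the same text again. This is an assumption-based illustration (output_regex and matching_rules are hypothetical helpers, and Snakemake's real job resolution does more than pattern matching), but it reproduces the wildcard values reported in the error message.

```python
import re


def output_regex(pattern, constraints=None):
    """Simplified sketch: turn an output pattern such as
    "data/{sample}/{sample}_{pair}.txt" into a compiled regex.
    The first {name} becomes a named group using its constraint
    (default ".+"); later repeats must match the same text."""
    constraints = constraints or {}
    seen = set()
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        name = m.group(1)
        if name in seen:
            parts.append(f"(?P={name})")  # backreference to the first match
        else:
            seen.add(name)
            parts.append(f"(?P<{name}>{constraints.get(name, '.+')})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")


# Output patterns of the three rules from the question.
rules = {
    "get_txt_files": "data/{sample}_{pair}.txt",
    "reorganise": "data/{sample}/{sample}_{pair}.txt",
    "concat": "data/{sample}/{sample}_concatenated.txt",
}


def matching_rules(target, constraints=None):
    """Return {rule name: wildcard values} for every rule whose output matches."""
    result = {}
    for name, pattern in rules.items():
        m = output_regex(pattern, constraints).match(target)
        if m:
            result[name] = m.groupdict()
    return result


target = "data/sample1/sample1_concatenated.txt"
constraints = {"pair": "P1|P2", "sample": "sample1|sample2"}

print(matching_rules(target))               # three rules match
print(matching_rules(target, constraints))  # only concat matches
```

Without constraints, get_txt_files (sample=sample1/sample1, pair=concatenated) and reorganise (sample=sample1, pair=concatenated) match the target alongside concat; with the constraints, only concat does.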