I'm trying to build a simple pipeline in Snakemake, but I can't figure out how to use wildcards correctly.
I have a folder data containing the following files:
data/sample1_P1.txt
data/sample1_P2.txt
data/sample2_P1.txt
data/sample2_P2.txt
The wildcards in this example are sampleX and PX. The first thing I want to do is move the files into the folders sample1 and sample2.
Expected output of this step:
data/sample1/sample1_P1.txt
data/sample1/sample1_P2.txt
data/sample2/sample2_P1.txt
data/sample2/sample2_P2.txt
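The move step only needs to split each file name into its sample and pair parts and rebuild the path from them. A minimal plain-Python sketch of that mapping, assuming every name follows the sampleX_PX.txt pattern shown above; reorganised_path is a hypothetical helper written for illustration, not part of Snakemake:

```python
import re

# Hypothetical helper. Assumes names look like "data/<sample>_<pair>.txt",
# where <sample> has no underscore or slash and <pair> is "P" plus digits.
NAME_RE = re.compile(r"data/(?P<sample>[^_/]+)_(?P<pair>P\d+)\.txt")


def reorganised_path(path):
    """Return the per-sample destination path for one input file."""
    m = NAME_RE.fullmatch(path)
    if m is None:
        raise ValueError(f"unexpected file name: {path}")
    sample, pair = m.group("sample"), m.group("pair")
    return f"data/{sample}/{sample}_{pair}.txt"


for src in ["data/sample1_P1.txt", "data/sample1_P2.txt",
            "data/sample2_P1.txt", "data/sample2_P2.txt"]:
    print(src, "->", reorganised_path(src))
```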
Next, I want to concatenate the files inside each folder to produce:
data/sample1/sample1_concatenated.txt
data/sample2/sample2_concatenated.txt
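The concatenation step is what cat does in the shell. As a rough plain-Python equivalent, assuming the per-sample files already exist and should be joined in P1, P2 order; the demo runs in a temporary directory and fills each part with "1", mimicking the echo 1 used to create the test files in the attempt:

```python
import tempfile
from pathlib import Path


def concatenate(parts, out):
    """Write each file in parts, in order, into out (like `cat a b > out`)."""
    with out.open("wb") as dst:
        for part in parts:
            dst.write(part.read_bytes())


# Demo in a throwaway directory so nothing in the real data/ folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp, "data", "sample1")
    sample_dir.mkdir(parents=True)
    parts = []
    for pair in ["P1", "P2"]:
        part = sample_dir / f"sample1_{pair}.txt"
        part.write_text("1\n")
        parts.append(part)
    out = sample_dir / "sample1_concatenated.txt"
    concatenate(parts, out)
    result = out.read_text()  # read before the directory is removed

print(repr(result))  # '1\n1\n'
```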
Here is what I tried:
pairs = {"P1" : "P1", "P2" : "P2"}
samples = {
    "sample1": "sample1",
    "sample2": "sample2"
}

rule all:
    input: expand("data/{sample}/{sample}_concatenated.txt", sample = samples)

rule get_txt_files:
    output:
        "data/{sample}_{pair}.txt"
    shell:
        """
        echo 1 > {output}
        """

rule reorganise:
    input:
        expand("data/{{sample}}_{pair}.txt", \
               pair=pairs)
    output:
        "data/{sample}/{sample}_{pair}.txt"
    shell:
        "mv {input} data/{wildcards.sample}/."

rule concat:
    input:
        expand("data/{{sample}}/{{sample}}_{pair}.txt", \
               pair=pairs)
    output:
        "data/{sample}/{sample}_concatenated.txt"
    shell:
        "cat {input} > {output}"
I get an AmbiguousRuleException, but I don't know how to fix it.
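Two details of the attempt above are easy to misread: expand iterates over a dictionary's keys, and doubled braces such as {{sample}} escape a wildcard so that it survives expand and is filled in later when Snakemake matches the rule. The sketch below imitates this with str.format and itertools.product; expand_sketch is a hypothetical simplification for illustration, not Snakemake's actual implementation:

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Simplified stand-in for Snakemake's expand().

    Formats the pattern once per combination of wildcard values. Iterating a
    dict yields its keys, so passing the `pairs` dict behaves like passing
    ["P1", "P2"]. str.format turns "{{" and "}}" into single braces, which
    leaves escaped wildcards such as {sample} in place for later matching.
    """
    names = list(wildcards)
    value_lists = [list(wildcards[name]) for name in names]
    return [pattern.format(**dict(zip(names, combo)))
            for combo in product(*value_lists)]


pairs = {"P1": "P1", "P2": "P2"}
samples = {"sample1": "sample1", "sample2": "sample2"}

print(expand_sketch("data/{sample}/{sample}_concatenated.txt", sample=samples))
# ['data/sample1/sample1_concatenated.txt', 'data/sample2/sample2_concatenated.txt']
print(expand_sketch("data/{{sample}}_{pair}.txt", pair=pairs))
# ['data/{sample}_P1.txt', 'data/{sample}_P2.txt']
```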
Add the following to your workflow:
wildcard_constraints:
    pair = "|".join(pairs),
    sample = "|".join(samples),
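Because iterating a dict yields its keys, "|".join(pairs) and "|".join(samples) build regular-expression alternations of the allowed values. A quick check in plain Python, reusing the dictionaries from the question:

```python
import re

pairs = {"P1": "P1", "P2": "P2"}
samples = {"sample1": "sample1", "sample2": "sample2"}

# Joining a dict joins its keys, giving "value1|value2" regex alternations.
pair_constraint = "|".join(pairs)
sample_constraint = "|".join(samples)
print(pair_constraint)    # P1|P2
print(sample_constraint)  # sample1|sample2

# A whole wildcard value must match one of the alternatives.
print(bool(re.fullmatch(sample_constraint, "sample1")))          # True
print(bool(re.fullmatch(sample_constraint, "sample1/sample1")))  # False
```

If sample names could ever contain regex metacharacters such as ".", joining re.escape(name) for each key instead of the raw names would be the safer choice.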
As the error message says, Snakemake found two different rules that can produce the same output file:
AmbiguousRuleException:
Rules reorganise and get_txt_files are ambiguous for the file data/sample1/sample1_concatenated.txt.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
    reorganise: pair=concatenated,sample=sample1
    get_txt_files: pair=concatenated,sample=sample1/sample1
Expected input files:
    reorganise: data/sample1_P1.txt data/sample1_P2.txt
    get_txt_files:
Expected output files:
    reorganise: data/sample1/sample1_concatenated.txt
    get_txt_files: data/sample1/sample1_concatenated.txt
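Constraining the wildcards removes the ambiguity. With sample limited to sample1 or sample2 and pair limited to P1 or P2, get_txt_files can no longer match data/sample1/sample1_concatenated.txt (sample would have to be sample1/sample1), and neither can reorganise (pair would have to be concatenated), so only concat remains to produce that file.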
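To see where the wildcard values in the error message come from, the sketch below mimics how an output pattern can be turned into a regular expression: each wildcard becomes a named group matching ".+" by default (Snakemake's default wildcard regex) or the given constraint, and a repeated wildcard becomes a backreference that must match the same text again. pattern_to_regex and matching_rules are simplified, hypothetical helpers written for illustration, not Snakemake's internal code:

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Compile a Snakemake-style output pattern (simplified illustration).

    The first {name} becomes a named group using the constraint or the
    default ".+"; a repeated {name} becomes a backreference (?P=name).
    """
    constraints = constraints or {}
    seen = set()
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        name = m.group(1)
        if name in seen:
            parts.append(f"(?P={name})")
        else:
            seen.add(name)
            parts.append(f"(?P<{name}>{constraints.get(name, '.+')})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


# Output patterns of the three rules from the question.
RULES = {
    "get_txt_files": "data/{sample}_{pair}.txt",
    "reorganise": "data/{sample}/{sample}_{pair}.txt",
    "concat": "data/{sample}/{sample}_concatenated.txt",
}


def matching_rules(target, constraints=None):
    """Return {rule: wildcard values} for every rule whose output matches."""
    found = {}
    for rule, pattern in RULES.items():
        m = pattern_to_regex(pattern, constraints).fullmatch(target)
        if m:
            found[rule] = m.groupdict()
    return found


target = "data/sample1/sample1_concatenated.txt"
unconstrained = matching_rules(target)
constrained = matching_rules(target, {"sample": "sample1|sample2",
                                      "pair": "P1|P2"})
print(unconstrained)
print(constrained)  # {'concat': {'sample': 'sample1'}}
```

Without constraints all three patterns match the target, and the values found for get_txt_files (sample=sample1/sample1, pair=concatenated) and reorganise (sample=sample1, pair=concatenated) are the ones reported in the error; with the constraints only concat is left.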