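I'm trying to use snakemake to merge multiple VCF files by chromosome. My files look like this; as you can see, each one covers a different coordinate range. What is the best way to merge all the chr1A files together and all the chr1B files together?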
chr1A:0-2096.filtered.vcf
chr1A:2096-7896.filtered.vcf
chr1B:0-3456.filtered.vcf
chr1B:3456-8796.filtered.vcf
My pseudocode:
chromosomes=["chr1A","chr1B"]

rule all:
    input:
        expand("{sample}.vcf", sample=chromosomes)

rule merge:
    input:
        I1="path/to/file/{sample}.xxx.filtered.vcf",
        I2="path/to/file/{sample}.xxx.filtered.vcf",
    output:
        outf="{sample}.vcf"
    shell:
        """
        java -jar picard.jar GatherVcfs I={input.I1} I={input.I2} O={output.outf}
        """
I don't have snakemake installed atm, and bioconda is taking forever XD, but I'll update this on Monday:
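# Map each chromosome to its interval VCFs, listed in coordinate order.
# Full file names are used here; prepend the directory if they live elsewhere.
d = {"chr1A": ["chr1A:0-2096.filtered.vcf", "chr1A:2096-7896.filtered.vcf"],
     "chr1B": ["chr1B:0-3456.filtered.vcf", "chr1B:3456-8796.filtered.vcf"]}
chromosomes = ["chr1A", "chr1B"]

# Keep {chromosome} from matching the interval file names themselves
wildcard_constraints:
    chromosome="|".join(chromosomes)

rule all:
    input:
        expand("{chromosome}.vcf", chromosome=chromosomes)

rule merge:
    input:
        lambda w: d[w.chromosome]
    output:
        outf="{chromosome}.vcf"
    params:
        # Build one "I=<file>" argument per input VCF, separated by spaces
        lambda w: " ".join("I=" + f for f in d[w.chromosome])
    shell:
        """
        java -jar picard.jar GatherVcfs {params[0]} O={output.outf}
        """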