How to tell Snakemake to download a "directory" input from a GCS bucket when running in the cloud (K8s/GKE)


I am trying to get a tutorial pipeline running on GKE. LINK

The specific rule that fails is:

rule salmon_quant:
    output: directory("salmon.{sample}")
    input:
        index = "Saccharomyces_cerevisiae.R64-1-1.salmon_index",
        fq1   = "trimmed/{sample}_1.fq",
        fq2   = "trimmed/{sample}_2.fq",
    shell:
        "salmon quant -i {input.index} -l A -1 {input.fq1} -2 {input.fq2} --validateMappings -o {output}"

rule salmon_index:
    output:
        idx = directory("{strain}.salmon_index")
    input:
        fasta = "transcriptome/{strain}.cdna.all.fa.gz"
    shell:
        "salmon index -t {input.fasta} -i {output.idx} -k 31"

The error I get from the Pod logs shows:

Downloading from remote: smk_demo/trimmed/ref_3_2.fq
Finished download.
Downloading from remote: smk_demo/Saccharomyces_cerevisiae.R64-1-1.salmon_index
Finished download.
Downloading from remote: smk_demo/trimmed/ref_3_1.fq
Finished download.
Activating conda environment: .snakemake/conda/c7537b64e219a0e5525266c7cc140f93_
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
### salmon (mapping-based) v1.2.1
#
# other salmon output omitted 
#
[2023-07-26 05:28:49.492] [jointLog] [info] There is 1 library.
Exception : [Error: The index version file smk_demo/Saccharomyces_cerevisiae.R64-1-1.salmon_index/versionInfo.json doesn't seem to exist.  Please try re
salmon quant was invoked improperly.

I checked inside the container:

k exec -it snakejob-21242d20-1417-50c4-a83b-8e5e4ae5f8d3 -- ls /workdir/smk_demo/Saccharomyces_cerevisiae.R64-1-1.salmon_index/
ls: cannot access '/workdir/smk_demo/Saccharomyces_cerevisiae.R64-1-1.salmon_index/': Not a directory
command terminated with exit code 2

I confirmed that the directory generated by rule salmon_index (and the files inside it) was indeed uploaded to the bucket.

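GCS buckets do not have real "directories", so I think the cause is this step:

Downloading from remote: smk_demo/Saccharomyces_cerevisiae.R64-1-1.salmon_index

The index was downloaded as a single file rather than as a directory.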
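
The suspicion above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses neither Snakemake nor the GCS client, a dict stands in for the bucket, and `example.bin` is a made-up object name (only `versionInfo.json` appears in the error message). It shows why an object store "directory" can only be recovered by enumerating names that share a prefix, not by fetching one object.

```python
# A dict stands in for a bucket: object name -> content. There are no folders,
# only flat names that happen to contain "/".
INDEX = "smk_demo/Saccharomyces_cerevisiae.R64-1-1.salmon_index"

bucket = {
    f"{INDEX}/versionInfo.json": b"{}",  # file named in the salmon error
    f"{INDEX}/example.bin": b"\x00",     # hypothetical second index file
    "smk_demo/trimmed/ref_3_1.fq": b"@read\n",
}


def list_prefix(objects, prefix):
    """Return the object names that sit 'inside' the pseudo-directory prefix."""
    prefix = prefix.rstrip("/") + "/"
    return sorted(name for name in objects if name.startswith(prefix))


# In this toy bucket no object carries the bare directory name, so fetching
# that one name cannot return the index contents.
print(INDEX in bucket)

# Enumerating by prefix is what recovers the "directory".
for name in list_prefix(bucket, INDEX):
    print(name)
```

Under this model, downloading a directory input would mean listing every object under the prefix and fetching each one. Whether Snakemake 7.28.3 can be told to do that for an input is exactly the open question here.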

I tried adding a trailing / to the input:

index = "Saccharomyces_cerevisiae.R64-1-1.salmon_index/",

That did not help.

I am using snakemake 7.28.3, and I could not find any existing bug report or ticket that matches my problem.

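Since directory() only applies to outputs and not to inputs, I don't know whether there is another way to let snakemake know that it needs to download this input from GCS as a directory.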

google-cloud-storage snakemake