Specifying Groovy transpose output


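I am currently building a processing pipeline for cDNA. One process in the pipeline outputs 7 different FASTQ files in a single array, together with 7 meta maps that each carry an id. I need each id to be paired with the FASTQ file that has the same ID, but at the moment the IDs are paired with the FASTQ files in the order in which the previous step produced them.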

Before applying the transpose operator, the channel in question looks like this:

[[[id:L5ad_T1, single_end:true], 
  [id:L5Cd_T1, single_end:true], 
  [id:L5Ac_T1, single_end:true], 
  [id:L5Cc_T1, single_end:true], 
  [id:L5Ab_T1, single_end:true], 
  [id:L5Aa_T1, single_end:true], 
  [id:L5Ca_T1, single_end:true]
 ], 
 [/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq]
]

After applying the transpose operator to the data in the channel:


[[id:L5ad_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq]
[[id:L5Cd_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq]
[[id:L5Ac_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]
[[id:L5Cc_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq]
[[id:L5Ab_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq]
[[id:L5Aa_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq]
[[id:L5Ca_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq]

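Although this is the correct format, the meta IDs are now associated with the wrong FASTQ files. For example, the ideal result for the first pair would be: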

[[id:L5ad_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]

Is there a way to associate each ID with the correct file?

groovy bioinformatics nextflow
1 Answer

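One approach is to build a map of the metadata and a map of the FASTQ files, where both maps share the same keys. We can then iterate over one of the maps and look up each key's value in the other.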
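To make the shared-key idea concrete, here is a minimal plain-Groovy sketch that runs without Nextflow. It assumes the key is the part of the id before the first underscore and the last underscore-separated token of the file name; the two shortened file names and the variable names (`metaList`, `fastqNames`, `pairs`) are illustrative stand-ins, not part of the original pipeline.

```groovy
// Plain Groovy sketch (no Nextflow needed); file names are shortened stand-ins.
def metaList = [
    [id: 'L5ad_T1', single_end: true],
    [id: 'L5Cd_T1', single_end: true],
]
def fastqNames = [
    'flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq',
    'flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq',
]

// Key each meta map by the part of its id before the first underscore, e.g. 'L5ad'.
def metaMap = metaList.collectEntries { meta ->
    [ meta.id.split('_').first(), meta ]
}

// Key each file name by its last underscore-separated token, minus the extension.
def fastqMap = fastqNames.collectEntries { name ->
    def base = name.substring(0, name.lastIndexOf('.'))
    [ base.split('_').last(), name ]
}

// Walk one map and look up the matching entry in the other.
def pairs = metaMap.collect { key, meta -> [ meta, fastqMap[key] ] }
pairs.each { println it }
```

Because the pairing goes through the key rather than list position, the order in which the files were listed no longer matters.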

The flatMap operator can then be used to flatten the output so that each item is emitted separately. For example:

ch = Channel.of(
    [
        [
            [id:'L5ad_T1', single_end:true], 
            [id:'L5Cd_T1', single_end:true], 
            [id:'L5Ac_T1', single_end:true], 
            [id:'L5Cc_T1', single_end:true], 
            [id:'L5Ab_T1', single_end:true], 
            [id:'L5Aa_T1', single_end:true], 
            [id:'L5Ca_T1', single_end:true]
        ],
        [
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq')
        ]
    ]
)
workflow {

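    ch.flatMap { meta_list, fastq_list ->
        // Key each meta map by the sample prefix of its id, e.g. 'L5ad_T1' -> 'L5ad'
        def meta_map = meta_list.collectEntries { meta ->
            [ meta.id.split('_').first(), meta ]
        }
        // Key each FASTQ by the last underscore-separated token of its name, e.g. 'L5ad'
        def fastq_map = fastq_list.collectEntries { fastq ->
            [ fastq.simpleName.split('_').last(), fastq ]
        }

        // Return one [meta, fastq] pair per sample; flatMap emits each pair separately
        meta_map.collect { k, v -> [v, fastq_map[k]] }
    }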
    .view()
}

Results:

$ nextflow run main.nf 
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [happy_waddington] DSL2 - revision: 345d777205
[[id:L5ad_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]
[[id:L5Cd_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq]
[[id:L5Ac_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq]
[[id:L5Cc_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq]
[[id:L5Ab_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq]
[[id:L5Aa_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq]
[[id:L5Ca_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq]
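
Note that the lookup `fastq_map[k]` yields `null` for any id that has no matching file name, so a naming mismatch would pass through silently. The following is a hedged sketch of a stricter variant that fails loudly instead. `pairByKey` is a hypothetical helper name (not part of Nextflow), the `/work/...` paths are shortened placeholders, and the name handling assumes `/`-separated paths with the extension starting at the first dot.

```groovy
// Hypothetical helper (not part of Nextflow): pair metas with files by shared key,
// throwing if any id has no matching file.
def pairByKey(List metaList, List fastqList) {
    def fastqMap = fastqList.collectEntries { fastq ->
        // Works for Path, File or String inputs: take the bare file name, drop the extension.
        def name = fastq.toString().tokenize('/').last()
        def base = name.contains('.') ? name.substring(0, name.indexOf('.')) : name
        [ base.split('_').last(), fastq ]
    }
    metaList.collect { meta ->
        def key = meta.id.split('_').first()
        if (!fastqMap.containsKey(key)) {
            throw new IllegalStateException("No FASTQ file found for id '${meta.id}' (key '${key}')")
        }
        [ meta, fastqMap[key] ]
    }
}

// Shortened placeholder inputs, deliberately listed in a different order.
def metas = [[id: 'L5Ca_T1', single_end: true], [id: 'L5Aa_T1', single_end: true]]
def files = ['/work/flexbar_trimmed_NNNTAAGC_L5Aa.fastq', '/work/flexbar_trimmed_NNNCTAGC_L5Ca.fastq']
pairByKey(metas, files).each { println it }
```

Inside the workflow, such a helper could presumably be called as `ch.flatMap { meta_list, fastq_list -> pairByKey(meta_list, fastq_list) }`, assuming it is defined in the same script.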