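I'm currently building a processing pipeline for cDNA. One process in my pipeline outputs 7 different fastq files in a single array, together with 7 id entries as metadata. I need each id to be paired with the fastq file that carries the same ID, but at the moment the IDs are paired with the fastq files simply in the order in which the previous step produced them.
Before using the transpose operator, the channel in question looks like this: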
[[[id:L5ad_T1, single_end:true],
[id:L5Cd_T1, single_end:true],
[id:L5Ac_T1, single_end:true],
[id:L5Cc_T1, single_end:true],
[id:L5Ab_T1, single_end:true],
[id:L5Aa_T1, single_end:true],
[id:L5Ca_T1, single_end:true]
],
[/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq, /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq]
]
After applying the transpose operator to the data in the channel:
[[id:L5ad_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq]
[[id:L5Cd_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq]
[[id:L5Ac_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]
[[id:L5Cc_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq]
[[id:L5Ab_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq]
[[id:L5Aa_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq]
[[id:L5Ca_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq]
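Although this is the correct format, each id meta is now associated with the wrong fastq file. For example, the ideal result for the first pair would be: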
[[id:L5ad_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]
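Is there a way to associate the correct ID with the correct file?
One approach is to create a map of the metadata and a map of the FASTQ files, where both maps share the same keys. We can then loop over one of the maps and look up each key's value in the other map.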
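The key-matching idea can be sketched in plain Groovy, outside of Nextflow. This is only an illustration: it assumes the shared key is the part of the id before the first underscore and the last underscore-separated token of the file's base name, and the file names below are shortened, hypothetical stand-ins for the real paths.

```groovy
// Hypothetical, shortened inputs: metadata maps and file names in mismatched order
def metaList = [
    [id: 'L5ad_T1', single_end: true],
    [id: 'L5Cd_T1', single_end: true],
]
def fastqNames = [
    'flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq',
    'flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq',
]

// Key each metadata map by the text before the first underscore of its id
def metaMap = metaList.collectEntries { meta ->
    [ meta.id.split('_').first(), meta ]
}

// Key each file name by the last underscore-separated token of its base name
def fastqMap = fastqNames.collectEntries { name ->
    def baseName = name.take(name.lastIndexOf('.'))
    [ baseName.split('_').last(), name ]
}

// Look up each metadata key in the file map to build [meta, file] pairs
def pairs = metaMap.collect { key, meta -> [ meta, fastqMap[key] ] }

pairs.each { println it }
```

Because the pairing is driven by the key rather than by position, the order in which the files arrive no longer matters.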
The flatMap operator can be used to flatten the output so that each item is emitted separately. For example:
ch = Channel.of(
    [
        [
            [id:'L5ad_T1', single_end:true],
            [id:'L5Cd_T1', single_end:true],
            [id:'L5Ac_T1', single_end:true],
            [id:'L5Cc_T1', single_end:true],
            [id:'L5Ab_T1', single_end:true],
            [id:'L5Aa_T1', single_end:true],
            [id:'L5Ca_T1', single_end:true]
        ],
        [
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq'),
            file('/datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq')
        ]
    ]
)
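workflow {
    ch.flatMap { meta_list, fastq_list ->
        // Key each meta map by the part of its id before the first underscore, e.g. 'L5ad'
        def meta_map = meta_list.collectEntries { meta ->
            [ meta.id.split('_').first(), meta ]
        }
        // Key each FASTQ file by the last underscore-separated token of its simple name, e.g. 'L5ad'
        def fastq_map = fastq_list.collectEntries { fastq ->
            [ fastq.simpleName.split('_').last(), fastq ]
        }
        // Pair each meta map with the file that shares its key
        meta_map.collect { k, v -> [v, fastq_map[k]] }
    }
    .view()
}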
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 23.04.1
Launching `main.nf` [happy_waddington] DSL2 - revision: 345d777205
[[id:L5ad_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]
[[id:L5Cd_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq]
[[id:L5Ac_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq]
[[id:L5Cc_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq]
[[id:L5Ab_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNATTAGC_L5Ab.fastq]
[[id:L5Aa_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNTAAGC_L5Aa.fastq]
[[id:L5Ca_T1, single_end:true], /datastore/homes3/s1954394/project/nf-core-cracflexalign/work/43/55b2e9afb11e0518a8244f3898ec3f/flexbar_trimmed_NNNCTAGC_L5Ca.fastq]
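One caveat: a map lookup such as fastq_map[k] returns null when no file shares a key with a metadata entry, so a naming mismatch would silently emit a pair with a missing file. A possible safeguard, sketched here in plain Groovy, is to fail early when a key has no match. The helper name `pairByKey` and the shortened file names are hypothetical and not part of Nextflow.

```groovy
// Hypothetical helper: pairs each metadata map with the file sharing its key,
// and throws if any metadata entry has no matching file.
def pairByKey(Map metaMap, Map fastqMap) {
    def missing = metaMap.keySet() - fastqMap.keySet()
    if (missing) {
        throw new IllegalStateException("No FASTQ file found for: ${missing.join(', ')}")
    }
    metaMap.collect { key, meta -> [ meta, fastqMap[key] ] }
}

// Shortened, hypothetical inputs, already keyed as in the answer above
def metaMap = [
    L5ad: [id: 'L5ad_T1', single_end: true],
    L5Cd: [id: 'L5Cd_T1', single_end: true],
]
def fastqMap = [
    L5Cd: 'flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq',
    L5ad: 'flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq',
]

println pairByKey(metaMap, fastqMap)
```

In the workflow, the same check could presumably be placed inside the flatMap closure just before the final collect, so that a mismatch stops the run instead of propagating a null downstream.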