Parsing JSON into FASTA format with a specific constraint

I have a JSON file that looks like this:

{
   "barcodes": {
        "0004F--0004R": {
            "Barcode UID": "4",
            "Sample ID": "10887581",
            "For Barcode Name": "0004F",
            "For Barcode Sequence": "GGTAGTGTGTATCAGTACATG",
            "Rev Barcode Name": "0004R",
            "Rev Barcode Sequence": "GGTAGTGTGTATCAGTACATG",
            "Genes Sequenced": "",
            "Ethnicity": "",
            "laa_params": {
                "--minLength": "3000",
                "--ignoreEnds": "60",
                "--maxReads": "2500",
                "--maxPhasingReads": "500"
            }
        },
        "0014F--0014R": {
            "Barcode UID": "14",
            "Sample ID": "10895675",
            "For Barcode Name": "0014F",
            "For Barcode Sequence": "GGTAGCGTCTATATACGTATA",
            "Rev Barcode Name": "0014R",
            "Rev Barcode Sequence": "GGTAGCGTCTATATACGTATA",
            "Genes Sequenced": "A/B/C",
            "Ethnicity": "British/Irish",
            "laa_params": {
                "--minLength": "3000",
                "--ignoreEnds": "60",
                "--maxReads": "2500",
                "--maxPhasingReads": "500"
            }
        },
        "0018F--0018R": {
            "Barcode UID": "18",
            "Sample ID": "10896709",
            "For Barcode Name": "0018F",
            "For Barcode Sequence": "GGTAGCATCACTACGCTAGAT",
            "Rev Barcode Name": "0018R",
            "Rev Barcode Sequence": "GGTAGCATCACTACGCTAGAT",
            "Genes Sequenced": "B/C",
            "Ethnicity": "British/Irish",
            "laa_params": {
                "--minLength": "3000",
                "--ignoreEnds": "60",
                "--maxReads": "2500",
                "--maxPhasingReads": "500"
            }
        }
   }
}

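I use this JSON to create a FASTA file, in which I split each barcode name, such as "0014F--0014R", into two halves. Each half is written to the file as a header line, with the corresponding sequence beneath it, like this: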

>0014F
GGTAGCGTCTATATACGTATA
>0014R
GGTAGCGTCTATATACGTATA

I do this with Groovy, using the following code:

import groovy.json.JsonSlurper

// Load JSON; cfg_file is the JSON file
def jsonSlurper = new JsonSlurper()
def analysis_config = jsonSlurper.parse(cfg_file)
// Create a channel from the key set of "barcodes"
barcodes = Channel.from(analysis_config.barcodes.keySet())
// Create the FASTA file: one header per barcode half, with its sequence beneath
new File('barcodes.fasta').withOutputStream { out ->
    analysis_config.barcodes.each { barcode ->
        def (fname, revname) = barcode.key.split('--')
        out << ">$fname\n${barcode.value['For Barcode Sequence']}\n"
        out << ">$revname\n${barcode.value['Rev Barcode Sequence']}\n"
    }
}

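I want to change this logic so that a barcode is skipped if its "Genes Sequenced" field is empty.

For "0004F--0004R", for example, no genes were sequenced. How can I implement this logic?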

In Python, you could simply do:

if not barcode['Genes Sequenced']:
    continue

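...and it would skip that barcode. I am a Python programmer by trade and I am working with Nextflow, which uses Groovy as its base language. Any help would be appreciated.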
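For comparison, here is a rough Groovy counterpart of that Python idiom. Inside an `each` closure there is no `continue`, but `return` ends the closure call for the current entry, which has the same effect. This is only a sketch under assumptions: the hand-written `barcodes` map stands in for the parsed JSON, and the header names are collected in a list called `kept` instead of being written to a file; both names are made up for illustration.

```groovy
// Hypothetical stand-in for analysis_config.barcodes, reduced to one field.
def barcodes = [
    '0004F--0004R': ['Genes Sequenced': ''],
    '0014F--0014R': ['Genes Sequenced': 'A/B/C'],
    '0018F--0018R': ['Genes Sequenced': 'B/C']
]

def kept = []
barcodes.each { barcode ->
    // An empty string is false in Groovy, so this mirrors `if not ...: continue`.
    if (!barcode.value['Genes Sequenced']) return
    def (fname, revname) = barcode.key.split('--')
    kept << fname << revname
}

println kept
```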

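Note

I have a feeling that my whole logic has to change. The current flow is:

  1. Create a keySet() of all barcodes
  2. Populate each one with its sequences

The flow should now be:

  1. Create a keySet() of only the barcodes that have "Genes Sequenced"
  2. Populate each one with its sequences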

So, for barcodes = Channel.from(analysis_config.barcodes.keySet()), any idea how to add that logic?

Something like:

barcodes = Channel.from(analysis_config.barcodes.[if "Genes Sequenced"].keySet())
1 Answer

In Groovy it would look like this (any one of these equivalent forms):

analysis_config.barcodes.findAll{b-> b.value."Genes Sequenced"}.keySet()

analysis_config.barcodes.findAll{k,v-> v."Genes Sequenced"}.keySet()

analysis_config.barcodes.findAll{k,v-> v["Genes Sequenced"]}.keySet()
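Combined with the FASTA-writing code from the question, a minimal sketch might look as follows. It assumes plain Groovy outside Nextflow: the JSON is inlined and trimmed to the fields used here, the FASTA text is built in a StringBuilder rather than written to barcodes.fasta, and the variable name `sequenced` is made up for illustration.

```groovy
import groovy.json.JsonSlurper

// Trimmed-down copy of the JSON from the question (only the fields used here).
def json = '''
{
  "barcodes": {
    "0004F--0004R": {
      "For Barcode Sequence": "GGTAGTGTGTATCAGTACATG",
      "Rev Barcode Sequence": "GGTAGTGTGTATCAGTACATG",
      "Genes Sequenced": ""
    },
    "0014F--0014R": {
      "For Barcode Sequence": "GGTAGCGTCTATATACGTATA",
      "Rev Barcode Sequence": "GGTAGCGTCTATATACGTATA",
      "Genes Sequenced": "A/B/C"
    }
  }
}
'''

def analysis_config = new JsonSlurper().parseText(json)

// Keep only barcodes whose "Genes Sequenced" is non-empty
// (an empty string is false under Groovy truth).
def sequenced = analysis_config.barcodes.findAll { k, v -> v['Genes Sequenced'] }

// Build the FASTA text from the filtered map only.
def fasta = new StringBuilder()
sequenced.each { name, fields ->
    def (fname, revname) = name.split('--')
    fasta << ">${fname}\n${fields['For Barcode Sequence']}\n"
    fasta << ">${revname}\n${fields['Rev Barcode Sequence']}\n"
}

println sequenced.keySet()
print fasta
```

In the Nextflow script from the question, the same filtered map could presumably feed both steps: `Channel.from(sequenced.keySet())` for the channel and `sequenced.each { ... }` inside `withOutputStream` for the file, so that the channel and the FASTA file stay in agreement.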