Encoding/decoding mechanism for large unstructured JSON data

Question

I have JSON data whose structure may look something like this:



    [
    {
        "action": 4,
        "key1": {
            "key1": "key1",
            "key2": "key1",
            "key3": "key3",
            "key4": ["key4"],
            "key5": [
                {
                    "key5": "key5"
                }
            ]
        }
    },
    {
        "action": 2,
        "key2": [
            3,
            {
                "key121": "key1",
                "key21": "key1",
                "key33": "key3",
                "key4": ["key4"],
                "key5": [
                    {
                        "key5": "key5"
                    }
                ]
            },
            {
                "key121": "key1",
                "key2133": "key1",
                "key33333": "key3",
                "key41": ["key4"],
                "key521": [
                    {
                        "key531": "key5"
                    }
                ]
            }
        ],
        "key3": "key3",
        // .... more and more here
    }
    // .... more and more here
    ]

Its size can exceed 1 MB.

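The data above is only an example; it shows that the structure can be very dynamic. I need some way to encode/decode this kind of data. The flow is as follows: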

encode -> store in Redis
read from Redis -> decode

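I need the best possible performance for the second part, read from Redis -> decode. A smaller encoded size reduces the time needed to fetch the data from Redis, and after that I need an efficient way to decode the encoded data back into JSON.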
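To make the size/decode trade-off concrete, here is a minimal sketch that shrinks the payload with Node's built-in zlib before storing it. It is only an illustration under stated assumptions: the `fakeRedis` Map is a hypothetical stand-in for a real Redis client, `encode`/`decode` are names invented here, and the sample data is made up. Decompression costs CPU time on the decode side, so whether this helps overall has to be measured on real data.

```javascript
const zlib = require('zlib');

// Hypothetical stand-in for a Redis client: a Map that stores Buffers.
const fakeRedis = new Map();

// Encode: value -> JSON text -> gzip-compressed Buffer.
// Level 1 favors speed over compression ratio.
function encode(value) {
    const json = Buffer.from(JSON.stringify(value), 'utf8');
    return zlib.gzipSync(json, { level: 1 });
}

// Decode: gzip-compressed Buffer -> JSON text -> value.
function decode(buffer) {
    return JSON.parse(zlib.gunzipSync(buffer).toString('utf8'));
}

// Made-up sample with a repetitive, loosely structured shape.
const sample = Array.from({ length: 200 }, (_, i) => ({
    action: i % 5,
    key1: { key1: 'key1', key4: ['key4'], key5: [{ key5: 'key5' }] },
}));

const rawSize = Buffer.byteLength(JSON.stringify(sample), 'utf8');
const encoded = encode(sample);
fakeRedis.set('payload', encoded);
const decoded = decode(fakeRedis.get('payload'));

console.log('raw bytes', rawSize, 'encoded bytes', encoded.length);
```

With a real client the Buffer would be written and read as binary; the exact call depends on the Redis library in use.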

I have tried:

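JSON.stringify/JSON.parse - works, but I need better performance
avsc (https://www.npmjs.com/package/avsc) - it is good, but because my structure is very dynamic I ran into many problems, for example different record types inside one array (which, as far as I know, avsc does not support), etc.
msgpack-lite - not more efficient than JSON.stringify/JSON.parse
cbor-x - not more efficient than JSON.stringify/JSON.parse
flatbuffers - https://www.npmjs.com/package/flatbuffers - schema-based, not suitable for this case
schemapack - https://www.npmjs.com/package/schemapack - works fairly well for simple objects, but it is already 7 years old
protobufjs - https://www.npmjs.com/package/protobufjs - schema-based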

The process will run under load (more than 15 million operations per hour), so performance is the key characteristic.

My benchmark results for JSON.stringify/JSON.parse over 10k iterations:

JSON parse
total: 20 ms
average per iteration: 0.0016619213000001163 ms

JSON stringify
total: 18 ms
average per iteration: 0.0013616301999999764 ms

I recognize that the performance-critical part is decode. So if encode takes less time, the decode process may take some time.

Some example benchmark code:


const data = require('./data.json');

class time {
    startTime = 0;

    init() {
        this.startTime = process.hrtime();
    }

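    capture() {
        // process.hrtime(start) returns [seconds, nanoseconds] elapsed since start.
        const end = process.hrtime(this.startTime);
        this.startTime = process.hrtime();
        // Convert to milliseconds: seconds * 1e3 + nanoseconds / 1e6.
        return end[0] * 1000 + end[1] / 1000000;
    }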
}

const results = [];

const mainNow = Date.now();

const t = new time();

for (let i = 0; i < 1000; i++) {
    t.init();
    const stringified = JSON.stringify(data);
    JSON.parse(stringified);
    results.push(t.capture());
}

console.log('results', results);
console.log('total', Date.now() - mainNow);
console.log('avg', results.reduce((acc, curr) => acc + curr, 0) / results.length);
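The loop above times stringify and parse together, while the figures quoted earlier list them separately. A sketch of how each step could be timed on its own is shown below. The `bench` helper and the `payload` sample are hypothetical names introduced for illustration; the real `./data.json` would be substituted for the sample.

```javascript
// Time `fn` over `iterations` runs; returns total and mean in milliseconds.
function bench(fn, iterations) {
    // Short warm-up so the hot path is likely optimized before measuring.
    for (let i = 0; i < 100; i++) fn();

    const start = process.hrtime.bigint(); // nanoseconds as a BigInt
    for (let i = 0; i < iterations; i++) fn();
    const totalNs = process.hrtime.bigint() - start;

    const totalMs = Number(totalNs) / 1e6;
    return { totalMs, avgMs: totalMs / iterations };
}

// Hypothetical sample payload; replace with require('./data.json').
const payload = Array.from({ length: 100 }, (_, i) => ({
    action: i % 5,
    key1: { key1: 'key1', key4: ['key4'], key5: [{ key5: 'key5' }] },
}));
const text = JSON.stringify(payload);

const iterations = 10000;
const stringifyResult = bench(() => JSON.stringify(payload), iterations);
const parseResult = bench(() => JSON.parse(text), iterations);

console.log('stringify', stringifyResult);
console.log('parse', parseResult);
```

Timing the whole loop once, instead of calling the timer on every iteration, keeps the timer's own overhead out of the per-iteration average.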

Answer 1

You can use any third-party streaming JSON parser, for example

https://www.npmjs.com/package/stream-json

A full list is here:

https://github.com/nodejs/node-v0.x-archive/issues/7543#issuecomment-41970902

You can try @streamparser/json to consume the JSON piece by piece; you only need to decide how to handle and write the values:

const https = require('https');
const {Tokenizer} = require('@streamparser/json');

class MyTokenizer extends Tokenizer {
    onToken({ token, value }) {
        console.log(token, value);
    }
}

const tokenizer = new MyTokenizer;

const url = 'https://raw.githubusercontent.com/seductiveapps/largeJSON/master/100mb.json';

https.get(url, (networkStream) => {

    networkStream.on('readable', function () {
        // There is some data to read now.
        let data;

        while ((data = this.read(1000)) !== null) {
            tokenizer.write(data);
        }
    });

});
