A simple way in Python to merge two JSON files and drop duplicates based on a column

Question

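I want to merge 2 JSON files into a single JSON file and remove all duplicate rows based on one column (userid). At the moment I merge two or more JSON files by hand, then use Python code to remove every row with a duplicate userid.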

First JSON file:

    [
        {
            "userid": "567897068",
            "status": "UserStatus.RECENTLY",
            "name": "btb appeal court",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "6403980168",
            "status": "UserStatus.RECENTLY",
            "name": "Ah",
            "bot": false,
            "username": "fearpic"
        },
        {
            "userid": "7104649590",
            "status": "UserStatus.RECENTLY",
            "name": "Da",
            "bot": false,
            "username": "Abc130000"
        },
        {
            "userid": "5813962086",
            "status": "UserStatus.RECENTLY",
            "name": "Sothea",
            "bot": false,
            "username": "SotheaSopheap169"
        }
    ]

Second JSON file:

    [
        {
            "userid": "567897068",
            "status": "UserStatus.RECENTLY",
            "name": "btb appeal court",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "111111111111",
            "status": "UserStatus.RECENTLY",
            "name": "Ah",
            "bot": false,
            "username": "fearpic"
        },
        {
            "userid": "7104649590",
            "status": "UserStatus.RECENTLY",
            "name": "Da",
            "bot": false,
            "username": "Abc130000"
        },
        {
            "userid": "555555555555",
            "status": "UserStatus.RECENTLY",
            "name": "Sothea",
            "bot": false,
            "username": "SotheaSopheap169"
        }
    ]

The merged file should be:

    [
        {
            "userid": "567897068",
            "status": "UserStatus.RECENTLY",
            "name": "btb appeal court",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "6403980168",
            "status": "UserStatus.RECENTLY",
            "name": "Ah",
            "bot": false,
            "username": "fearpic"
        },
        {
            "userid": "7104649590",
            "status": "UserStatus.RECENTLY",
            "name": "Da",
            "bot": false,
            "username": "Abc130000"
        },
        {
            "userid": "5813962086",
            "status": "UserStatus.RECENTLY",
            "name": "Sothea",
            "bot": false,
            "username": "SotheaSopheap169"
        },
        {
            "userid": "111111111111",
            "status": "UserStatus.RECENTLY",
            "name": "Ah",
            "bot": false,
            "username": "fearpic"
        },
        {
            "userid": "555555555555",
            "status": "UserStatus.RECENTLY",
            "name": "Sothea",
            "bot": false,
            "username": "SotheaSopheap169"
        }
    ]

I use the following code to remove duplicates from the manually merged JSON file, based on the userid column:

    import json
    with open('source_user_all.json', 'r', encoding='utf-8') as f:
        jsons = json.load(f)

    ids = set()
    jsons2 = []
    for item in jsons:
        if item['userid'] not in ids:
            ids.add(item['userid'])
            jsons2.append(item)
            
    with open('source_user.json', 'w', encoding='utf-8') as nf:
        json.dump(jsons2, nf, indent=4)

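The code above works well.

Is there a simple way to merge multiple JSON files and remove all duplicates based on a column before writing a single output file?

Thanks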

python json duplicates
1 Answer

You just need to build a dictionary keyed by userid while looping over the input files.

Like this:

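    import json

    # Input files to merge; on duplicates, records from earlier files win.
    files = ["json1.json", "json2.json"]

    # Maps userid -> record. Dicts keep insertion order, so output order is stable.
    td = dict()

    for file in files:
        with open(file, encoding="utf-8") as fd:
            for d in json.load(fd):
                uid = d["userid"]
                # Keep only the first record seen for each userid.
                if uid not in td:
                    td[uid] = d

    with open("merged.json", "w", encoding="utf-8") as fd:
        json.dump(list(td.values()), fd, indent=2)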
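The same loop can be wrapped in a function so that any number of files can be merged in one call. This is only a sketch: the function name merge_json_files and its parameters are made up for illustration, and it assumes every input file holds a JSON array of objects that all contain the key.

```python
import json


def merge_json_files(paths, out_path, key="userid"):
    """Merge JSON array files, keeping the first record seen for each key.

    Hypothetical helper: assumes each file is a JSON array of objects
    and that every object has the `key` field.
    """
    merged = {}
    for path in paths:
        with open(path, encoding="utf-8") as fd:
            for record in json.load(fd):
                # setdefault stores the record only when the key is new,
                # so the first occurrence wins.
                merged.setdefault(record[key], record)

    result = list(merged.values())
    with open(out_path, "w", encoding="utf-8") as fd:
        json.dump(result, fd, indent=4)
    return result
```

For example, merge_json_files(["json1.json", "json2.json"], "merged.json") should give the same records as the script above. A list built with glob.glob("*.json") could be passed instead, as long as the output file is not matched by the pattern on a later run.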

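If you don't care which of the duplicates is kept, you can drop the conditional check (the test that uid is not already in td). In that case the last record read for each userid overwrites the earlier ones.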
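Without the check, the merge reduces to a dict comprehension. The sketch below uses small in-memory lists instead of files, and the helper name dedupe_last_wins is hypothetical. It relies on Python 3.7+ dict behaviour: reassigning an existing key replaces its value but keeps the key at its original position.

```python
import json


def dedupe_last_wins(records, key="userid"):
    # Later records overwrite earlier ones with the same key, while each
    # key stays at the position where it first appeared.
    return list({record[key]: record for record in records}.values())


# Illustrative data only, not taken from the files in the question.
first = [{"userid": "1", "name": "old"}, {"userid": "2", "name": "b"}]
second = [{"userid": "1", "name": "new"}, {"userid": "3", "name": "c"}]

merged = dedupe_last_wins(first + second)
print(json.dumps(merged, indent=2))
```

Here userid "1" ends up with the name "new" from the second list, but still comes first in the output.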
