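I want to merge 2 JSON files into a single JSON file and remove all duplicate rows based on one column (the second column). At the moment I merge two or more JSON files by hand and then use Python code to remove every row with a duplicate userid.
First JSON file: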
[
{
"userid": "567897068",
"status": "UserStatus.RECENTLY",
"name": "btb appeal court",
"bot": false,
"username": "None"
},
{
"userid": "6403980168",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "7104649590",
"status": "UserStatus.RECENTLY",
"name": "Da",
"bot": false,
"username": "Abc130000"
},
{
"userid": "5813962086",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
}
]
Second JSON file:
[
{
"userid": "567897068",
"status": "UserStatus.RECENTLY",
"name": "btb appeal court",
"bot": false,
"username": "None"
},
{
"userid": "111111111111",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "7104649590",
"status": "UserStatus.RECENTLY",
"name": "Da",
"bot": false,
"username": "Abc130000"
},
{
"userid": "555555555555",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
}
]
The merged file should be:
[
{
"userid": "567897068",
"status": "UserStatus.RECENTLY",
"name": "btb appeal court",
"bot": false,
"username": "None"
},
{
"userid": "6403980168",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "7104649590",
"status": "UserStatus.RECENTLY",
"name": "Da",
"bot": false,
"username": "Abc130000"
},
{
"userid": "5813962086",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
},
{
"userid": "111111111111",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "555555555555",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
}
]
I use the following code to remove duplicates, based on the userid column, from the manually merged JSON file:
import json

# Load the manually merged file.
with open('source_user_all.json', 'r', encoding='utf-8') as f:
    jsons = json.load(f)

ids = set()    # userids seen so far
jsons2 = []    # records to keep (first occurrence of each userid)
for item in jsons:
    if item['userid'] not in ids:
        ids.add(item['userid'])
        jsons2.append(item)

# Write the de-duplicated records.
with open('source_user.json', 'w', encoding='utf-8') as nf:
    json.dump(jsons2, nf, indent=4)
The above works well.
Is there a simple way to merge multiple JSON files and remove all duplicates based on a column before writing a single output file?
Thanks
You just need to build a dictionary keyed by userid while looping over the input files.
Something like this:
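import json

files = ["json1.json", "json2.json"]
td = dict()  # maps userid -> first record seen with that userid

for file in files:
    with open(file, encoding="utf-8") as fd:
        for d in json.load(fd):
            uid = d["userid"]
            if uid not in td:  # keep only the first occurrence
                td[uid] = d

# dicts preserve insertion order, so records come out in first-seen order
with open("merged.json", "w", encoding="utf-8") as fd:
    json.dump(list(td.values()), fd, indent=2)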
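If you don't care which of the duplicates is kept, you can drop the conditional check (the test for whether uid is already in td); the last record read for each userid then overwrites the earlier ones.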
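To make the keep-first versus keep-last choice explicit, the same idea could be wrapped in a small helper. This is only a sketch: the function name merge_unique and its key and keep parameters are made up for illustration and are not part of any library, and it assumes every input file holds a JSON array of objects that all contain the key.

```python
import json


def merge_unique(paths, key="userid", keep="first"):
    """Merge JSON arrays from several files, keeping one record per key value.

    Hypothetical helper: `key` and `keep` are illustrative parameters.
    keep="first" ignores later duplicates; keep="last" lets them overwrite.
    """
    merged = {}
    for path in paths:
        with open(path, encoding="utf-8") as fd:
            for record in json.load(fd):
                value = record[key]
                # With keep="last" the assignment always happens, which is
                # the same as dropping the membership check entirely.
                if keep == "last" or value not in merged:
                    merged[value] = record
    # dicts preserve insertion order, so output follows first-seen key order
    return list(merged.values())


def write_merged(paths, out_path, **kwargs):
    """Write the merged, de-duplicated records to out_path as a JSON array."""
    with open(out_path, "w", encoding="utf-8") as fd:
        json.dump(merge_unique(paths, **kwargs), fd, indent=2)
```

For example, write_merged(["json1.json", "json2.json"], "merged.json") would behave like the script above, while passing keep="last" keeps the most recently read record for each userid. In both cases each userid stays at the position where it was first seen.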