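Sum over a list of dictionaries that share a common key-value pair

Question  Votes: 5  Answers: 9

How do I sum the duplicate elements in a list of dictionaries?

Example list: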

data = [
        [
            {'user': 1, 'rating': 0},
            {'user': 2, 'rating': 10},
            {'user': 1, 'rating': 20},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 20},
            {'user': 1, 'rating': 10}
        ],
    ]

Expected output:

op = [
        [
            {'user': 1, 'rating': 20},
            {'user': 2, 'rating': 10},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 30},
        ],
    ]
python dictionary nested-lists
9 Answers
2
votes
You can try:

from itertools import groupby

result = []
for lst in data:
    sublist = sorted(lst, key=lambda d: d['user'])
    grouped = groupby(sublist, key=lambda d: d['user'])
    result.append([
        {'user': name, 'rating': sum([d['rating'] for d in group])}
        for name, group in grouped])

To sort result by rating:

result = [sorted(sub, key=lambda d: d['rating']) for sub in result]

Result:

# print(result)
[
    [
        {'user': 2, 'rating': 10},
        {'user': 3, 'rating': 10},
        {'user': 1, 'rating': 20}
    ],
    [
        {'user': 4, 'rating': 4},
        {'user': 1, 'rating': 30},
        {'user': 2, 'rating': 80}
    ]
]
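
The sort before groupby in this answer matters because itertools.groupby only merges adjacent items with equal keys. A minimal sketch of that behaviour, using the first sublist from the question (the helper name by_user is made up for illustration):

```python
from itertools import groupby

rows = [
    {'user': 1, 'rating': 0},
    {'user': 2, 'rating': 10},
    {'user': 1, 'rating': 20},
    {'user': 3, 'rating': 10},
]

def by_user(d):
    # Hypothetical helper: the grouping key.
    return d['user']

# Without sorting, user 1 shows up as two separate groups.
unsorted_keys = [user for user, _ in groupby(rows, key=by_user)]

# After sorting by the same key, each user forms exactly one group.
sorted_keys = [user for user, _ in groupby(sorted(rows, key=by_user), key=by_user)]

print(unsorted_keys)  # [1, 2, 1, 3]
print(sorted_keys)    # [1, 2, 3]
```

Sorting also means the result comes back ordered by user rather than in first-seen order.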


3
votes
Using pandas:

>>> import pandas as pd
>>> [pd.DataFrame(dicts).groupby('user', as_index=False, sort=False).sum().to_dict(orient='records') for dicts in data]
[[{'user': 1, 'rating': 20}, {'user': 2, 'rating': 10}, {'user': 3, 'rating': 10}],
 [{'user': 4, 'rating': 4}, {'user': 2, 'rating': 80}, {'user': 1, 'rating': 30}]]


1
votes
op = []
for lst in data:
    rating_of_user = {}
    for e in lst:
        user, rating = e['user'], e['rating']
        rating_of_user[user] = rating_of_user.get(user, 0) + rating
    op.append([{'user': u, 'rating': r} for u, r in rating_of_user.items()])

Note: this relies on dictionaries preserving insertion order, which has been officially guaranteed since Python 3.7.
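
To see what the insertion-order guarantee buys you, here is a small sketch on the second sublist from the question. The function name sum_ratings is only for illustration, and it assumes Python 3.7 or later:

```python
def sum_ratings(rows):
    """Sum 'rating' per 'user', keeping users in first-seen order."""
    totals = {}
    for row in rows:
        totals[row['user']] = totals.get(row['user'], 0) + row['rating']
    # dict.items() iterates in insertion order on Python 3.7+.
    return [{'user': user, 'rating': rating} for user, rating in totals.items()]

rows = [
    {'user': 4, 'rating': 4},
    {'user': 2, 'rating': 80},
    {'user': 1, 'rating': 20},
    {'user': 1, 'rating': 10},
]

ordered = sum_ratings(rows)
print(ordered)
# [{'user': 4, 'rating': 4}, {'user': 2, 'rating': 80}, {'user': 1, 'rating': 30}]
```

Users come out in the order 4, 2, 1, the same order as in the expected output of the question.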

0
votes
import pprint

data = [
    [
        {'user': 1, 'rating': 0},
        {'user': 2, 'rating': 10},
        {'user': 1, 'rating': 20},
        {'user': 3, 'rating': 10}
    ],
    [
        {'user': 4, 'rating': 4},
        {'user': 2, 'rating': 80},
        {'user': 1, 'rating': 20},
        {'user': 1, 'rating': 10}
    ],
]

def find(user, l):
    # Return the index of the dict for this user in l, or -1 if absent.
    for i, d in enumerate(l):
        if user == d['user']:
            return i
    return -1

data_sum = []
for l in data:
    list_sum = []
    for d in l:
        idx = find(d['user'], list_sum)
        if idx == -1:
            list_sum.append(dict(d))  # copy, so the dicts in data are not modified
        else:
            list_sum[idx]['rating'] += d['rating']
    data_sum.append(list_sum)

pprint.pprint(data_sum)

0
votes
This should work:

from collections import defaultdict

data_without_duplicates = []
for l in data:
    users_ratings = defaultdict(int)
    for d in l:
        users_ratings[d["user"]] += d["rating"]
    data_without_duplicates.append(
        [{"user": user, "rating": rating} for user, rating in users_ratings.items()]
    )


0
votes
data = [
    [
        {'user': 1, 'rating': 0},
        {'user': 2, 'rating': 10},
        {'user': 1, 'rating': 20},
        {'user': 3, 'rating': 10}
    ],
    [
        {'user': 4, 'rating': 4},
        {'user': 2, 'rating': 80},
        {'user': 1, 'rating': 20},
        {'user': 1, 'rating': 10}
    ],
]

keyname = "user"
all_rows = []  # renamed from `all` to avoid shadowing the builtin
for row in data:
    row_out = []
    for d in row:
        key = d[keyname]
        # Look for an existing output dict with the same key.
        for d2 in row_out:
            if d2[keyname] == d[keyname]:
                break
        else:
            # No match found: start a new output dict for this key.
            d2 = {keyname: key}
            row_out.append(d2)
        # Add up every field other than the key.
        for k, v in d.items():
            if k == keyname:
                continue
            d2[k] = d2.get(k, 0) + v
    all_rows.append(row_out)
print(all_rows)

This gives:

[[{'user': 1, 'rating': 20}, {'user': 2, 'rating': 10}, {'user': 3, 'rating': 10}], [{'user': 4, 'rating': 4}, {'user': 2, 'rating': 80}, {'user': 1, 'rating': 30}]]


0
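votes
Sorting should be avoided, because every item can be handled in a single pass. Any hash-based technique should perform better.

Here is an alternative solution that uses defaultdict instead of the more expensive sort / groupby or pandas.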

from collections import defaultdict
from functools import reduce

def reduce_func(state, item):
    # state maps each user to its running total; missing users start at 0.
    new_obj = {
        "user": item["user"],
        "rating": state[item["user"]]["rating"] + item["rating"],
    }
    state[item["user"]] = new_obj
    return state

output = [
    list(reduce(reduce_func, elem, defaultdict(lambda: {"rating": 0})).values())
    for elem in data
]


0
votes
Using collections.Counter with a list comprehension:

from collections import Counter

x = []
for group in data:
    totals = Counter()
    for d in group:
        totals[d['user']] += d['rating']  # accumulate instead of overwriting
    x.append([{'user': u, 'rating': r} for u, r in totals.most_common()])

Output (sorted by rating, highest first):

[
  [
    {"rating": 20, "user": 1},
    {"rating": 10, "user": 2},
    {"rating": 10, "user": 3}
  ],
  [
    {"rating": 80, "user": 2},
    {"rating": 30, "user": 1},
    {"rating": 4, "user": 4}
  ]
]


-1
votes
This should solve your problem:

>>> from collections import Counter
>>> Counter( { 'aaa' : 2 } )
Counter({'aaa': 2})
>>> Counter( { 'aaa' : 2, 'bbb' : 3} ) + Counter( { 'aaa' : 2 } )
Counter({'aaa': 4, 'bbb': 3})
>>>
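
Applied to the data from the question, the Counter addition above could look roughly like the sketch below. The function name sum_group is only for illustration. One caveat: Counter's + operator drops entries whose total is zero or negative, so a user whose ratings add up to 0 would disappear from the result.

```python
from collections import Counter

data = [
    [
        {'user': 1, 'rating': 0},
        {'user': 2, 'rating': 10},
        {'user': 1, 'rating': 20},
        {'user': 3, 'rating': 10},
    ],
    [
        {'user': 4, 'rating': 4},
        {'user': 2, 'rating': 80},
        {'user': 1, 'rating': 20},
        {'user': 1, 'rating': 10},
    ],
]

def sum_group(group):
    # Build one single-entry Counter per dict and add them all together.
    # Caveat: Counter + Counter discards totals that are zero or negative.
    totals = sum((Counter({d['user']: d['rating']}) for d in group), Counter())
    return [{'user': user, 'rating': rating} for user, rating in totals.items()]

op = [sum_group(group) for group in data]
print(op)
```

The totals match the expected output, but the order of users within each sublist is not guaranteed to match, so sort the sublists if the order matters.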
