I have a source collection containing the following documents, with an index on the first 4 fields.
[{state: 'NY', city: 'New York', zip: '10000', store: '1234', item: '1234', size: 'L'},
{state: 'NY', city: 'New York', zip: '10000', store: '1234', item: '1234', size: 'L'},
{state: 'NY', city: 'New York', zip: '10100', store: '1234', item: '1234', size: 'L'},
{state: 'NY', city: 'New York', zip: '10100', store: '1234', item: '1234', size: 'L'},
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345', item: '2345', size: 'M'},
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345', item: '2345', size: 'M'},
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345', item: '2345', size: 'M'},
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345', item: '2345', size: 'M'}]
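I want to copy the distinct documents (based on the first 4 fields) from my source collection to another collection. The new collection should contain the documents below. My source collection is very large, so performance is an important consideration in how this is done.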
[{state: 'NY', city: 'New York', zip: '10000', store: '1234', item: '1234', size: 'L'},
{state: 'NY', city: 'New York', zip: '10100', store: '1234', item: '1234', size: 'L'},
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345', item: '2345', size: 'M'},
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345', item: '2345', size: 'M'}]
I tried an aggregation pipeline, but it only lists the distinct values of the grouped fields, not the whole documents.
[{state: 'NY', city: 'New York', zip: '10000', store: '1234'},
{state: 'NY', city: 'New York', zip: '10100', store: '1234'},
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345'},
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345'}]
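A pipeline that keeps the whole document rather than only the grouped fields might look like the sketch below. It groups on the four indexed fields, remembers the first full document in each group via $first with $$ROOT, promotes that document back to the top level with $replaceRoot, and writes the result with $out. The helper name build_distinct_copy_pipeline, the target collection name 'distinct_docs', and the collection name in the trailing comment are made up for illustration; $replaceRoot assumes MongoDB 3.4 or later, and $out replaces the target collection if it already exists.

```python
def build_distinct_copy_pipeline(key_fields, target_collection):
    """Hypothetical helper: build an aggregation pipeline that copies one
    full document per distinct combination of key_fields."""
    return [
        # Group on the key fields and remember the first whole document per group.
        {"$group": {
            "_id": {f: "$" + f for f in key_fields},
            "doc": {"$first": "$$ROOT"},
        }},
        # Promote the remembered document back to the top level.
        {"$replaceRoot": {"newRoot": "$doc"}},
        # Write the result server-side; $out must be the last stage.
        {"$out": target_collection},
    ]


pipeline = build_distinct_copy_pipeline(
    ["state", "city", "zip", "store"],
    "distinct_docs",  # assumed target collection name
)

# With pymongo and a live server this would be run roughly as
# (collection name "source" is an assumption):
#   db.source.aggregate(pipeline, allowDiskUse=True)
```

Because the grouping and the write both happen on the server, no documents need to be pulled into the client. For a large collection, allowDiskUse=True lets the $group stage spill to disk rather than fail on the per-stage memory limit.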
You can convert each dict to a frozenset of its items and then create a set to keep only the unique ones.
docs = [{'state': 'NY', 'city': 'New York', 'zip': '10000', 'store': '1234', 'item': '1234', 'size': 'L'},
{'state': 'NY', 'city': 'New York', 'zip': '10000', 'store': '1234', 'item': '1234', 'size': 'L'},
{'state': 'NY', 'city': 'New York', 'zip': '10100', 'store': '1234', 'item': '1234', 'size': 'L'},
{'state': 'NY', 'city': 'New York', 'zip': '10100', 'store': '1234', 'item': '1234', 'size': 'L'},
{'state': 'NJ', 'city': 'Newark', 'zip': '08800', 'store': '2345', 'item': '2345', 'size': 'M'},
{'state': 'NJ', 'city': 'Newark', 'zip': '08800', 'store': '2345', 'item': '2345', 'size': 'M'},
{'state': 'NJ', 'city': 'Newark', 'zip': '08810', 'store': '2345', 'item': '2345', 'size': 'M'},
{'state': 'NJ', 'city': 'Newark', 'zip': '08810', 'store': '2345', 'item': '2345', 'size': 'M'}]
# Each dict becomes a hashable frozenset of (key, value) pairs; the set drops duplicates.
# A set is unordered, so the output order may differ from the input order.
print([dict(s) for s in set(frozenset(d.items()) for d in docs)])
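Note that a set of frozensets compares whole documents, does not preserve order, and requires every value to be hashable. If uniqueness should depend only on the first four fields, as in the question, a dict keyed by a tuple of those fields is one alternative. This is a minimal in-memory sketch: distinct_by and KEY_FIELDS are names introduced here, the sample data is shortened and altered for illustration, and it assumes the documents fit in memory.

```python
KEY_FIELDS = ("state", "city", "zip", "store")  # the four indexed fields


def distinct_by(docs, key_fields=KEY_FIELDS):
    """Return the first document seen for each combination of key_fields."""
    seen = {}
    for doc in docs:
        key = tuple(doc.get(f) for f in key_fields)
        # setdefault keeps the first document and ignores later duplicates.
        seen.setdefault(key, doc)
    # Dicts preserve insertion order (Python 3.7+), so input order is kept.
    return list(seen.values())


# Illustrative sample: the second document differs only outside the key fields.
sample = [
    {"state": "NY", "city": "New York", "zip": "10000", "store": "1234", "item": "1234", "size": "L"},
    {"state": "NY", "city": "New York", "zip": "10000", "store": "1234", "item": "9999", "size": "S"},
    {"state": "NJ", "city": "Newark", "zip": "08800", "store": "2345", "item": "2345", "size": "M"},
]
unique = distinct_by(sample)
print(unique)
```

Here the second sample document is dropped even though its item and size differ, because only the four key fields decide uniqueness.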