How to speed up Python instance initialization for millions of objects?

Question · Votes: 0 · Answers: 3

I have defined a Python class named Edge as follows:

class Edge:
    def __init__(self):
        self.node1 = 0
        self.node2 = 0
        self.weight = 0

Now I have to create about 10^6 to 10^7 instances of Edge, using the following loop:

edges= []
for (i,j,w) in ijw:
    edge = Edge()
    edge.node1 = i
    edge.node2 = j
    edge.weight = w
    edges.append(edge)

This takes about 2 seconds on my desktop. Is there a faster way to do it?

python performance instance
3 Answers

20 votes

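You can't make it much faster, but I would certainly use __slots__ to save on memory allocations. Also, make it possible to pass in the attribute values when creating an instance: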

class Edge:
    __slots__ = ('node1', 'node2', 'weight')
    def __init__(self, node1=0, node2=0, weight=0):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight

With the updated __init__, you can use a list comprehension:

edges = [Edge(*args) for args in ijw]

Together, these changes shave a considerable amount of time off creating the objects, roughly halving the time needed.
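As a minimal sketch of what __slots__ changes (the class names PlainEdge and SlotsEdge below are illustrative, not taken from the question): a slotted instance has no per-instance __dict__, which is where the memory saving comes from, and in exchange it rejects attributes that were not declared.

```python
class PlainEdge:
    """Regular class: each instance carries its own __dict__."""

    def __init__(self, node1=0, node2=0, weight=0):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight


class SlotsEdge:
    """Same attributes, stored in fixed slots instead of a per-instance dict."""

    __slots__ = ('node1', 'node2', 'weight')

    def __init__(self, node1=0, node2=0, weight=0):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight


plain = PlainEdge(1, 2, 5)
slotted = SlotsEdge(1, 2, 5)

print(hasattr(plain, '__dict__'))    # True
print(hasattr(slotted, '__dict__'))  # False

plain.color = 'red'        # allowed: stored in plain.__dict__
try:
    slotted.color = 'red'  # 'color' is not declared in __slots__
except AttributeError as exc:
    print('AttributeError:', exc)
```

If the code relies on attaching arbitrary extra attributes to edges, that trade-off is worth checking before switching to __slots__.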

A comparison for creating 1 million objects; the setup:

>>> from random import randrange
>>> ijw = [(randrange(100), randrange(100), randrange(1000)) for _ in range(10 ** 6)]
>>> class OrigEdge:
...     def __init__(self):
...         self.node1 = 0
...         self.node2 = 0
...         self.weight = 0
...
>>> origloop = '''\
... edges= []
... for (i,j,w) in ijw:
...     edge = Edge()
...     edge.node1 = i
...     edge.node2 = j
...     edge.weight = w
...     edges.append(edge)
... '''
>>> class SlotsEdge:
...     __slots__ = ('node1', 'node2', 'weight')
...     def __init__(self, node1=0, node2=0, weight=0):
...         self.node1 = node1
...         self.node2 = node2
...         self.weight = weight
...
>>> listcomploop = '''[Edge(*args) for args in ijw]'''

And the timings:

>>> from timeit import Timer
>>> count, total = Timer(origloop, 'from __main__ import OrigEdge as Edge, ijw').autorange()
>>> (total / count) * 1000 # milliseconds
722.1121070033405
>>> count, total = Timer(listcomploop, 'from __main__ import SlotsEdge as Edge, ijw').autorange()
>>> (total / count) * 1000 # milliseconds
386.6706900007557

That is nearly 2 times as fast.

Increasing the random input list to 10^7 items, the timing difference holds (times in seconds):

>>> ijw = [(randrange(100), randrange(100), randrange(1000)) for _ in range(10 ** 7)]
>>> count, total = Timer(origloop, 'from __main__ import OrigEdge as Edge, ijw').autorange()
>>> (total / count)
7.183759553998243
>>> count, total = Timer(listcomploop, 'from __main__ import SlotsEdge as Edge, ijw').autorange()
>>> (total / count)
3.8709938440006226

4 votes

There is another fast and memory-saving approach, using the recordclass library:

from recordclass import dataobject

from random import randrange
import sys
ijw = [(randrange(100), randrange(100), randrange(1000)) for _ in range(10 ** 7)]

class EdgeDO(dataobject):
    __fields__ = 'node1', 'node2', 'weight'

class EdgeSlots:
    __slots__ = 'node1', 'node2', 'weight'

    def __init__(self, node1, node2, weight):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight

def list_size(lst):
    return sum(sys.getsizeof(o) for o in lst)

%time list_do = [EdgeDO(n1, n2, w) for n1, n2, w in ijw]
%time list_slots = [EdgeSlots(n1, n2, w) for n1, n2, w in ijw]

print('size (dataobject):', list_size(list_do))
print('size (__slots__): ', list_size(list_slots))

With the output:

CPU times: user 2.23 s, sys: 20 ms, total: 2.25 s
Wall time: 2.25 s
CPU times: user 6.79 s, sys: 84.1 ms, total: 6.87 s
Wall time: 6.87 s
size (dataobject): 400000000
size (__slots__):  640000000

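P.S. Starting with version 0.15 there is an option, fast_new, for faster instance creation. Starting with version 0.18, fast_new=True is the default. Below are new performance numbers for 64-bit Python 3.9: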

from recordclass import dataobject
from random import randrange
import sys
ijw = [(randrange(100), randrange(100), randrange(1000)) for _ in range(10 ** 7)]

class EdgeDO(dataobject, fast_new=True):
    __fields__ = 'node1', 'node2', 'weight'

class EdgeSlots:
    __slots__ = 'node1', 'node2', 'weight'

    def __init__(self, node1, node2, weight):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight

def list_size(lst):
    return sum(sys.getsizeof(o) for o in lst)

print('dataobject timinig:')
%time list_do = [EdgeDO(*args) for args in ijw]
print('__slots__ timinig:')
%time list_slots = [EdgeSlots(*args) for args in ijw]

print('size (dataobject):', list_size(list_do))
print('size (__slots__): ', list_size(list_slots))
print(list_size(list_do)/list_size(list_slots)*100, "%")

Result:

dataobject timinig:
CPU times: user 804 ms, sys: 16 ms, total: 820 ms
Wall time: 819 ms
__slots__ timinig:
CPU times: user 5.54 s, sys: 23.9 ms, total: 5.56 s
Wall time: 5.56 s
size (dataobject): 400000000
size (__slots__):  560000000
71.42857142857143 %

1 vote

Another option is to skip the Edge class altogether and implement the edges via a table or an adjacency matrix.

For example:

A = create_adjacency_graph(ijw)  # Implement to return an IxJ (possibly sparse) matrix of weights
edge_a_weight = A[3, 56]
edge_b_weight = A[670, 1023]
# etc...

While this does remove some flexibility, both creation and use should be very fast.
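One possible way to fill in create_adjacency_graph, as a rough sketch only: the answer leaves the implementation open, so the dict-based version below is an assumption rather than the answerer's method. It treats the graph as directed and assumes at most one edge per (node1, node2) pair, so a later duplicate silently overwrites an earlier one. A plain dict keyed by tuples acts as a sparse matrix and supports the same A[i, j] lookup syntax.

```python
def create_adjacency_graph(ijw):
    """Hypothetical helper: map each (node1, node2) pair to its weight.

    Assumes a directed graph with at most one edge per pair;
    a later duplicate overwrites an earlier one.
    """
    return {(i, j): w for i, j, w in ijw}


# Small illustrative input in the same (node1, node2, weight) layout.
ijw = [(3, 56, 10), (670, 1023, 7), (3, 57, 2)]
A = create_adjacency_graph(ijw)

edge_a_weight = A[3, 56]        # A[3, 56] is the same as A[(3, 56)]
edge_b_weight = A[670, 1023]
print(edge_a_weight, edge_b_weight)  # 10 7

# A missing edge can be given a default weight with dict.get.
print(A.get((0, 0), 0))              # 0
```

For dense numeric work, a real matrix type (for example a sparse matrix from a numerical library) may be a better fit than a dict; that choice depends on how the edges are queried afterwards.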
