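I'm working on a program that keeps getting killed by the OOM killer. I'd like to reduce its memory usage quickly, without a major refactor. I tried adding __slots__ to the most common classes, but I noticed that the pickled size went up. Why is that?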
import pickle

class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class ClassSlots:
    __slots__ = ["a", "b", "c"]
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# One instance of each class, then a list of 1000 instances of each.
cases = [
    Class(1, 2, 3),
    ClassSlots(1, 2, 3),
    [Class(1, 2, 3) for _ in range(1000)],
    [ClassSlots(1, 2, 3) for _ in range(1000)],
]
for case in cases:
    dump = pickle.dumps(case, protocol=5)
    print(len(dump))
With Python 3.10 this prints:
59
67
22041
25046
So, on Python 3.11, let's define the following:
class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Bar:
    __slots__ = ["a", "b", "c"]
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
Now, let's see:
>>> import pickle
>>> import pickletools
>>> len(pickle.dumps(Foo(1,2,3))), len(pickle.dumps(Bar(1,2,3)))
(57, 60)
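So there is a three-byte difference once the classes have names of the same length. The name lengths account for 5 of the 8 bytes of difference you originally saw.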
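The effect of the class name can be checked in isolation. The following is a minimal sketch, not part of the original measurements: FooLonger is a hypothetical subclass introduced only because its name is 6 characters longer than Foo, and protocol 4 is passed explicitly to match the disassembly in this answer.

```python
import pickle


class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class FooLonger(Foo):
    """Hypothetical class: same layout as Foo, but a longer name."""


short = pickle.dumps(Foo(1, 2, 3), protocol=4)
longer = pickle.dumps(FooLonger(1, 2, 3), protocol=4)

# The class name is embedded in the pickle as a string, so the size gap
# should equal the difference in name length.
name_gap = len("FooLonger") - len("Foo")
size_gap = len(longer) - len(short)
print(name_gap, size_gap)
```

Neither class uses slots here, so any size difference comes from the name alone.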
Now let's see what the disassembly shows us:
>>> pickletools.dis(pickle.dumps(Foo(1,2,3)))
0: \x80 PROTO 4
2: \x95 FRAME 46
11: \x8c SHORT_BINUNICODE '__main__'
21: \x94 MEMOIZE (as 0)
22: \x8c SHORT_BINUNICODE 'Foo'
27: \x94 MEMOIZE (as 1)
28: \x93 STACK_GLOBAL
29: \x94 MEMOIZE (as 2)
30: ) EMPTY_TUPLE
31: \x81 NEWOBJ
32: \x94 MEMOIZE (as 3)
33: } EMPTY_DICT
34: \x94 MEMOIZE (as 4)
35: ( MARK
36: \x8c SHORT_BINUNICODE 'a'
39: \x94 MEMOIZE (as 5)
40: K BININT1 1
42: \x8c SHORT_BINUNICODE 'b'
45: \x94 MEMOIZE (as 6)
46: K BININT1 2
48: \x8c SHORT_BINUNICODE 'c'
51: \x94 MEMOIZE (as 7)
52: K BININT1 3
54: u SETITEMS (MARK at 35)
55: b BUILD
56: . STOP
highest protocol among opcodes = 4
And:
>>> pickletools.dis(pickle.dumps(Bar(1,2,3)))
0: \x80 PROTO 4
2: \x95 FRAME 49
11: \x8c SHORT_BINUNICODE '__main__'
21: \x94 MEMOIZE (as 0)
22: \x8c SHORT_BINUNICODE 'Bar'
27: \x94 MEMOIZE (as 1)
28: \x93 STACK_GLOBAL
29: \x94 MEMOIZE (as 2)
30: ) EMPTY_TUPLE
31: \x81 NEWOBJ
32: \x94 MEMOIZE (as 3)
33: N NONE
34: } EMPTY_DICT
35: \x94 MEMOIZE (as 4)
36: ( MARK
37: \x8c SHORT_BINUNICODE 'a'
40: \x94 MEMOIZE (as 5)
41: K BININT1 1
43: \x8c SHORT_BINUNICODE 'b'
46: \x94 MEMOIZE (as 6)
47: K BININT1 2
49: \x8c SHORT_BINUNICODE 'c'
52: \x94 MEMOIZE (as 7)
53: K BININT1 3
55: u SETITEMS (MARK at 36)
56: \x86 TUPLE2
57: \x94 MEMOIZE (as 8)
58: b BUILD
59: . STOP
highest protocol among opcodes = 4
So the first difference is at offset 33, where the non-slotted class lacks the NONE opcode, i.e.:
33: } EMPTY_DICT
34: \x94 MEMOIZE (as 4)
VS:
33: N NONE
34: } EMPTY_DICT
35: \x94 MEMOIZE (as 4)
The remaining instructions build the same dictionary, but the slotted version additionally does:
56: \x86 TUPLE2
57: \x94 MEMOIZE (as 8)
which creates the tuple (None, {<the dict>}).
I'm almost certain this comes down to the difference in what __getstate__ returns:
>>> Foo(1,2,3).__getstate__()
{'a': 1, 'b': 2, 'c': 3}
>>> Bar(1,2,3).__getstate__()
(None, {'a': 1, 'b': 2, 'c': 3})
This behavior is described in the pickle documentation for object.__getstate__:
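For a class that has an instance __dict__ and no __slots__, the default state is self.__dict__.
...
For a class that has __slots__ and no instance __dict__, the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.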
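Five bytes of the difference are simply because you added "Slots" to the class name, and the name has to be embedded in the pickle so that the class can be looked up.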
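The other 3 bytes really are due to the slots. Normally, the default __getstate__ just returns the object's __dict__, or None if the dict is empty or doesn't exist, and when unpickling an object, pickle sets entries in the new object's dict from the values in the unpickled state dict. That doesn't work for slots, which aren't stored in the object's dict.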
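When an object has slots, the default __getstate__ returns a 2-tuple instead. The first element is the object's __dict__ if it has a non-empty __dict__ (objects with slots can still have a __dict__ in some cases), and None otherwise. The second element is a dict mapping the names of all filled slots to their values. Entries from the first dict are set directly in the new object's __dict__, while entries from the second dict are set using ordinary attribute assignment.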
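As a rough illustration of the two-element state, the sketch below reads the state through __reduce_ex__(4), whose third item is the state pickle would serialize; this avoids relying on object.__getstate__, which only exists from Python 3.11. SlotsOnly, SlotsAndDict, and default_state are hypothetical names made up for this example.

```python
import pickle


class SlotsOnly:
    __slots__ = ("a",)

    def __init__(self):
        self.a = 1


class SlotsAndDict(SlotsOnly):
    # Hypothetical subclass: it declares no __slots__ of its own, so its
    # instances get a __dict__ in addition to the inherited slot "a".
    def __init__(self):
        super().__init__()
        self.b = 2


def default_state(obj):
    # __reduce_ex__(4) returns (callable, args, state, listitems, dictitems);
    # index 2 is the state that pickle would serialize.
    return obj.__reduce_ex__(4)[2]


slots_only_state = default_state(SlotsOnly())
mixed_state = default_state(SlotsAndDict())
print(slots_only_state)  # (None, {'a': 1})
print(mixed_state)       # ({'b': 2}, {'a': 1})

# Round trip: both the __dict__ entry and the slot value are restored.
clone = pickle.loads(pickle.dumps(SlotsAndDict(), protocol=4))
print(clone.a, clone.b)  # 1 2
```

The first tuple element is None only when there is no non-empty __dict__ to save; the slot values always travel in the second element.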
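When you pickle Class(1, 2, 3), pickle has to serialize a {'a': 1, 'b': 2, 'c': 3} state, while for ClassSlots(1, 2, 3), pickle has to serialize a (None, {'a': 1, 'b': 2, 'c': 3}) state tuple. This means the pickle contains an extra NONE opcode to load None, an extra TUPLE2 opcode to pack None and the dict into a tuple, and an extra MEMOIZE opcode to store the tuple in the pickle memo. The MEMOIZE is actually unnecessary, since nothing ever loads the tuple from the memo, but the pickler doesn't optimize pickles by default.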
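The three extra opcodes can also be counted programmatically rather than read off a disassembly. This is a minimal sketch using pickletools.genops; Foo and Bar are the same-length class names used earlier in this thread, and opcode_counts is a helper invented for the example. It also checks the per-object cost on a list, which is consistent with the figures in the question: 25046 - 22041 = 3005 = 3 × 1000 + 5, the 5 being the longer class name stored once.

```python
import pickle
import pickletools
from collections import Counter


class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class Bar:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


def opcode_counts(data):
    # genops yields (opcode, argument, position) for each opcode in a pickle.
    return Counter(op.name for op, arg, pos in pickletools.genops(data))


plain = pickle.dumps(Foo(1, 2, 3), protocol=4)
slotted = pickle.dumps(Bar(1, 2, 3), protocol=4)

# Opcodes that appear more often in the slotted pickle than in the plain one.
extra = opcode_counts(slotted) - opcode_counts(plain)
print(dict(extra))

# Each of the three extra opcodes is one byte, so a list of n objects
# should grow by 3 bytes per object when the class names have equal length.
n = 1000
plain_list = pickle.dumps([Foo(1, 2, 3) for _ in range(n)], protocol=4)
slotted_list = pickle.dumps([Bar(1, 2, 3) for _ in range(n)], protocol=4)
list_gap = len(slotted_list) - len(plain_list)
print(list_gap)
```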
You can see the disassembly of a pickle with pickletools.dis, and if you want shorter pickles at the cost of spending extra time optimizing them, you can run pickletools.optimize. Demo:
import pickle
import pickletools

class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class ClassSlots:
    __slots__ = ["a", "b", "c"]
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

pickletools.dis(pickle.dumps(Class(1, 2, 3)))
print('-------------')
pickletools.dis(pickle.dumps(ClassSlots(1, 2, 3)))
print('-------------')
pickletools.dis(pickletools.optimize(pickle.dumps(ClassSlots(1, 2, 3))))
Output:
0: \x80 PROTO 4
2: \x95 FRAME 48
11: \x8c SHORT_BINUNICODE '__main__'
21: \x94 MEMOIZE (as 0)
22: \x8c SHORT_BINUNICODE 'Class'
29: \x94 MEMOIZE (as 1)
30: \x93 STACK_GLOBAL
31: \x94 MEMOIZE (as 2)
32: ) EMPTY_TUPLE
33: \x81 NEWOBJ
34: \x94 MEMOIZE (as 3)
35: } EMPTY_DICT
36: \x94 MEMOIZE (as 4)
37: ( MARK
38: \x8c SHORT_BINUNICODE 'a'
41: \x94 MEMOIZE (as 5)
42: K BININT1 1
44: \x8c SHORT_BINUNICODE 'b'
47: \x94 MEMOIZE (as 6)
48: K BININT1 2
50: \x8c SHORT_BINUNICODE 'c'
53: \x94 MEMOIZE (as 7)
54: K BININT1 3
56: u SETITEMS (MARK at 37)
57: b BUILD
58: . STOP
highest protocol among opcodes = 4
-------------
0: \x80 PROTO 4
2: \x95 FRAME 56
11: \x8c SHORT_BINUNICODE '__main__'
21: \x94 MEMOIZE (as 0)
22: \x8c SHORT_BINUNICODE 'ClassSlots'
34: \x94 MEMOIZE (as 1)
35: \x93 STACK_GLOBAL
36: \x94 MEMOIZE (as 2)
37: ) EMPTY_TUPLE
38: \x81 NEWOBJ
39: \x94 MEMOIZE (as 3)
40: N NONE
41: } EMPTY_DICT
42: \x94 MEMOIZE (as 4)
43: ( MARK
44: \x8c SHORT_BINUNICODE 'a'
47: \x94 MEMOIZE (as 5)
48: K BININT1 1
50: \x8c SHORT_BINUNICODE 'b'
53: \x94 MEMOIZE (as 6)
54: K BININT1 2
56: \x8c SHORT_BINUNICODE 'c'
59: \x94 MEMOIZE (as 7)
60: K BININT1 3
62: u SETITEMS (MARK at 43)
63: \x86 TUPLE2
64: \x94 MEMOIZE (as 8)
65: b BUILD
66: . STOP
highest protocol among opcodes = 4
-------------
0: \x80 PROTO 4
2: \x95 FRAME 47
11: \x8c SHORT_BINUNICODE '__main__'
21: \x8c SHORT_BINUNICODE 'ClassSlots'
33: \x93 STACK_GLOBAL
34: ) EMPTY_TUPLE
35: \x81 NEWOBJ
36: N NONE
37: } EMPTY_DICT
38: ( MARK
39: \x8c SHORT_BINUNICODE 'a'
42: K BININT1 1
44: \x8c SHORT_BINUNICODE 'b'
47: K BININT1 2
49: \x8c SHORT_BINUNICODE 'c'
52: K BININT1 3
54: u SETITEMS (MARK at 38)
55: \x86 TUPLE2
56: b BUILD
57: . STOP
highest protocol among opcodes = 4
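In the listings above, the unoptimized ClassSlots pickle ends at offset 66 (67 bytes) and the optimized one at offset 57 (58 bytes). A small sketch like the following, assuming the same ClassSlots definition and protocol 4, can be used to measure the saving on your own objects and to confirm that the optimized pickle still loads:

```python
import pickle
import pickletools


class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


raw = pickle.dumps(ClassSlots(1, 2, 3), protocol=4)
# optimize() drops memo entries that are never read back.
small = pickletools.optimize(raw)
print(len(raw), len(small))

# The optimized pickle is still a valid pickle of the same object.
restored = pickle.loads(small)
print(restored.a, restored.b, restored.c)
```

Note that this only shrinks the serialized bytes; it does not change how much memory the live objects use.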