Why are pickled objects with slots larger than those without?


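I'm working on a program that keeps dying because of the OOM killer. I'd like to reduce memory usage quickly, without a major refactor. I tried adding __slots__ to the most common classes, but I noticed that the pickled size went up. Why is that?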

import pickle


class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

cases = [
    Class(1, 2, 3),
    ClassSlots(1, 2, 3),
    [Class(1, 2, 3) for _ in range(1000)],
    [ClassSlots(1, 2, 3) for _ in range(1000)]
]

for case in cases:
    dump = pickle.dumps(case, protocol=5)
    print(len(dump))

With Python 3.10 this prints:

59
67
22041
25046
Tags: python, pickle

2 answers

2 votes

So, on Python 3.11, let's define the following:

class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class Bar:
    __slots__ = ["a", "b", "c"]
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

Now, let's see:

>>> import pickle
>>> import pickletools
>>> len(pickle.dumps(Foo(1,2,3))), len(pickle.dumps(Bar(1,2,3)))
(57, 60)

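So there is a three-byte difference once the classes have names of the same length. The longer class name accounts for 5 of the 8 bytes of difference you originally saw.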
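The same split can be checked against the list case from the question. The sketch below assumes the question's two classes and protocol 5; the helper name size_gap is made up for illustration. The class name is written once and later instances refer to it through the pickle memo, so the name costs its extra bytes a single time, and the rest of the gap grows with the number of instances.

```python
import pickle


class Class:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


class ClassSlots:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


def size_gap(n):
    """Pickled-size difference between n slotted and n plain instances."""
    plain = pickle.dumps([Class(1, 2, 3) for _ in range(n)], protocol=5)
    slotted = pickle.dumps([ClassSlots(1, 2, 3) for _ in range(n)], protocol=5)
    return len(slotted) - len(plain)


n = 1000
# Extra characters in the longer class name; stored once, then memoized.
name_cost = len("ClassSlots") - len("Class")
# What remains is the per-instance overhead of the slotted state.
per_object = (size_gap(n) - name_cost) / n
print(size_gap(n), name_cost, per_object)
```

With the numbers from the question, 25046 - 22041 = 3005, which is 5 bytes for the name plus 3 bytes for each of the 1000 instances.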

Now, let's see what the disassembly shows us:

>>> pickletools.dis(pickle.dumps(Foo(1,2,3)))
    0: \x80 PROTO      4
    2: \x95 FRAME      46
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Foo'
   27: \x94 MEMOIZE    (as 1)
   28: \x93 STACK_GLOBAL
   29: \x94 MEMOIZE    (as 2)
   30: )    EMPTY_TUPLE
   31: \x81 NEWOBJ
   32: \x94 MEMOIZE    (as 3)
   33: }    EMPTY_DICT
   34: \x94 MEMOIZE    (as 4)
   35: (    MARK
   36: \x8c     SHORT_BINUNICODE 'a'
   39: \x94     MEMOIZE    (as 5)
   40: K        BININT1    1
   42: \x8c     SHORT_BINUNICODE 'b'
   45: \x94     MEMOIZE    (as 6)
   46: K        BININT1    2
   48: \x8c     SHORT_BINUNICODE 'c'
   51: \x94     MEMOIZE    (as 7)
   52: K        BININT1    3
   54: u        SETITEMS   (MARK at 35)
   55: b    BUILD
   56: .    STOP
highest protocol among opcodes = 4

And:

>>> pickletools.dis(pickle.dumps(Bar(1,2,3)))
    0: \x80 PROTO      4
    2: \x95 FRAME      49
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Bar'
   27: \x94 MEMOIZE    (as 1)
   28: \x93 STACK_GLOBAL
   29: \x94 MEMOIZE    (as 2)
   30: )    EMPTY_TUPLE
   31: \x81 NEWOBJ
   32: \x94 MEMOIZE    (as 3)
   33: N    NONE
   34: }    EMPTY_DICT
   35: \x94 MEMOIZE    (as 4)
   36: (    MARK
   37: \x8c     SHORT_BINUNICODE 'a'
   40: \x94     MEMOIZE    (as 5)
   41: K        BININT1    1
   43: \x8c     SHORT_BINUNICODE 'b'
   46: \x94     MEMOIZE    (as 6)
   47: K        BININT1    2
   49: \x8c     SHORT_BINUNICODE 'c'
   52: \x94     MEMOIZE    (as 7)
   53: K        BININT1    3
   55: u        SETITEMS   (MARK at 36)
   56: \x86 TUPLE2
   57: \x94 MEMOIZE    (as 8)
   58: b    BUILD
   59: .    STOP
highest protocol among opcodes = 4

So the first difference is at offset 33, where the non-slotted class has no NONE opcode, i.e.:

33: }    EMPTY_DICT
34: \x94 MEMOIZE    (as 4)

VS:

33: N    NONE
34: }    EMPTY_DICT
35: \x94 MEMOIZE    (as 4)

The remaining instructions build the same dict, but the slotted version also does:

56: \x86 TUPLE2
57: \x94 MEMOIZE    (as 8)

which creates a tuple:

(None, {<the dict>})

I'm almost certain this comes from the difference in what __getstate__ returns:

>>> Foo(1,2,3).__getstate__()
{'a': 1, 'b': 2, 'c': 3}
>>> Bar(1,2,3).__getstate__()
(None, {'a': 1, 'b': 2, 'c': 3})
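Note that object.__getstate__ was only added in Python 3.11, so the calls above raise AttributeError on the 3.10 interpreter used in the question. A minimal sketch that should work on both versions reads the state from __reduce_ex__, whose third element is the state that pickle serializes. The helper name default_state is an assumption made for illustration, not part of the pickle API.

```python
class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


def default_state(obj, protocol=4):
    """Return the state pickle would store for obj (hypothetical helper)."""
    # For protocol 2 and later, __reduce_ex__ returns
    # (callable, args, state, listitems, dictitems).
    return obj.__reduce_ex__(protocol)[2]


foo_state = default_state(Foo(1, 2, 3))
bar_state = default_state(Bar(1, 2, 3))
print(foo_state)
print(bar_state)
```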

This behavior is described in the pickle documentation for object.__getstate__:

For a class that has an instance __dict__ and no __slots__, the default state is self.__dict__.

...

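For a class that has __slots__ and no instance __dict__, the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.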


1 vote

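5 bytes of the difference are just because you added "Slots" to the class name, which has to be embedded in the pickle so that the class can be looked up.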

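The other 3 bytes really are because of slots. Normally, the default __getstate__ just returns an object's __dict__, or None if the dict is empty or nonexistent, and when unpickling an object, pickle sets entries in the new object's dict using the values from the unpickled state dict. That doesn't work for slots, which aren't stored in the object's dict.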

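When an object has slots, the default __getstate__ returns a 2-tuple. The first element is the object's __dict__ if it has a non-empty one, and None otherwise. (In some cases an object with slots can still have a __dict__.) The second element is a dict mapping the names of all filled slots to their values. Entries in the first dict are set directly in the new object's __dict__, while entries in the second dict are set with ordinary attribute-setting operations.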
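As a rough illustration of the two halves of that tuple, the sketch below builds an object that has both a slot and a __dict__ by subclassing a slotted class without declaring __slots__ again. The class names and the apply_default_state helper are made up for this example; the helper is a simplified stand-in for what unpickling does with the default state, not the actual pickle implementation.

```python
class Base:
    __slots__ = ("a",)


class Mixed(Base):
    # No __slots__ declared here, so instances also get a __dict__.
    pass


def apply_default_state(obj, state):
    """Simplified sketch of how a default state is applied on unpickling."""
    slot_state = None
    if isinstance(state, tuple) and len(state) == 2:
        state, slot_state = state
    if state:
        # Dict entries are written straight into the instance __dict__.
        obj.__dict__.update(state)
    if slot_state:
        for name, value in slot_state.items():
            # Slot entries go through ordinary attribute setting.
            setattr(obj, name, value)


original = Mixed()
original.a = 1  # stored in the slot
original.b = 2  # stored in __dict__

# Third element of __reduce_ex__ is the state that pickle would store.
state = original.__reduce_ex__(4)[2]
print(state)

clone = Mixed.__new__(Mixed)
apply_default_state(clone, state)
print(clone.a, clone.b)
```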

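When you pickle Class(1, 2, 3), pickle has to serialize a {'a': 1, 'b': 2, 'c': 3} state, while for ClassSlots(1, 2, 3), pickle has to serialize a (None, {'a': 1, 'b': 2, 'c': 3}) state tuple. That means the pickle contains an extra NONE opcode to load None, an extra TUPLE2 opcode to pack None and the dict into a tuple, and an extra MEMOIZE opcode to store the tuple in the pickle memo. (The MEMOIZE is actually unnecessary, since nothing ever loads the tuple from the memo, but the pickler doesn't optimize pickles by default.)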
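If the pickled size matters more than keeping the default behavior, one option is to supply your own state. The sketch below is an assumption-laden example, not something from the question: the class names BarDefault and BarCompact are invented, and the custom __getstate__ stores only the slot values, which drops both the (None, dict) wrapper and the attribute names from each pickle.

```python
import pickle


class BarDefault:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


class BarCompact:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __getstate__(self):
        # Store only the values; the slot names are implied by position.
        # Assumes every slot has been assigned.
        return (self.a, self.b, self.c)

    def __setstate__(self, state):
        self.a, self.b, self.c = state


default_size = len(pickle.dumps(BarDefault(1, 2, 3), protocol=4))
compact_size = len(pickle.dumps(BarCompact(1, 2, 3), protocol=4))
restored = pickle.loads(pickle.dumps(BarCompact(1, 2, 3), protocol=4))
print(default_size, compact_size, (restored.a, restored.b, restored.c))
```

The trade-off is that the state is now positional, so reordering or removing slots later would make old pickles load incorrectly.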

You can view a pickle's disassembly with pickletools.dis, and if you want shorter pickles at the cost of extra time spent optimizing them, you can run pickletools.optimize.

Demo:

import pickle
import pickletools

class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

pickletools.dis(pickle.dumps(Class(1, 2, 3)))
print('-------------')
pickletools.dis(pickle.dumps(ClassSlots(1, 2, 3)))
print('-------------')
pickletools.dis(pickletools.optimize(pickle.dumps(ClassSlots(1, 2, 3))))

Output:

    0: \x80 PROTO      4
    2: \x95 FRAME      48
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Class'
   29: \x94 MEMOIZE    (as 1)
   30: \x93 STACK_GLOBAL
   31: \x94 MEMOIZE    (as 2)
   32: )    EMPTY_TUPLE
   33: \x81 NEWOBJ
   34: \x94 MEMOIZE    (as 3)
   35: }    EMPTY_DICT
   36: \x94 MEMOIZE    (as 4)
   37: (    MARK
   38: \x8c     SHORT_BINUNICODE 'a'
   41: \x94     MEMOIZE    (as 5)
   42: K        BININT1    1
   44: \x8c     SHORT_BINUNICODE 'b'
   47: \x94     MEMOIZE    (as 6)
   48: K        BININT1    2
   50: \x8c     SHORT_BINUNICODE 'c'
   53: \x94     MEMOIZE    (as 7)
   54: K        BININT1    3
   56: u        SETITEMS   (MARK at 37)
   57: b    BUILD
   58: .    STOP
highest protocol among opcodes = 4
-------------
    0: \x80 PROTO      4
    2: \x95 FRAME      56
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'ClassSlots'
   34: \x94 MEMOIZE    (as 1)
   35: \x93 STACK_GLOBAL
   36: \x94 MEMOIZE    (as 2)
   37: )    EMPTY_TUPLE
   38: \x81 NEWOBJ
   39: \x94 MEMOIZE    (as 3)
   40: N    NONE
   41: }    EMPTY_DICT
   42: \x94 MEMOIZE    (as 4)
   43: (    MARK
   44: \x8c     SHORT_BINUNICODE 'a'
   47: \x94     MEMOIZE    (as 5)
   48: K        BININT1    1
   50: \x8c     SHORT_BINUNICODE 'b'
   53: \x94     MEMOIZE    (as 6)
   54: K        BININT1    2
   56: \x8c     SHORT_BINUNICODE 'c'
   59: \x94     MEMOIZE    (as 7)
   60: K        BININT1    3
   62: u        SETITEMS   (MARK at 43)
   63: \x86 TUPLE2
   64: \x94 MEMOIZE    (as 8)
   65: b    BUILD
   66: .    STOP
highest protocol among opcodes = 4
-------------
    0: \x80 PROTO      4
    2: \x95 FRAME      47
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x8c SHORT_BINUNICODE 'ClassSlots'
   33: \x93 STACK_GLOBAL
   34: )    EMPTY_TUPLE
   35: \x81 NEWOBJ
   36: N    NONE
   37: }    EMPTY_DICT
   38: (    MARK
   39: \x8c     SHORT_BINUNICODE 'a'
   42: K        BININT1    1
   44: \x8c     SHORT_BINUNICODE 'b'
   47: K        BININT1    2
   49: \x8c     SHORT_BINUNICODE 'c'
   52: K        BININT1    3
   54: u        SETITEMS   (MARK at 38)
   55: \x86 TUPLE2
   56: b    BUILD
   57: .    STOP
highest protocol among opcodes = 4
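For the list case from the question, a quick way to see how much pickletools.optimize saves is to compare lengths before and after. This is a small sketch assuming the ClassSlots class from above and protocol 4; the actual savings depend on the data, since memo entries that are reused (such as the class and the attribute names) have to stay.

```python
import pickle
import pickletools


class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


raw = pickle.dumps([ClassSlots(1, 2, 3) for _ in range(1000)], protocol=4)
# optimize() drops MEMOIZE opcodes whose memo entries are never read back.
optimized = pickletools.optimize(raw)
# The optimized pickle still loads to equivalent objects.
restored = pickle.loads(optimized)
print(len(raw), len(optimized), len(restored))
```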