Called with n = 10**8, the simple loop is consistently much slower for me than the complex one, and I don't see why:
def simple(n):
    while n:
        n -= 1

def complex(n):
    while True:
        if not n:
            break
        n -= 1
Some timings, in seconds:
simple 4.340795516967773
complex 3.6490490436553955
simple 4.374553918838501
complex 3.639145851135254
simple 4.336690425872803
complex 3.624480724334717
Python: 3.11.4 (main, Sep 9 2023, 15:09:21) [GCC 13.2.1 20230801]
Here is the looping part of the bytecode, as shown by dis.dis(simple):
  6     >>    6 LOAD_FAST                    0 (n)
              8 LOAD_CONST                   1 (1)
             10 BINARY_OP                   23 (-=)
             14 STORE_FAST                   0 (n)

  5          16 LOAD_FAST                    0 (n)
             18 POP_JUMP_BACKWARD_IF_TRUE    7 (to 6)
And for complex:
 10     >>    4 LOAD_FAST                    0 (n)
              6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)

 11           8 LOAD_CONST                   0 (None)
             10 RETURN_VALUE

 12     >>   12 LOAD_FAST                    0 (n)
             14 LOAD_CONST                   2 (1)
             16 BINARY_OP                   23 (-=)
             20 STORE_FAST                   0 (n)

  9          22 JUMP_BACKWARD               10 (to 4)
So it looks like the complex one does more work per iteration (two jumps instead of one). Then why is it faster?
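The "two jumps instead of one" observation can also be checked programmatically rather than by reading the listings. The following is only a sketch: the functions are renamed simple_loop and complex_loop here (to avoid shadowing the built-in complex), and loop_jump_opnames is a hypothetical helper, not part of dis. The opcode names it prints depend on the Python version, so no particular output is claimed.

```python
import dis


def simple_loop(n):
    while n:
        n -= 1


def complex_loop(n):
    while True:
        if not n:
            break
        n -= 1


# Opcodes that carry a jump target (relative or absolute, depending on version).
JUMP_OPCODES = set(dis.hasjrel) | set(dis.hasjabs)


def loop_jump_opnames(func):
    """Return the names of the jump instructions located inside func's loop."""
    instructions = list(dis.get_instructions(func))
    jumps = [ins for ins in instructions if ins.opcode in JUMP_OPCODES]
    # A backward jump is one whose target offset precedes the jump itself.
    backward = [ins for ins in jumps if ins.argval < ins.offset]
    if not backward:
        return []
    # The loop spans from the earliest backward-jump target
    # to the last backward jump.
    start = min(ins.argval for ins in backward)
    end = max(ins.offset for ins in backward)
    return [ins.opname for ins in jumps if start <= ins.offset <= end]


if __name__ == "__main__":
    for f in (simple_loop, complex_loop):
        print(f.__name__, loop_jump_opnames(f))
```

Restricting the list to the loop span leaves out the one-time entry test that precedes the loop in simple, which would otherwise be counted as a per-iteration jump.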
This seems to be a Python 3.11 phenomenon; see the comments.
Benchmark script:
from time import time
import sys

def simple(n):
    while n:
        n -= 1

def complex(n):
    while True:
        if not n:
            break
        n -= 1

for f in [simple, complex] * 3:
    t = time()
    f(10**8)
    print(f.__name__, time() - t)

print('Python:', sys.version)
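For anyone who wants to reproduce the comparison with less timer noise, here is a hedged variant of the same benchmark built on timeit.repeat, reporting the fastest of several runs. It is a sketch, not the script that produced the numbers above: best_time is a hypothetical helper, the functions are renamed to avoid shadowing the built-in complex, and n is reduced to keep the run short.

```python
import sys
import timeit


def simple_loop(n):
    while n:
        n -= 1


def complex_loop(n):
    while True:
        if not n:
            break
        n -= 1


def best_time(func, n, repeat=3):
    """Run func(n) `repeat` times and return the fastest run, in seconds."""
    return min(timeit.repeat(lambda: func(n), number=1, repeat=repeat))


if __name__ == "__main__":
    n = 10**6  # assumption: smaller than the 10**8 used above, for a quick run
    for f in (simple_loop, complex_loop):
        print(f.__name__, best_time(f, n))
    print("Python:", sys.version)
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs mostly reflect interference from other processes.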
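I checked the interpreter source code (Python 3.11.6) and found that, of the instructions in the disassembled bytecode, only JUMP_BACKWARD seems to call the warmup function, which triggers specialization in Python 3.11 once it has been called enough times. The loop in complex ends with JUMP_BACKWARD, whereas the loop in simple ends with POP_JUMP_BACKWARD_IF_TRUE, so it appears that only complex gets warmed up by its loop: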
PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag)
{
    /* ... */
        TARGET(JUMP_BACKWARD) {
            _PyCode_Warmup(frame->f_code);
            JUMP_TO_INSTRUCTION(JUMP_BACKWARD_QUICK);
        }
    /* ... */
}
static inline void
_PyCode_Warmup(PyCodeObject *code)
{
    if (code->co_warmup != 0) {
        code->co_warmup++;
        if (code->co_warmup == 0) {
            _PyCode_Quicken(code);
        }
    }
}
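The counter logic above can be modelled in a few lines of Python to show why a loop that never reaches a warming instruction stays unquickened. This is a toy sketch, not CPython code: CodeModel, run_loop and the warmup_delay parameter are hypothetical names, and it assumes the counter starts at a negative value and counts up to zero.

```python
class CodeModel:
    """Toy stand-in for a code object's warmup state (not CPython's real type)."""

    def __init__(self, warmup_delay=8):
        # Assumption: the counter starts negative and counts up to zero.
        # warmup_delay is an illustrative parameter, not a CPython API.
        self.co_warmup = -warmup_delay
        self.quickened = False

    def warmup(self):
        """Mirror _PyCode_Warmup: count up, quicken when zero is reached."""
        if self.co_warmup != 0:
            self.co_warmup += 1
            if self.co_warmup == 0:
                self.quickened = True  # stands in for _PyCode_Quicken(code)


def run_loop(code, iterations, warms_up):
    """Simulate a loop whose back edge does or does not call warmup()."""
    for _ in range(iterations):
        if warms_up:
            code.warmup()
    return code.quickened


if __name__ == "__main__":
    # Back edge is a warming jump (like JUMP_BACKWARD in complex).
    print(run_loop(CodeModel(), 100, warms_up=True))   # True
    # Back edge never calls warmup (like the conditional jump in simple).
    print(run_loop(CodeModel(), 100, warms_up=False))  # False
```

In this model the number of iterations is irrelevant for the second case: without a call to warmup() on the back edge, the counter never moves.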
Specialization appears to speed up several of the bytecodes used here, resulting in a significant speed increase:
void
_PyCode_Quicken(PyCodeObject *code)
{
    /* ... */
            switch (opcode) {
                case EXTENDED_ARG:  /* ... */
                case JUMP_BACKWARD: /* ... */
                case RESUME:        /* ... */
                case LOAD_FAST:     /* ... */
                case STORE_FAST:    /* ... */
                case LOAD_CONST:    /* ... */
            }
    /* ... */
}
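Whether a function's bytecode has actually been quickened can be inspected from Python, since dis accepts an adaptive flag from 3.11 onward. The sketch below is one way to look, with opnames as a hypothetical helper and complex_loop as a renamed copy of the question's function; which specialized names appear (if any) depends on the interpreter version, so none are asserted here.

```python
import dis


def complex_loop(n):
    while True:
        if not n:
            break
        n -= 1


def opnames(func, adaptive=False):
    """List func's opcode names; adaptive=True shows quickened forms if supported."""
    try:
        instructions = dis.get_instructions(func, adaptive=adaptive)
    except TypeError:
        # Pythons before 3.11 have no `adaptive` parameter.
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


if __name__ == "__main__":
    before = opnames(complex_loop, adaptive=True)
    complex_loop(10_000)  # run the loop long enough for any warmup to happen
    after = opnames(complex_loop, adaptive=True)
    print("before:", before)
    print("after: ", after)
```

Running the same check on the simple loop from the question should show whether its instructions change at all after the loop has run.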