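I'm preparing a report on synchronization primitives for threads, and I'm looking for a good example where the result obtained with Lock() is completely different from the result obtained without it.
In the example below, I increment a number by 1 in a loop from multiple threads. I have raised the iteration count to 1000000 and the thread count to 1000, but I still cannot see any effect of a race condition (or anything else). The result is always exactly equal to the product of the iteration count and the thread count (running on Ubuntu 20.04).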
from threading import Thread

COUNT = 1000000
NUM_THREADS = 1000
counter = 0

def increment():
    global counter
    for _ in range(COUNT):
        counter += 1  # unsynchronized read-modify-write on a shared global

threads = [Thread(target=increment) for _ in range(NUM_THREADS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# A non-zero diff would mean some increments were lost
diff = counter - COUNT * NUM_THREADS
print(f"Diff for counter without synchronization: {diff}")
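Can anyone suggest an example (preferably not a very complicated one) where the result of a multithreaded computation without synchronization primitives differs from that of its synchronized counterpart?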
If you add some simulated work, you should see some interesting results.
import threading
import time
import random

NUM_THREADS = 5
COUNT = 5
counter_nolock = 0
counter_lock = 0
lock = threading.Lock()

def increment_nolock():
    global counter_nolock
    for _ in range(COUNT):
        prior = counter_nolock + 1     # read the shared value
        time.sleep(random.random())    # simulated work; other threads run here
        counter_nolock = prior         # write back a possibly stale value
        print(f"nolock : {counter_nolock}")

def increment_lock():
    global counter_lock
    for _ in range(COUNT):
        with lock:                     # read, work and write happen as one unit
            prior = counter_lock + 1
            time.sleep(random.random())
            counter_lock = prior
            print(f"lock   : {counter_lock}")

def increment():
    increment_nolock()
    increment_lock()

if __name__ == '__main__':
    threads = [
        threading.Thread(target=increment)
        for _ in range(NUM_THREADS)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    print(f"increment_nolock: expected {COUNT * NUM_THREADS} got: {counter_nolock}")
    print(f"increment_lock: expected {COUNT * NUM_THREADS} got: {counter_lock}")
This should give you a semi-random result like:
nolock : 1
nolock : 1
nolock : 2
nolock : 1
nolock : 1
nolock : 2
nolock : 3
nolock : 4
nolock : 1
nolock : 3
nolock : 3
nolock : 2
nolock : 4
nolock : 5
nolock : 5
nolock : 2
nolock : 4
nolock : 6
nolock : 3
lock : 1
nolock : 5
lock : 2
nolock : 3
lock : 3
nolock : 4
nolock : 4
nolock : 5
nolock : 6
lock : 4
lock : 5
lock : 6
lock : 7
lock : 8
lock : 9
lock : 10
lock : 11
lock : 12
lock : 13
lock : 14
lock : 15
lock : 16
lock : 17
lock : 18
lock : 19
lock : 20
lock : 21
lock : 22
lock : 23
lock : 24
lock : 25
increment_nolock: expected 25 got: 6
increment_lock: expected 25 got: 25
This is what happens in the increment function:
>>> counter = 0
>>>
>>> def increment():
...     global counter
...     for _ in range(1000):
...         counter += 1
...
>>> import dis
>>> dis.dis(increment)
  3           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (1000)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                12 (to 22)
             10 STORE_FAST               0 (_)

  4          12 LOAD_GLOBAL              1 (counter)
             14 LOAD_CONST               2 (1)
             16 INPLACE_ADD
             18 STORE_GLOBAL             1 (counter)
             20 JUMP_ABSOLUTE            8
        >>   22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
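To get what you want to see, Python would need to switch threads after instruction 12 (LOAD_GLOBAL) and before instruction 18 (STORE_GLOBAL), and of course the other thread would have to modify the counter while it holds the GIL.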
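To make that interleaving visible without waiting for the scheduler, here is a minimal sketch that forces both threads to finish the load before either one stores. The `threading.Barrier`, the `state` dict and the function names are illustrative assumptions, not part of the code above; the barrier simply stands in for an unlucky thread switch.

```python
import threading

def lost_update_demo():
    state = {"counter": 0}
    both_loaded = threading.Barrier(2)  # releases only when both threads arrive

    def racy_increment():
        loaded = state["counter"]       # the "load" step
        both_loaded.wait()              # park until the other thread has loaded too
        state["counter"] = loaded + 1   # the "add" and "store" steps, on a stale value

    threads = [threading.Thread(target=racy_increment) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["counter"]

print(lost_update_demo())  # prints 1: two increments ran, one update was lost
```

Both threads read 0 before either writes, so both write 1. This is the same lost update that the plain `counter += 1` loop can only hit by chance.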
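You can find out how often Python switches threads from sys.getswitchinterval(); on my system it is 0.005 seconds (5 milliseconds). The chance of the switch landing exactly between the right instructions is not zero, so given enough time it will happen. Reducing the number of threads and increasing the number of iterations may improve your chances, as may increasing the number of instructions between the load and the store (i.e. doing more work).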
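As a hedged illustration of the switch interval: `sys.getswitchinterval()` returns the value in seconds, and `sys.setswitchinterval()` changes it. Shrinking the interval asks the interpreter to consider switching more often, which may make a race show up sooner, but it guarantees nothing and the effect depends on the Python version. The helper name and the value `1e-5` are arbitrary choices for this sketch.

```python
import sys

def with_short_switch_interval(func, interval=1e-5):
    """Run func() with a temporarily shortened switch interval (in seconds)."""
    original = sys.getswitchinterval()
    sys.setswitchinterval(interval)
    try:
        return func()
    finally:
        sys.setswitchinterval(original)  # always restore the previous setting

print(sys.getswitchinterval())           # the current interval, in seconds
seen = with_short_switch_interval(sys.getswitchinterval)
print(seen)                              # the shortened interval seen inside the call
```

In practice you would pass the function that starts and joins your threads instead of `sys.getswitchinterval`.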
What you are seeing are the two problems with unsynchronized access: it works fine most of the time, and the failure is hard to reproduce.
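For comparison, a minimal sketch of the synchronized counterpart of the counter loop from the question. Holding a `threading.Lock` around the read-modify-write makes the final value independent of when threads are switched. The function name `run_locked` and the smaller counts are assumptions chosen only to keep the run short.

```python
from threading import Lock, Thread

def run_locked(count, num_threads):
    counter = 0
    lock = Lock()

    def increment():
        nonlocal counter
        for _ in range(count):
            with lock:          # only one thread at a time may load, add and store
                counter += 1

    threads = [Thread(target=increment) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter

COUNT = 10_000
NUM_THREADS = 8
result = run_locked(COUNT, NUM_THREADS)
print(f"Diff for counter with synchronization: {result - COUNT * NUM_THREADS}")
```

The lock costs some speed, but the result no longer depends on timing.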