一个原子变量(在这种情况下为128位结构)正在更新,这令仅有的一个能够对其进行更新的线程感到惊讶。怎么样?
这是一个最小的示例,因此它没有做任何有意义的事情,但是:alloc()函数返回一个malloc的缓冲区100次,然后分配一个新的缓冲区,它将返回100次,依此类推,甚至面对被多个线程调用的情况。
我有一个原子变量,它是一个带有指针,32位int和另一个32位计数器的结构,旨在避免ABA问题。
我有一个包含两个部分的函数。第一部分,如果返回计数为非零,将CAS结构减小返回计数(并增加ABA计数器),然后返回指针。否则,第二部分将获得一个互斥体,为新的指针分配内存,并且CAS将使用新的指针,一个新的非零返回计数器以及一个ABA计数器的增量完全使用小结构。
简而言之,当计数器大于零时,每个线程都可以更新此结构。但是,一旦它为零,第一个获取互斥锁的线程我认为
将是唯一可以再次CAS更新此结构的线程。除非有时此CAS失败!我的问题是“它怎么会失败”。
这里是一个正在运行的示例。可以使用g++ lockchange.cxx -o lockchange -latomic -pthread
进行编译。它在Fedora 31的gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)
上运行。
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cassert>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>
using namespace std;
struct MyPair { /* Hungarian: pair */
char* pc; /* a buffer to be used n times */
int32_t iRemaining; /* number of times left to use pc */
uint32_t iUpdates; /* to avoid ABA problem */
};
const int iThreads{ 200 };
const int iThreadIterations{ 1000000 };
const int iSizeItem{ 128 };
mutex mux;
atomic<MyPair> pairNext;
char* alloc() {
TRY_AGAIN:
MyPair pairCur = pairNext.load();
// CASE 1: We can use the existing buffer?
while ( pairCur.iRemaining ) {
char* pcRV = pairCur.pc;
MyPair pairNew = { pairCur.pc,
pairCur.iRemaining - 1,
pairCur.iUpdates + 1 };
if ( pairNext.compare_exchange_weak( pairCur, pairNew ) )
return pcRV;
// Otherwise, pairNext was changed out from under us and pairCur
// will have been updated. Try again, as long as iRemaining
// non-zero.
}
// CASE 2: We've used pc as many times as allowed, so allocate a new pc.
// Get a mutex as we'll be changing too many fields to do atomically.
lock_guard<mutex> guard( mux );
// If multiple threads saw iRemaining = 0, they all will
// have tried for the mutex; only one will have gotten it, so
// there's a good chance that by the time we get the mutex, a
// sibling thread will have allocated a new pc and placed it at
// pairNext, so we don't need to allocate after all.
if ( pairNext.load().iRemaining ) // <=============================== it's as if this line isn't seeing the update made by the line below in real time.
goto TRY_AGAIN;
// Get a new buffer.
char* pcNew = (char*) malloc( iSizeItem );
MyPair pairNew = { pcNew, 100, pairCur.iUpdates + 1 };
if ( pairNext.compare_exchange_strong( pairCur, pairNew ) ) { //<===== the update that's not being seen above in real time
// *** other stuff with pcNew that needs mutex protection ***;
return pcNew;
} else {
// CASE 2c: after allocating a new page, we find that
// another thread has beaten us to it. I CAN'T FIGURE OUT
// HOW THAT'S POSSIBLE THOUGH. Our response should be safe
// enough: put our allocation back, and start all over again
// because who knows what else we missed. I see this error
// like 813 times out of 40 BILLION allocations in the
// hammer test, ranging from 1 to 200 threads.
printf( "unexpected: had lock but pairNext changed when iRemaining=0\n" );
// In fact the following free and goto should and seem to
// recover fine, but to be clear my question is how we can
// possibly end up here in the first place.
abort();
free( pcNew );
goto TRY_AGAIN;
}
}
void Test( int iThreadNumber ) {
for ( int i = 0; i < iThreadIterations; i++ )
alloc();
}
int main( int nArg, char* apszArg[] ) {
vector<thread> athr;
for ( int i = 0; i < iThreads; i++ )
athr.emplace_back( Test, i );
for ( auto& thr: athr )
thr.join();
}
一个原子变量(在这种情况下为128位结构)正在更新,这令唯一具有更新能力的线程感到惊讶。为何如此?这是一个最小的示例,所以它不起作用...
该问题被称为“ ABA问题”,我可以总结为检查无锁多线程编码中的变量,并认为它没有改变,但是已经改变了。
if ( pairNext.load().iUpdates != pairCur.iUpdates ) // <=============================== it's as if this line isn't seeing the update made by the line below in real time.
会解锁互斥锁,因为您要跳回到构造goto TRY_AGAIN;
之前。通常,人们将lock_guard<mutex>
放在示波器的顶部,并把锁的位置放在顶部,以使情况更清楚(并控制何时发生解锁)。我没有检查ISO C ++规则以查看是否需要这样做,但是至少在G ++和clang ++实现它的方式上,{}
确实可以解锁。 (将RAII锁定与goto
混合似乎是较差的设计)。