Strange behavior: pthread_cond_destroy() hangs

Question (votes: 1, answers: 1)

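I know pthread_cancel() is tricky. I am asking this question to understand a bug in a piece of software that uses pthread_cancel().

I have reduced the problem to the following code: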

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t notify_mutex;
static pthread_cond_t notify;

static void *_watcher_thread(void *arg)
{
    (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    (void) pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

    printf("watcher:   thread started\n");

    while (1) {
            if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) != 0) {
                    perror("failed to disable watcher thread cancel: ");
            }
            pthread_mutex_lock(&notify_mutex);
            pthread_cond_wait(&notify, &notify_mutex);
            pthread_mutex_unlock(&notify_mutex);
            (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    }
    return NULL;
}

static void *_timer_thread(void *args)
{
    (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    (void) pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

    printf("timer:   thread started\n");

    while (1) {
            if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) != 0) {
                    perror("failed to disable timer thread cancel: ");
            }
            pthread_mutex_lock(&notify_mutex); /* XXX: not a cancellation point */
            pthread_cond_signal(&notify);
            pthread_mutex_unlock(&notify_mutex);
            (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    }
    return NULL;
}

int main(void)
{
    pthread_t watcher_tid, timer_tid;
    pthread_attr_t attr;
    long i = 0;

    while (1) {
            pthread_cond_init(&notify, NULL);
            pthread_mutex_init(&notify_mutex, NULL);
            pthread_attr_init(&attr);

            if (pthread_create(&watcher_tid, &attr,
                               &_watcher_thread, NULL)) {
                    perror("failed to create watcher thread: ");
            }
            if (pthread_create(&timer_tid, &attr,
                               &_timer_thread, NULL)) {
                    perror("failed to create timer thread: ");
            }

            sleep(1);

            printf("main:   to cancel watcher thread\n");
            pthread_cancel(watcher_tid);
            pthread_join(watcher_tid, NULL);
            printf("main:   watcher thread canceled\n");

            printf("main:   to cancel timer thread\n");
            pthread_cancel(timer_tid);
            pthread_join(timer_tid, NULL);
            printf("main:   timer thread canceled\n");

            pthread_cond_destroy(&notify);
            pthread_mutex_destroy(&notify_mutex);
            pthread_attr_destroy(&attr);
            i ++;
            printf("iteration: %ld\n", i);
    }

    return 0;
}

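There are basically three threads: watcher, timer, and main. The timer thread periodically wakes the watcher thread so that it can do some work. At the end, the main thread terminates the other two threads and exits. In the test program above I wrapped this sequence in a loop to reproduce the problem.

Compiled and run on Linux (Debian testing, 4.9.0-3-amd64 #1 SMP, glibc-2.24), the program hangs after a few iterations: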

...
main:   to cancel timer thread
main:   timer thread canceled
iteration: 4
timer:   thread started
watcher:   thread started
main:   to cancel watcher thread
main:   watcher thread canceled
main:   to cancel timer thread
main:   timer thread canceled
iteration: 5
timer:   thread started
watcher:   thread started
main:   to cancel watcher thread
main:   watcher thread canceled
main:   to cancel timer thread
main:   timer thread canceled

gdb shows the following stack trace for the hung program:

(gdb) attach 29247
Attaching to process 29247
Reading symbols from /home/hjcao/temp/test/pthread/hang1...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f7960708eb5 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x0000561b1f194f01 in main () at hang1.c:78
(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7f7960b12700 (LWP 29247) "hang1" 0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) 

=======================================================

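My question is: I do not understand why the main thread hangs in pthread_cond_destroy().

In fact, the original program (call it hang0) had no pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) / pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) calls inside the while loops of the watcher and timer threads. That version also hung in the main thread, which is understandable: with asynchronous cancellation, a thread can be cancelled in the middle of pthread_cond_wait() or pthread_cond_signal(), leaving the internal state of the condition variable notify corrupted. I added the pthread_setcancelstate() calls to prevent the watcher and timer threads from being cancelled while they operate on the condition variable. But the new program (hang1) still hangs.

Can someone help me explain this?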

pthreads hang
1 answer
0 votes

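I think this thread may help: pthread conditions and process termination (the answer by Gusev Petr helped me solve my problem).

I ran into the same condition variable problem in pthread_cond_destroy().

It happens mainly because the condition variable has no logic to determine whether the threads it believes are waiting on it are still running or have died (typically because of pthread_cancel()). One possible workaround, therefore, is to forcibly reset the value inside the variable to 0, as described in the thread linked above.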
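As an alternative to touching the condition variable's internals (and not something the linked answer proposes), the threads can use deferred cancellation with a cleanup handler, so that cancellation is only acted on at well-defined points such as pthread_cond_wait(). The following is a minimal sketch under that assumption. The names work_ready, unlock_mutex, and run_once are hypothetical and do not appear in the original program, and this has not been verified against the exact glibc 2.24 setup from the question.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t notify_mutex;
static pthread_cond_t notify;
static bool work_ready;            /* hypothetical predicate, not in the original */

/* Cleanup handler: runs if the thread is cancelled while it owns the mutex. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *watcher_thread(void *arg)
{
    (void)arg;
    /* Deferred cancellation is the default; set here only to be explicit. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);

    pthread_mutex_lock(&notify_mutex);
    pthread_cleanup_push(unlock_mutex, &notify_mutex);
    for (;;) {
        /* pthread_cond_wait() is a cancellation point. If cancellation is
         * acted on here, the mutex is re-acquired before the handler runs. */
        while (!work_ready)
            pthread_cond_wait(&notify, &notify_mutex);
        work_ready = false;        /* stands in for "do some work" */
    }
    pthread_cleanup_pop(1);        /* never reached; closes the push scope */
    return NULL;
}

static void *timer_thread(void *arg)
{
    (void)arg;
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);

    for (;;) {
        pthread_mutex_lock(&notify_mutex);   /* not a cancellation point */
        work_ready = true;
        pthread_cond_signal(&notify);        /* not a cancellation point */
        pthread_mutex_unlock(&notify_mutex);
        pthread_testcancel();      /* explicit cancellation point, no lock held */
    }
    return NULL;
}

/* One iteration of the original main loop.
 * Returns 0 if both threads were cancelled and everything was destroyed. */
int run_once(void)
{
    pthread_t watcher_tid, timer_tid;
    void *w_res = NULL, *t_res = NULL;

    work_ready = false;
    if (pthread_mutex_init(&notify_mutex, NULL) != 0) return -1;
    if (pthread_cond_init(&notify, NULL) != 0) return -1;

    if (pthread_create(&watcher_tid, NULL, watcher_thread, NULL) != 0) return -1;
    if (pthread_create(&timer_tid, NULL, timer_thread, NULL) != 0) {
        pthread_cancel(watcher_tid);
        pthread_join(watcher_tid, NULL);
        return -1;
    }

    pthread_cancel(watcher_tid);
    pthread_join(watcher_tid, &w_res);
    pthread_cancel(timer_tid);
    pthread_join(timer_tid, &t_res);

    if (pthread_cond_destroy(&notify) != 0) return -1;
    if (pthread_mutex_destroy(&notify_mutex) != 0) return -1;
    return (w_res == PTHREAD_CANCELED && t_res == PTHREAD_CANCELED) ? 0 : -1;
}
```

Calling run_once() in a loop from main() mirrors the structure of the original test. The design choice is that neither thread can be cancelled in the middle of pthread_cond_wait() or pthread_cond_signal(): the watcher is only cancelled at the wait itself, where the cleanup handler releases the mutex, and the timer is only cancelled at pthread_testcancel(), where it holds no lock.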
