QThreadPool中的QVector :: clear和QFile :: close

问题描述 投票:2回答:1

我正在使用QThreadPool运行一个具有创建然后清除巨大的QVector并写入巨大文件大小的功能的工作程序。但是,每当一个工作线程到达该行(QVector :: clear / QFile :: close)时,所有线程都将冻结,并在完成后继续运行。

有人对克服这种情况有什么建议吗?为了使其他线程仍能够在工作程序之一中运行时正常运行。对于QFile :: close,我尝试在我的迭代中使用QFile :: flush而不是在迭代结束时使用close(),但这对性能没有帮助。

这是清除向量时线程变慢的代码

main.cpp

#include "mainwindow.h"
#include <QApplication>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    MainWindow w;
    w.show();

    return a.exec();
}

mainwindow.h

#ifndef MAINWINDOW_H
#define MAINWINDOW_H

#include <QMainWindow>

namespace Ui {
class MainWindow;
}

class MainWindow : public QMainWindow
{
    Q_OBJECT

public:
    explicit MainWindow(QWidget *parent = nullptr);
    ~MainWindow();

private slots:
    void on_start_pushButton_clicked();

private:
    Ui::MainWindow *ui;
};

#endif // MAINWINDOW_H

mainwindow.cpp

#include "mainwindow.h"
#include "ui_mainwindow.h"
#include "worker.h"

#include <QDebug>
#include <QSharedPointer>
#include <QThread>
#include <QThreadPool>

MainWindow::MainWindow(QWidget *parent) :
    QMainWindow(parent),
    ui(new Ui::MainWindow)
{
    ui->setupUi(this);

    on_start_pushButton_clicked();
}

MainWindow::~MainWindow()
{
    delete ui;
}

void MainWindow::on_start_pushButton_clicked()
{
    int numProcess = 20;
    int numTraces = 10000;
    int numSamps = 8680;

    qDebug() << "main" << QThread::currentThread();
    QThreadPool *pool = QThreadPool::globalInstance();

    for (int i=0; i<numProcess; i++) {
        worker *w= new worker;
        w->setAutoDelete(true);

        w->setData(i+1, numTraces, numSamps);

        pool->start(w);
    }
}

mainwindow.ui

<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>MainWindow</class>
 <widget class="QMainWindow" name="MainWindow">
  <property name="geometry">
   <rect>
    <x>0</x>
    <y>0</y>
    <width>400</width>
    <height>300</height>
   </rect>
  </property>
  <property name="windowTitle">
   <string>MainWindow</string>
  </property>
  <widget class="QWidget" name="centralWidget">
   <widget class="QPushButton" name="start_pushButton">
    <property name="geometry">
     <rect>
      <x>240</x>
      <y>50</y>
      <width>75</width>
      <height>23</height>
     </rect>
    </property>
    <property name="text">
     <string>Start</string>
    </property>
   </widget>
  </widget>
 </widget>
 <layoutdefault spacing="6" margin="11"/>
 <resources/>
 <connections/>
</ui>

worker.h

#ifndef WORKER_H
#define WORKER_H

#include <QObject>
#include <QRunnable>
#include <QThread>

class worker : public QObject, public QRunnable
{
    Q_OBJECT
public:
    explicit worker(QObject *parent = nullptr) : QObject(parent), QRunnable () {}
    ~worker() {}

    void setData(int id, int numTraces, int numSamps);

    void run();

signals:

public slots:

private:
    void clearVector();

    int id, numTraces, numSamps;
};

#endif // WORKER_H

worker.cpp

#include "worker.h"

#include <QCoreApplication>
#include <QDebug>
#include <QVector>

void worker::setData(int id1, int numTraces, int numSamps)
{
    this->id = id1;
    this->numTraces = numTraces;
    this->numSamps = numSamps;

    qDebug() << "setData" << id << numTraces << numSamps;
}

void worker::run()
{
    clearVector();
    qDebug() << "pool finished" << id << numTraces << numSamps << QThread::currentThread();
}

void worker::clearVector()
{
    QVector<QVector<float>> traces1, traces2;

    float progressWaypoint = 0.01f*numTraces;
    int progressPos = 0;
    for (int i=0; i<numTraces; i++) {
        QVector<float> trace1, trace2;
        for (int j=0; j<numSamps; j++) {
            trace1.append(float(j));
            trace2.append(float(numSamps - j));
        }
        traces1.append(trace1);
        traces2.append(trace2);

        if (numTraces <= 100) {
            QCoreApplication::processEvents();
        }
        else {
            if (i + 1 >= round(progressWaypoint*progressPos)) {
                QCoreApplication::processEvents();
                qDebug() << id << QThread::currentThread() << progressPos;
                progressPos++;
            }
        }
    }

    traces1.clear();
    traces2.clear();
}
qt qt5 qthread qfile qvector
1个回答
2
投票

有趣的问题。在Windows上进行测试,Qt 5.12.4。

到目前为止,我已经确定的一件事是std::vector在这种情况下似乎表现更好。但这仍然很长一段时间,并且确实会影响系统上的其他线程,从而使UI只是有些响应。但比QVector好。

此外,这些数字很大,需要大量内存。在我的32位MinGw构建上,当我尝试使用> 2个线程时,它因内存不足错误而崩溃。因此,测试是使用64b MSVC2017完成的。测试机有8核@ 3。 GHz,带64GB RAM。

以下是一些计时结果(用于生成此结果的代码如下:]

1 worker with 2 `std::vector`s:
    Worker 1 finished (ms) 1648
    Last worker finished after 1649 total ms.

5 workers with 2 `std::vector`s:
    Worker 1 finished (ms) 44363
    Worker 2 finished (ms) 44386
    Worker 3 finished (ms) 44388
    Worker 4 finished (ms) 44401
    Worker 5 finished (ms) 44448
    Last worker finished after 44449 total ms.

10 workers with 2 `std::vector`s:
    Worker 4 finished (ms) 84910
    Worker 7 finished (ms) 92701
    Worker 2 finished (ms) 111590
    Worker 8 finished (ms) 144678
    Worker 9 finished (ms) 145378
    Worker 5 finished (ms) 169067
    Worker 3 finished (ms) 211629
    Worker 1 finished (ms) 220098
    Worker 10 finished (ms) 249356
    Worker 6 finished (ms) 253452
    Last worker finished after 253453 total ms.


1 worker with 2 `QVector`s:
    Worker 1 finished (ms) 1871
    Last worker finished after 1872 total ms.

5 workers with 2 `QVector`s:
    Worker 1 finished (ms) 36492
    Worker 3 finished (ms) 58157
    Worker 5 finished (ms) 79132
    Worker 2 finished (ms) 84612
    Worker 4 finished (ms) 84819
    Last worker finished after 84820 total ms.

10 workers with 2 `QVector`s:
    Worker 7 finished (ms) 234770
    Worker 8 finished (ms) 247531
    Worker 9 finished (ms) 261346
    Worker 1 finished (ms) 261924
    Worker 4 finished (ms) 270520
    Worker 2 finished (ms) 275740
    Worker 10 finished (ms) 290605
    Worker 3 finished (ms) 293575
    Worker 6 finished (ms) 296074
    Worker 5 finished (ms) 296249
    Last worker finished after 296361 total ms.

在5到10个线程之间的某个点,甚至std::vector似乎也开始“绊倒自己”。这在GUI响应能力中也很明显(在5时有些响应,在10时几乎没有响应)。

如OP的评论中所述,延迟发生在大向量traces1traces2的取消分配期间,而不是显然发生在clear()(或该问题的swap())期间。但是,确定此错误的唯一方法是使用调试器,因为一旦调试器到达clearVector()函数的末尾,该线程实际上就被挂断了(尝试使用计时器对此进行时间戳是没有用的)。

我还尝试在Worker内部仅使用1个矢量“设置”(请参见代码)。巨大的差异:

10 workers with 1 `std::vector`:
    Worker 5 finished (ms) 4125
    Worker 4 finished (ms) 4139
    Worker 1 finished (ms) 4141
    Worker 6 finished (ms) 4153
    Worker 10 finished (ms) 4161
    Worker 9 finished (ms) 4177
    Worker 7 finished (ms) 4197
    Worker 3 finished (ms) 4216
    Worker 8 finished (ms) 4209
    Worker 2 finished (ms) 4221
    Last worker finished after 4222 total ms.

10 workers with 1 `QVector`:
    Worker 10 finished (ms) 4308
    Worker 2 finished (ms) 4358
    Worker 1 finished (ms) 4373
    Worker 3 finished (ms) 4385
    Worker 8 finished (ms) 4391
    Worker 4 finished (ms) 4400
    Worker 6 finished (ms) 4404
    Worker 7 finished (ms) 4401
    Worker 5 finished (ms) 4409
    Worker 9 finished (ms) 4406
    Last worker finished after 4410 total ms.

这是我的测试“装备”:


#include <QRunnable>
#include <QThread>
#include <QElapsedTimer>
#include <QtWidgets>

#define USE_QVECTOR 0
#define NUM_VECTORS 2
#define USE_CLEAR   0
#define USE_SWAP    0

class Worker : public QObject, public QRunnable
{
    Q_OBJECT
  public:
#if USE_QVECTOR
    typedef QVector<int> vect_t;
    typedef QVector<vect_t> vectVect_t;
#else
    typedef std::vector<int> vect_t;
    typedef std::vector<vect_t> vectVect_t;
#endif

    explicit Worker(int id, int traces, int samples, QObject *parent = nullptr) :
      QObject(parent), QRunnable(),
      id(id), numTraces(traces), numSamps(samples)
    {}

    void run() override
    {
      qDebug() << "worker starting" << id << numTraces << numSamps << QThread::currentThread();
      emit progressChanged(id, -1);
      tim.start();
      clearVector();
      emit progressChanged(id, tim.elapsed());
    }

  signals:
    void progressChanged(int id, int pos) const;

  private:
    void clearVector()
    {
      vectVect_t traces1, traces2;
      traces1.reserve(numTraces);
      if (NUM_VECTORS > 1)
        traces2.reserve(numTraces);
      float progressWaypoint = 0.01f * numTraces;
      int progressPos = 0;
      for (int i=0; i < numTraces; i++) {
        vect_t trace1, trace2;
        trace1.reserve(numSamps);
        if (NUM_VECTORS > 1)
          trace2.reserve(numSamps);
        for (int j=0; j < numSamps; j++) {
          trace1.push_back(j);
          if (NUM_VECTORS > 1)
            trace2.push_back(numSamps - j);
        }
        traces1.push_back(trace1);
        if (NUM_VECTORS > 1)
          traces2.push_back(trace2);

        if (i + 1 >= round(progressWaypoint * progressPos))
          emit progressChanged(id, progressPos++);
      }
      qDebug() << "Vectors populated in" << tim.elapsed();

      if (USE_CLEAR) {
        // Clearing the vectors slows the process down a bit but its not where the delay is.
        traces1.clear();
        if (NUM_VECTORS > 1)
          traces2.clear();
      }
      if (USE_SWAP) {
        // swap is very fast but it doesn't help overall performance
        vectVect_t blank;
        traces1.swap(blank);
        if (NUM_VECTORS > 1)
          traces2.swap(blank);
      }
    }

    int id, numTraces, numSamps;
    QElapsedTimer tim;
};


int main(int argc, char *argv[]) {
  QApplication a(argc, argv);
  // UI setup
  QDialog d;
  d.setLayout(new QVBoxLayout());
  QPushButton *pbStart = new QPushButton("Start", &d);
  QSpinBox *sbThreads = new QSpinBox(&d);
  sbThreads->setValue(5);
  QSpinBox *sbTraces = new QSpinBox(&d);
  sbTraces->setMaximum(10000);
  sbTraces->setValue(10000);
  QSpinBox *sbSamps = new QSpinBox(&d);
  sbSamps->setMaximum(10000);
  sbSamps->setValue(8680);
  QHBoxLayout *btnLo = new QHBoxLayout();
  btnLo->setSpacing(6);
  btnLo->addWidget(pbStart);
  btnLo->addWidget(new QLabel("Thrds:", &d));
  btnLo->addWidget(sbThreads, 1);
  btnLo->addWidget(new QLabel("Traces:", &d));
  btnLo->addWidget(sbTraces, 1);
  btnLo->addWidget(new QLabel("Samps:", &d));
  btnLo->addWidget(sbSamps, 1);
  d.layout()->addItem(btnLo);
  // Text box for showing results
  QTextEdit *e = new QTextEdit(&d);
  e->setReadOnly(true);
  e->setTextInteractionFlags(Qt::TextBrowserInteraction);
  d.layout()->addWidget(e);

  QElapsedTimer tim;  // total elapsed timer
  QVector<int> finished;  // keep track of finished workers

  // Set up workers on button click.
  QObject::connect(pbStart, &QPushButton::clicked, &d, [&]()
  {
    const int threads = sbThreads->value(),
        traces = sbTraces->value(),
        samples = sbSamps->value();

    QThreadPool *pool = QThreadPool::globalInstance();
    //pool->setStackSize(samples * 4 * traces * threads);
    qDebug() << "Pool max. threads:" << pool->maxThreadCount() << "Stack size:" << pool->stackSize();
    pbStart->setDisabled(true);
    finished.clear();
    tim.start();

    for (int i=0; i < threads; i++) {
      Worker *w = new Worker(i+1, traces, samples);

      // Show messages on worker progress updates
      QObject::connect(w, &Worker::progressChanged, &d, [e, pbStart, threads, &tim, &finished](int id, int pos)
      {
        const QString msg = QStringLiteral("Worker %1 %2 %3")
            .arg(id)
            .arg(pos < 0 ? "started" : pos > 100 ? "finished (ms)" : "progress")
            .arg(pos);
        e->append(msg);
        if (pos > 100) {
          finished << id;
          if (finished.count() == threads) {
            e->append(QStringLiteral("Last worker finished after %1 total ms.").arg(tim.elapsed()));
            pbStart->setEnabled(true);
          }
        }
        e->ensureCursorVisible();
      }, Qt::QueuedConnection);

      w->setAutoDelete(true);
      pool->start(w);
      qDebug() << "Queued worker" << i+1 << "with active thread count:" << pool->activeThreadCount();
    }
  });

  d.show();
  return a.exec();
}

#include "main.moc"

enter image description here


ADDED:使用固定大小的数组而不是向量。显然,在实际代码中,需要采取一些措施以确保数组索引实际上是有效的。 (当然,也可以直接在内部循环中填充traces1traces2数组,而无需中间trace1/2,但现在使用NVM。)

    void clearVector()
    {
      float progressWaypoint = 0.01f * numTraces;
      int progressPos = 0;
      // volatile to help make sure the compiler isn't just optimizing these out.
      volatile int *traces1[10000], *traces2[10000];
      for (int i=0; i < numTraces; i++) {
        volatile int trace1[10000], trace2[10000];
        for (int j=0; j < numSamps; j++) {
          trace1[j] = j;
          trace2[j] = (numSamps - j);
        }
        traces1[i] = trace1;
        traces2[i] = trace2;
        if (i + 1 >= round(progressWaypoint * progressPos))
          emit progressChanged(id, progressPos++);
      }
      // also use a value from the populated arrays to make sure they really exist.
      qDebug() << "Vectors populated in" << tim.elapsed() << traces1[0][0] << traces2[5][5];
    }

我必须在计时器号上添加100,因为每个线程在<100毫秒内完成。

    void run() override {
      ...
      clearVector();
      emit progressChanged(id, tim.elapsed() + 100);
    }

[有20个线程(16个立即线程和4个队列),“跟踪”和“样本”各10K,我得到:

最后一个工作人员在332 ms之后完成。

而且这在我的20位线程的32位MinGW构建中也没有问题。相同的执行时间。

© www.soinside.com 2019 - 2024. All rights reserved.