Complex C++ lifetime issue in Python bindings between C++ and numpy

Question

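I am looking for advice on how to handle a complex lifetime issue between C++ and numpy / Python. Sorry for the wall of text, but I wanted to provide as much context as possible.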

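I develop cvnp, a library that provides casts between bindings for cv::Mat and py::array objects, so that memory is shared between the two when using pybind11. It was originally based on an SO answer by Dan Mašek. All is going well, and the library is used in several projects, including robotpy, which is a Python library for the FIRST Robotics Competition.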

However, a user raised an issue that concerns the lifetime of linked cv::Mat and py::array objects.

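In the direction cv::Mat -> py::array, all is fine: mat_to_nparray creates a py::array that keeps a reference to the linked cv::Mat via a "capsule" (a Python handle).
However, in the direction py::array -> cv::Mat (nparray_to_mat), the cv::Mat accesses the data of the py::array without holding a reference to the array, so the lifetime of the py::array is not guaranteed to match that of the cv::Mat.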

See mat_to_nparray:

py::capsule make_capsule_mat(const cv::Mat& m)
{
    return py::capsule(new cv::Mat(m)
        , [](void *v) { delete reinterpret_cast<cv::Mat*>(v); }
    );
}

pybind11::array mat_to_nparray(const cv::Mat& m)
{
    return pybind11::array(detail::determine_np_dtype(m.depth())
        , detail::determine_shape(m)
        , detail::determine_strides(m)
        , m.data
        , detail::make_capsule_mat(m)
        );
}

and nparray_to_mat:

cv::Mat nparray_to_mat(pybind11::array& a)
{
    ...
    cv::Mat m(size, type, is_not_empty ? a.mutable_data(0) : nullptr);
    return m;
}

This worked well until a user wrote the following:

A bound C++ function that returns the same cv::Mat it received as an argument:

m.def("test", [](cv::Mat mat) { return mat; });

Some Python code that uses this function:

img = np.zeros(shape=(480, 640, 3), dtype=np.uint8)
img = test(img)

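In this case, a segmentation fault may occur, because the py::array object is destroyed before the cv::Mat object, and the cv::Mat object then tries to access the data of the py::array object. However, the segmentation fault is not systematic and depends on the OS and Python version.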
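To make the order of destruction concrete, here is a minimal sketch that uses only the C++ standard library. FakeArray and FakeMat are hypothetical stand-ins for py::array and cv::Mat, not cvnp, pybind11 or OpenCV types. The sketch assumes the array frees its buffer once nothing references it, and uses a std::weak_ptr as a probe so the freed buffer is detected without ever being dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for py::array: it may own a buffer, and it may keep
// another object alive (the role the capsule plays in mat_to_nparray).
struct FakeArray {
    std::shared_ptr<std::vector<int>> storage;  // owned buffer, empty for a borrowed view
    int* data = nullptr;                        // pointer to the first element
    std::shared_ptr<void> keep_alive;           // "capsule": keeps a FakeMat alive
};

// Hypothetical stand-in for the cv::Mat built by nparray_to_mat:
// a raw pointer only, with no reference to the array it came from.
struct FakeMat {
    int* data = nullptr;
};

FakeArray make_array(std::size_t n) {
    FakeArray a;
    a.storage = std::make_shared<std::vector<int>>(n, 0);
    a.data = a.storage->data();
    return a;
}

// Mirrors nparray_to_mat: borrows the pointer, holds nothing.
FakeMat array_to_mat(const FakeArray& a) {
    FakeMat m;
    m.data = a.data;
    return m;
}

// Mirrors mat_to_nparray: the new array keeps the Mat header alive,
// but the Mat header does not keep the original buffer alive.
FakeArray mat_to_array(const FakeMat& m) {
    FakeArray out;
    out.data = m.data;
    out.keep_alive = std::make_shared<FakeMat>(m);
    return out;
}

// Models `img = test(img)`: once the original array goes away, the result
// still carries a pointer into a buffer that has been freed.
bool buffer_freed_after_roundtrip() {
    std::weak_ptr<std::vector<int>> probe;
    FakeArray result;
    {
        FakeArray img = make_array(16);
        probe = img.storage;
        result = mat_to_array(array_to_mat(img));
        assert(!probe.expired());  // buffer still alive while img exists
    }                              // img destroyed: last owner of the buffer is gone
    return probe.expired();        // true: result.data now dangles
}
```

In this model buffer_freed_after_roundtrip() returns true: the chain result -> capsule -> FakeMat is intact, but nothing in that chain owns the buffer.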

I was able to reproduce it in CI using ASAN, via this commit. The reproducing code is fairly simple:

void test_lifetime()
{
    // We need to create a big array to trigger a segfault
    auto create_example_array = []() -> pybind11::array
    {
        constexpr int rows = 1000, cols = 1000;
        std::vector<pybind11::ssize_t> a_shape{rows, cols};
        std::vector<pybind11::ssize_t> a_strides{};
        pybind11::dtype a_dtype = pybind11::dtype(pybind11::format_descriptor<int32_t>::format());
        pybind11::array a(a_dtype, a_shape, a_strides);
        // Set initial values
        for(int i=0; i<rows; ++i)
            for(int j=0; j<cols; ++j)
                *((int32_t *)a.mutable_data(j, i)) = j * rows + i;

        printf("Created array data address =%p\n%s\n",
               a.data(),
               py::str(a).cast<std::string>().c_str());
        return a;
    };

    // Let's reimplement the bound version of the test function via pybind11:
    auto test_bound = [](pybind11::array& a) {
        cv::Mat m = cvnp::nparray_to_mat(a);
        return cvnp::mat_to_nparray(m);
    };

    // Now let's reimplement the failing python code in C++
    //    img = np.zeros(shape=(480, 640, 3), dtype=np.uint8)
    //    img = test(img)
    auto img = create_example_array();
    img = test_bound(img);

    // Let's try to change the content of the img array
    *((int32_t *)img.mutable_data(0, 0)) = 14;  // This triggers an error that ASAN catches
    printf("img data address =%p\n%s\n",
           img.data(),
           py::str(img).cast<std::string>().c_str());
}

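I am looking for advice on how to handle this issue. I see a few options:

An ideal solution would be to call pybind11::array.inc_ref() when constructing the cv::Mat inside nparray_to_mat, and to make sure that pybind11::array.dec_ref() is called when this particular instance is destroyed. However, I do not see how to do that.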
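The intended pairing of inc_ref and dec_ref can be sketched with a small RAII guard, again using only the standard library. FakeObject, RefGuard and GuardedView are hypothetical names for illustration. cv::Mat has no member to hold such a guard, which is exactly the difficulty, so this shows the desired behaviour rather than a way to attach it to cv::Mat.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Python object with an intrusive reference count.
struct FakeObject {
    int refcount = 1;                // one reference held by the "Python side"
    int buffer[4] = {0, 0, 0, 0};
    void inc_ref() { ++refcount; }
    void dec_ref() { --refcount; }   // a real object would free itself at zero
};

// RAII guard: inc_ref on construction, dec_ref on destruction.
class RefGuard {
public:
    explicit RefGuard(FakeObject& o) : obj_(&o) { obj_->inc_ref(); }
    ~RefGuard() { obj_->dec_ref(); }
    RefGuard(const RefGuard&) = delete;
    RefGuard& operator=(const RefGuard&) = delete;
private:
    FakeObject* obj_;
};

// Stand-in for a matrix header. Copies of the header share one guard,
// so dec_ref runs exactly once, when the last copy is destroyed.
struct GuardedView {
    int* data;
    std::shared_ptr<RefGuard> owner;
};

GuardedView make_view(FakeObject& o) {
    return GuardedView{o.buffer, std::make_shared<RefGuard>(o)};
}
```

With this shape, creating a view raises the count from 1 to 2, copying the view leaves it at 2, and the count returns to 1 only after every copy has gone out of scope.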

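Note: I know that cv::Mat can use a custom allocator, but it is not useful here, since the cv::Mat does not allocate the memory itself and instead uses the memory of the py::array object.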

Thanks for reading this far, and thanks in advance for any advice!

Answer 1

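SO may be my rubber duck: I now think a potential solution is indeed to use a custom allocator for the cv::Mat, except that it would not allocate anything, but instead redirect to the py::array data (and also call inc_ref / dec_ref).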
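As a rough model of that idea, the sketch below uses only the standard library. Block, Releaser, BorrowingAllocator and MiniMat are hypothetical names that loosely imitate how a matrix header can delegate release of its buffer record to an allocator object. None of them are OpenCV types, and the real cv::MatAllocator interface has more methods and different signatures. The point is only that the "allocator" never allocates the pixel buffer: it records the foreign owner, takes a reference, and drops it in its deallocate hook.

```cpp
#include <cassert>

struct Block;

// Hypothetical allocator interface: only the release hook matters here.
struct Releaser {
    virtual void deallocate(Block* b) const = 0;
    virtual ~Releaser() = default;
};

// Bookkeeping record shared by all copies of a matrix header.
struct Block {
    int refcount;               // number of MiniMat headers sharing this record
    void* userdata;             // the foreign owner of the buffer
    const Releaser* allocator;  // who to call when refcount reaches zero
};

// Stand-in for the Python array that really owns the memory.
struct ForeignOwner {
    int refcount = 1;
    int buffer[4] = {};
};

// "Allocator" that borrows instead of allocating.
struct BorrowingAllocator : Releaser {
    static Block* attach(ForeignOwner& o) {
        static BorrowingAllocator instance;
        ++o.refcount;                                          // plays the role of inc_ref
        return new Block{1, &o, &instance};
    }
    void deallocate(Block* b) const override {
        --static_cast<ForeignOwner*>(b->userdata)->refcount;   // plays the role of dec_ref
        delete b;                                              // frees the record, not the buffer
    }
};

// Minimal matrix header with shared, reference-counted bookkeeping.
class MiniMat {
public:
    MiniMat(int* data, Block* b) : data_(data), block_(b) {}
    MiniMat(const MiniMat& o) : data_(o.data_), block_(o.block_) { ++block_->refcount; }
    MiniMat& operator=(const MiniMat&) = delete;
    ~MiniMat() {
        if (--block_->refcount == 0) block_->allocator->deallocate(block_);
    }
    int* data() const { return data_; }
private:
    int* data_;
    Block* block_;
};

MiniMat owner_to_minimat(ForeignOwner& o) {
    return MiniMat(o.buffer, BorrowingAllocator::attach(o));
}
```

In this model the owner's count stays raised for as long as any MiniMat copy exists and drops back when the last one is destroyed. Whether the same hook can be wired into cv::Mat this cleanly is an assumption that would need checking against the OpenCV sources.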
