Compiling TensorFlow 2.15.0 from source

Question

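I'm trying to build TensorFlow 2.15.0 with GPU support from source on Ubuntu 22.04. All the documentation I've seen says to use CUDA 12.2. But the build fails unless I have TensorRT installed. Fine, but TensorRT doesn't support CUDA 12.2 (I can't even install TensorRT unless I have CUDA <= 12.1).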

What am I missing here?

Details:

To compile from source, I followed these steps:

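Installed CUDA 12.2 using the standard NVIDIA instructions (as per the docs/release notes).
Installed cuDNN 8.8 using the standard NVIDIA instructions (as per the docs/release notes).
Installed clang 17 (as per the docs/release notes).
Cloned the TensorFlow repository and checked out 2.15.0.
Ran the configure script as follows: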

    You have bazel 6.1.0 installed.
    Please specify the location of python. [Default is /home/christopher/Desktop/code/tf-source/venv/bin/python3]: 
    
    
    Found possible Python library paths:
      /home/christopher/Desktop/code/tf-source/venv/lib/python3.10/site-packages
    Please input the desired Python library path to use.  Default is [/home/christopher/Desktop/code/tf-source/venv/lib/python3.10/site-packages]
    
    Do you wish to build TensorFlow with ROCm support? [y/N]: 
    No ROCm support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with CUDA support? [y/N]: y
    CUDA support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with TensorRT support? [y/N]: 
    No TensorRT support will be enabled for TensorFlow.
    
    Found CUDA 12.2 in:
        /usr/local/cuda-12.2/targets/x86_64-linux/lib
        /usr/local/cuda-12.2/targets/x86_64-linux/include
    Found cuDNN 8 in:
        /usr/lib/x86_64-linux-gnu
        /usr/include
    
    
    Please specify a list of comma-separated CUDA compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
    Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 8.9]: 8.0
    
    
    Do you want to use clang as CUDA compiler? [Y/n]: 
    Clang will be used as CUDA compiler.
    
    Please specify clang path that to be used as host compiler. [Default is /usr/lib/llvm-17/bin/clang]: 
    
    
    You have Clang 17.0.6 installed.
    
    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]: 
    
    
    Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
    Not configuring the WORKSPACE for Android builds.
    
    Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
        --config=monolithic     # Config for mostly static monolithic build.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
        --config=v1             # Build with TensorFlow 1 API instead of TF 2 API.
    Preconfigured Bazel build configs to DISABLE default on features:
        --config=nogcp          # Disable GCP support.
        --config=nonccl         # Disable NVIDIA NCCL support.
    Configuration finished
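
As far as I know, the same answers can be supplied through environment variables so that the configure script does not prompt for them. The sketch below mirrors the choices made in the session above. The variable names are my understanding of what configure.py reads in this release, so treat them as assumptions and check them against your checkout.

```shell
# Assumed variable names (read by configure.py); verify against your checkout.
export TF_NEED_ROCM=0
export TF_NEED_CUDA=1
export TF_NEED_TENSORRT=0
export TF_CUDA_COMPUTE_CAPABILITIES=8.0
export TF_CUDA_CLANG=1
export CLANG_CUDA_COMPILER_PATH=/usr/lib/llvm-17/bin/clang
export CC_OPT_FLAGS=-Wno-sign-compare
export TF_SET_ANDROID_WORKSPACE=0

# Run from the repository root; skipped if no configure script is present.
if [ -x ./configure ]; then
    ./configure
fi
```

Prompts that these variables do not cover (for example the Python location) would presumably still appear.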

When I compile with:

    bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package

I see an error like this:

    ERROR: /home/christopher/Desktop/code/tf-source/tensorflow/WORKSPACE:84:14: fetching tensorrt_configure rule //external:local_config_tensorrt: Traceback (most recent call last):
        File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 300, column 38, in _tensorrt_configure_impl
            _create_local_tensorrt_repository(repository_ctx)
        File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 159, column 30, in _create_local_tensorrt_repository
            config = find_cuda_config(repository_ctx, ["cuda", "tensorrt"])
        File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/gpus/cuda_configure.bzl", line 649, column 26, in find_cuda_config
            exec_result = execute(repository_ctx, [python_bin, repository_ctx.attr._find_cuda_config] + cuda_libraries)
        File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
            fail(
    Error in fail: Repository command failed
    Could not find any NvInferVersion.h matching version '' in any subdirectory:
            ''
            'include'
            'include/cuda'
            'include/*-linux-gnu'
            'extras/CUPTI/include'
            'include/cuda/CUPTI'
            'local/cuda/extras/CUPTI/include'
            'targets/x86_64-linux/include'
    of:
            '/lib'
            '/lib/i386-linux-gnu'
            '/lib/x86_64-linux-gnu'
            '/lib32'
            '/usr'
            '/usr/lib/x86_64-linux-gnu/libfakeroot'
            '/usr/lib32'
            '/usr/local/cuda'
            '/usr/local/cuda/targets/x86_64-linux/lib'

I believe the missing header belongs to TensorRT, so I tried to install TensorRT following NVIDIA's documentation. But the latest version doesn't support CUDA 12.2, only <= 12.1. I have tried installing CUDA 12.1 instead, and then I can get quite deep into the compilation; however, the official release is built using CUDA 12.2, so I'm stumped at the moment.
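
To check whether the header is actually present, a rough sketch like the following can search a few of the prefix and subdirectory combinations listed in the error message. The function name `find_nvinfer_header` is hypothetical, and the sketch covers only a subset of the locations the build probes.

```shell
# Hypothetical helper: look for NvInferVersion.h under the given prefixes,
# using some of the subdirectories named in the error message.
find_nvinfer_header() {
    for root in "$@"; do
        for candidate in \
            "$root"/include/NvInferVersion.h \
            "$root"/include/*-linux-gnu/NvInferVersion.h \
            "$root"/targets/x86_64-linux/include/NvInferVersion.h
        do
            if [ -f "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    return 1
}

# Two of the prefixes from the error message.
find_nvinfer_header /usr /usr/local/cuda || echo "NvInferVersion.h not found"
```

If nothing is printed except the "not found" message, the header is missing from those locations, which would be consistent with the error above.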

Answer 1

Two libraries must be installed: libnvinfer-dev and libnvinfer-plugin-dev. For me, it was as follows:

    sudo apt-get install -y libnvinfer-dev=8.6.1.6-1+cuda12.0 libnvinfer-plugin-dev=8.6.1.6-1+cuda12.0

They are installed along with TensorRT, but can also be installed on their own.
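
The exact version string will depend on what your apt sources offer. As a hedged sketch, the commands below list the versions available for both packages and show what is currently installed. The helper `cuda_tag_of` is hypothetical and only extracts the CUDA suffix from a version string shaped like the one above.

```shell
# Hypothetical helper: print the CUDA tag from a version such as 8.6.1.6-1+cuda12.0.
cuda_tag_of() {
    case "$1" in
        *+cuda*) printf '%s\n' "${1##*+cuda}" ;;
        *) return 1 ;;
    esac
}

# List the versions offered by the configured apt repositories, if apt is available.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache madison libnvinfer-dev libnvinfer-plugin-dev
fi

# Show the installed versions, if any.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package} ${Version}\n' libnvinfer-dev libnvinfer-plugin-dev 2>/dev/null \
        || echo "libnvinfer packages not (all) installed"
fi
```

For example, `cuda_tag_of 8.6.1.6-1+cuda12.0` yields the CUDA version the package was built against.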
