XLA requires ptxas version 11.8 or higher

Question

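I'm trying to run a tutorial Keras model with the TensorFlow backend on WSL, but the model.fit call throws an error.

Apart from Nvidia's CUDA toolkit, I installed everything through PyPI. I'm not using Anaconda.

I'm running with these specs: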

Windows 11
GeForce RTX 3060 Ti
WSL-2
Python 3.10
Keras 3.1.1
tensorflow[and-cuda] 2.16.1
CUDA 12.4
Game Ready Driver 551.86
nvcc, ptxas 11.5

Here is the model:

import numpy as np
import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

num_classes = 10
input_shape = (28, 28, 1)

model = keras.Sequential(
    [
        keras.layers.Input(shape=input_shape),
        keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        keras.layers.MaxPooling2D(pool_size=(2, 2)),
        keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
        keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    metrics=[
        keras.metrics.SparseCategoricalAccuracy(name="acc"),
    ],
)

batch_size = 128
epochs = 20

callbacks = [
    keras.callbacks.ModelCheckpoint(filepath="model_at_epoch_{epoch}.keras"),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
]

model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.15,
    callbacks=callbacks,
)
score = model.evaluate(x_test, y_test, verbose=0)

I tried to run this model with

python3 src/model.py

but got this output/error:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711666829.169113   61802 service.cc:145] XLA service 0x7f0c2c004f30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1711666829.169157   61802 service.cc:153]   StreamExecutor device (0): NVIDIA GeForce RTX 3060 Ti, Compute Capability 8.6
2024-03-28 17:00:29.186928: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-03-28 17:00:29.273943: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8907
2024-03-28 17:00:29.634895: F external/local_xla/xla/service/gpu/triton_autotuner.cc:634] Non-OK-status: has_executable.status() status: INTERNAL: XLA requires ptxas version 11.8 or higherFailure occured when compiling fusion triton_gemm_dot.1176 with config '{block_m:16,block_n:16,block_k:32,split_k:4,num_stages:1,num_warps:4}'
Fused HLO computation:
%triton_gemm_dot.1176_computation (parameter_0.1: f32[128,128], parameter_1.1: f32[128,10]) -> f32[128,10] {
  %parameter_0.1 = f32[128,128]{1,0} parameter(0)
  %constant.407 = f32[] constant(0.0078125), metadata={op_type="Mul" op_name="gradient_tape/compile_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/mul" source_file="/home/pippi/.local/lib/python3.10/site-packages/tensorflow/python/framework/ops.py" source_line=1177}
  %broadcast.323 = f32[128,10]{1,0} broadcast(f32[] %constant.407), dimensions={}, metadata={op_type="Mul" op_name="gradient_tape/compile_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/mul" source_file="/home/pippi/.local/lib/python3.10/site-packages/tensorflow/python/framework/ops.py" source_line=1177}
  %parameter_1.1 = f32[128,10]{1,0} parameter(1)
  %multiply.2190 = f32[128,10]{1,0} multiply(f32[128,10]{1,0} %broadcast.323, f32[128,10]{1,0} %parameter_1.1), metadata={op_type="Mul" op_name="gradient_tape/compile_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/mul" source_file="/home/pippi/.local/lib/python3.10/site-packages/tensorflow/python/framework/ops.py" source_line=1177}
  ROOT %dot.1 = f32[128,10]{1,0} dot(f32[128,128]{1,0} %parameter_0.1, f32[128,10]{1,0} %multiply.2190), lhs_contracting_dims={0}, rhs_contracting_dims={0}, frontend_attributes={grad_x="false",grad_y="true"}, metadata={op_type="MatMul" op_name="gradient_tape/sequential_1/dense_1/MatMul/MatMul_1" source_file="/home/pippi/.local/lib/python3.10/site-packages/tensorflow/python/framework/ops.py" source_line=1177}
}
Aborted

From what I've been able to find by searching, I believe

status: has_executable.status() status: INTERNAL: XLA requires ptxas version 11.8

is the main source of the error. Is my version incorrect?

Answer 1

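As suggested in the comments, following the post-installation instructions found here, so that the correct nvcc version is used, fixed the problem.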
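
To confirm which ptxas is actually being picked up, a small check like the following may help. It is only a sketch: it assumes the relevant ptxas is the first one found on PATH (XLA may also look elsewhere, which the question does not confirm), and that `ptxas --version` prints a line containing "release X.Y". The helper names `parse_ptxas_release` and `check_ptxas` are made up for illustration.

```python
import re
import shutil
import subprocess

# Minimum version taken from the XLA error message in the question.
MIN_PTXAS = (11, 8)


def parse_ptxas_release(version_text):
    """Return (major, minor) parsed from `ptxas --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_ptxas():
    """Locate ptxas on PATH and return (path, release); either may be None."""
    path = shutil.which("ptxas")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_ptxas_release(result.stdout)


if __name__ == "__main__":
    path, release = check_ptxas()
    if path is None:
        print("ptxas not found on PATH")
    elif release is None:
        print(f"could not parse version of {path}")
    elif release < MIN_PTXAS:
        print(f"{path} is {release[0]}.{release[1]}, older than 11.8")
    else:
        print(f"{path} is {release[0]}.{release[1]}, OK")
```

With the setup described in the question (ptxas 11.5 alongside CUDA 12.4), this would be expected to report the older version, which suggests a stale toolkit earlier on PATH than the CUDA 12.4 one.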
