RuntimeError: Library libcublas.so.11 is not found or cannot be loaded

Problem description

I am working on an LLM project on Google Colab with a V100 GPU and high-RAM mode. These are my dependencies:

git+https://github.com/pyannote/pyannote-audio
git+https://github.com/huggingface/[email protected]
openai==0.28
ffmpeg-python
pandas==1.5.0
tokenizers==0.14
torch==2.1.1
torchaudio==2.1.1
tqdm==4.64.1
EasyNMT==2.0.2
psutil==5.9.2
requests
pydub
docxtpl
faster-whisper==0.10.0
git+https://github.com/openai/whisper.git

And here is everything I import:

from faster_whisper import WhisperModel
from datetime import datetime, timedelta
from time import time
from pathlib import Path
import pandas as pd
import os
from pydub import AudioSegment
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

import requests

import torch
import pyannote.audio
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
from pyannote.audio import Audio
from pyannote.core import Segment

import wave
import contextlib
import psutil

import openai
from codecs import decode

from docxtpl import DocxTemplate

I used to run the latest versions of torch and torchaudio, but they were updated yesterday (December 15, 2023, when v2.1.2 was released). I thought the error I'm seeing was caused by that update, so I pinned them to the versions my code was running on two days ago (v2.1.1). Apparently, that didn't help.
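
In a Colab cell the pinning itself is just a pip install with explicit versions, e.g. (versions taken from the dependency list above):

# Re-pin torch/torchaudio to the versions that worked two days earlier.
!pip install -q torch==2.1.1 torchaudio==2.1.1 faster-whisper==0.10.0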

Everything worked fine two days ago and I haven't changed anything in my notebook. The only thing that can have changed is the dependencies I'm using, but reverting to the previous versions didn't solve my problem. Here is the code snippet that raises the error:

def EETDT(audio_path, whisper_model, num_speakers, output_name="diarization_result", selected_source_lang="eng", transcript=None):
    """
    Uses Whisper to separate audio into segments and generate a transcript for each segment.

    Speech Recognition is based on models from OpenAI Whisper https://github.com/openai/whisper
    Speaker diarization model and pipeline from https://github.com/pyannote/pyannote-audio

    audio_path : str -> path to wav file
    whisper_model : str -> small/medium/large/large-v2/large-v3
    num_speakers : int -> number of speakers in audio (0 to let the function determine it)
    output_name : str -> Desired name of the output file
    selected_source_lang : str -> language's code
    """

    audio_name = audio_path.split("/")[-1].split(".")[0]

    model = WhisperModel(whisper_model, compute_type="int8")
    time_start = time()
    if audio_path is None:
        raise ValueError("Error no video input")
    print("Input file:", audio_path)
    if not audio_path.endswith(".wav"):
        print("Submitted audio isn't in wav format. Starting conversion...")
        audio = AudioSegment.from_file(audio_path)
        audio_suffix = audio_path.split(".")[-1]
        new_path = audio_path.replace(audio_suffix,"wav")
        audio.export(new_path, format="wav")
        audio_path = new_path
        print("Converted to wav:", new_path)
    try:
        # Get duration
        with contextlib.closing(wave.open(audio_path,'r')) as f:
            frames = f.getnframes()
            rate = f.getframerate()
            duration = frames / float(rate)
        if duration<30:
            raise ValueError(f"Audio has to be longer than 30 seconds. Current: {duration}")
        print(f"Duration of audio file: {duration}")

        # Transcribe audio
        options = dict(language=selected_source_lang, beam_size=5, best_of=5)
        transcribe_options = dict(task="transcribe", **options)
        segments_raw, info = model.transcribe(audio_path, **transcribe_options)

        # Convert back to original openai format
        segments = []
        i = 0
        full_transcript = list()
        if not isinstance(transcript, pd.DataFrame):
            for segment_chunk in segments_raw: # <-- THROWS ERROR
                chunk = {}
                chunk["start"] = segment_chunk.start
                chunk["end"] = segment_chunk.end
                chunk["text"] = segment_chunk.text
                full_transcript.append(segment_chunk.text)
                segments.append(chunk)
                i += 1
            full_transcript = "".join(full_transcript)
            print("Transcribe audio done with fast-whisper")
        else:
            for i in range(len(transcript)):
                full_transcript.append(transcript["text"].iloc[i])
            full_transcript = "".join(full_transcript)
            print("You inputted pre-transcribed audio")

    except Exception as e:
        raise RuntimeError("Error converting video to audio")
 ...The code never leaves the try block...
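
(Side note for anyone debugging this: the except block above hides what actually failed behind a generic "Error converting video to audio" message. A small optional tweak, not part of the original function, keeps the real cause, here the libcublas.so.11 load failure, explicitly attached to the re-raised error:)

    except Exception as e:
        # Explicitly chain the original exception so the real cause
        # (e.g. "Library libcublas.so.11 is not found") stays visible in the traceback.
        raise RuntimeError("Error converting video to audio") from e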

Tags: python, pytorch, google-colaboratory, large-language-model, openai-whisper
1 Answer

I ran into the same problem today when trying to use faster-whisper on Google Colab. This custom Whisper implementation still requires CUDA 11 and does not work with CUDA 12.

I had a look inside the Colab instance, and it has indeed switched to CUDA 12, which means faster-whisper can no longer run because its dependency is missing.
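
You can confirm this yourself from a Colab cell, e.g.:

import torch

print(torch.version.cuda)    # CUDA version the installed PyTorch was built against
!nvcc --version              # CUDA toolkit version on the Colab instance
!ls /usr/local/ | grep cuda  # CUDA installations present on the image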

If you want to try to get it working with CUDA 12, it should be possible by rebuilding CTranslate2 from source; here is a related issue about this: https://github.com/OpenNMT/CTranslate2/issues/1250
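
Another workaround people use (I haven't verified it on the current Colab image, so treat it as a sketch) is to install the CUDA 11 runtime libraries from PyPI and preload them before importing faster_whisper, so CTranslate2 can resolve libcublas.so.11 without a rebuild:

# Sketch only: assumes the nvidia-cublas-cu11 / nvidia-cudnn-cu11 wheels ship
# libcublas.so.11 and libcudnn.so.8 in their lib directories.
!pip install -q nvidia-cublas-cu11 nvidia-cudnn-cu11

import os
import ctypes
import nvidia.cublas.lib
import nvidia.cudnn.lib

# Preload the CUDA 11 libraries into the process so later dlopen calls resolve them.
ctypes.CDLL(os.path.join(os.path.dirname(nvidia.cublas.lib.__file__), "libcublas.so.11"))
ctypes.CDLL(os.path.join(os.path.dirname(nvidia.cudnn.lib.__file__), "libcudnn.so.8"))

from faster_whisper import WhisperModel  # should now load without the libcublas error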
