Problem with Python's azure.cognitiveservices.speech when installed alongside FFmpeg in a Linux Web App


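I need some help. I'm building a web app that takes audio in any format, converts it to a .wav file, and then passes it to "azure.cognitiveservices.speech" for transcription. I'm building the web app from a container Dockerfile because I need to install ffmpeg to convert non-".wav" audio files to ".wav" (since the Azure Speech service only handles wav files). For some strange reason, the "speechsdk" class of "azure.cognitiveservices.speech" doesn't work when I install ffmpeg in the web app. The class works perfectly well when I install it without ffmpeg, or when I build and run the container on my own machine.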

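I've put debug print statements in the code. I can see the class being initialized, but for some reason it doesn't buffer the same way it does when I run it locally on my machine. The routine just stops for no apparent reason.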

Has anyone run into a similar conflict between azure.cognitiveservices.speech and ffmpeg?

Here is my Dockerfile:

# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Version run
RUN echo "Version Run 1..."

# Install ffmpeg
RUN apt-get update && apt-get install -y ffmpeg && \
    # Ensure ffmpeg is executable
    chmod a+rx /usr/bin/ffmpeg && \
    # Clean up the apt cache; removing /var/lib/apt/lists saves space
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Define environment variable
ENV NAME=World

# Run main.py when the container launches
CMD ["streamlit", "run", "main.py", "--server.port", "8000", "--server.address", "0.0.0.0"]

And here's my Python code:

def transcribe_audio_continuous_old(temp_dir, audio_file, language):
    speech_key = azure_speech_key
    service_region = azure_speech_region

    time.sleep(5)
    print(f"DEBUG TIME BEFORE speechconfig")

    ran = generate_random_string(length=5)
    temp_file = f"transcript_key_{ran}.txt"
    output_text_file = os.path.join(temp_dir, temp_file)
    speech_recognition_language = set_language_to_speech_code(language)
    
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language = speech_recognition_language
    audio_input = speechsdk.AudioConfig(filename=os.path.join(temp_dir, audio_file))
        
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input, language=speech_recognition_language)
    done = False
    transcript_contents = ""

    time.sleep(5)
    print(f"DEBUG TIME AFTER speechconfig")
    print(f"DEBUG FIle about to be passed {audio_file}")

    try:
        with open(output_text_file, "w", encoding=encoding) as file:
            def recognized_callback(evt):
                print("Start continuous recognition callback.")
                print(f"Recognized: {evt.result.text}")
                file.write(evt.result.text + "\n")
                nonlocal transcript_contents
                transcript_contents += evt.result.text + "\n"

            def stop_cb(evt):
                print("Stopping continuous recognition callback.")
                print(f"Event type: {evt}")
                speech_recognizer.stop_continuous_recognition()
                nonlocal done
                done = True
            
            def canceled_cb(evt):
                print(f"Recognition canceled: {evt.reason}")
                if evt.reason == speechsdk.CancellationReason.Error:
                    print(f"Cancellation error: {evt.error_details}")
                nonlocal done
                done = True

            speech_recognizer.recognized.connect(recognized_callback)
            speech_recognizer.session_stopped.connect(stop_cb)
            speech_recognizer.canceled.connect(canceled_cb)

            speech_recognizer.start_continuous_recognition()
            while not done:
                time.sleep(1)
                print("DEBUG LOOPING TRANSCRIPT")

    except Exception as e:
        print(f"An error occurred: {e}")

    print("DEBUG DONE TRANSCRIPT")

    return temp_file, transcript_contents


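The logging from this callback works fine locally, and also in the Linux Web App when ffmpeg is not installed. I don't know why it conflicts with ffmpeg when ffmpeg is installed through the container Dockerfile. The part of the code that fails can be found at the comment #NOTE DEBUG.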

ffmpeg azure-cognitive-services transcription
1 Answer

The Dockerfile and Python code provided do not show any clear sign of a conflict between these components.
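
Since nothing in the code points to a conflict, it may help to collect a few facts from inside the deployed container before changing anything. The sketch below is one way to do that with the standard library only. The helper names `make_debug_logger`, `module_is_installed`, and `collect_environment_report` are hypothetical and are not part of the question or of the Speech SDK. The logger writes to stderr through a handler that flushes after every record, which avoids the case where `print` output sits in a buffer because stdout is not attached to a terminal. Whether buffering explains the behavior described in the question is an assumption, not something the question confirms.

```python
import importlib.util
import logging
import platform
import shutil
import sys


def make_debug_logger(name="transcribe_debug", stream=None):
    """Return a logger whose stream handler flushes after every record."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    if not logger.handlers:  # avoid stacking handlers when the script reruns
        target = stream if stream is not None else sys.stderr
        handler = logging.StreamHandler(target)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def module_is_installed(module_name):
    """Locate a module without importing it (parent packages are imported)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example `azure`) is missing.
        return False


def collect_environment_report():
    """Gather basic facts to compare a working and a failing deployment."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "ffmpeg_path": shutil.which("ffmpeg"),  # None if ffmpeg is not on PATH
        "speech_sdk_installed": module_is_installed(
            "azure.cognitiveservices.speech"
        ),
    }
```

Logging `collect_environment_report()` once at startup, in both the working and the failing deployment, gives two reports that can be compared side by side.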

  • Below is a basic outline of how to build a web application that uses Docker, Flask, ffmpeg, and the Azure Cognitive Services Speech SDK to implement the functionality described.

Dockerfile:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install ffmpeg
RUN apt-get update && apt-get install -y ffmpeg

# Install any needed dependencies specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Run app.py when the container launches
CMD ["python", "app.py"]

requirements.txt:

Flask==2.0.2
azure-cognitiveservices-speech==1.23.0

app.py:

from flask import Flask, request, jsonify
import os
import subprocess

app = Flask(__name__)

# Path to store uploaded audio files
UPLOAD_FOLDER = 'uploads'
if not os.path.exists(UPLOAD_FOLDER):
    os.makedirs(UPLOAD_FOLDER)

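# Function to convert audio to 16 kHz mono WAV using ffmpeg
def convert_to_wav(input_file, output_file):
    # -y overwrites an existing output file; check=True raises if ffmpeg fails
    subprocess.run(['ffmpeg', '-y', '-i', input_file, '-ac', '1', '-ar', '16000', output_file], check=True)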

# Route to handle file upload
@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'}), 400

    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400

    if file:
        filename = os.path.join(UPLOAD_FOLDER, file.filename)
        file.save(filename)
        wav_filename = os.path.splitext(filename)[0] + '.wav'
        convert_to_wav(filename, wav_filename)
        # Code to call Azure Cognitive Services Speech SDK for transcription
        # Replace the below code with your actual Azure Speech SDK code
        transcript = "This is a dummy transcript."
        return jsonify({'transcript': transcript})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
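
When ffmpeg runs under a web server, two things are worth guarding against: the process waiting on standard input, and a conversion that never returns. The sketch below is one way to handle both. It assumes the same 16 kHz mono target as the outline above; the function names and the 120-second timeout are illustrative choices, not values from the question. `-nostdin` tells ffmpeg not to read from standard input, and `-hide_banner` with `-loglevel error` keeps the captured stderr short.

```python
import subprocess


def build_ffmpeg_command(input_file, output_file, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list for a 16-bit PCM WAV conversion."""
    return [
        "ffmpeg",
        "-nostdin",            # do not read from standard input
        "-hide_banner",
        "-loglevel", "error",  # print only errors to stderr
        "-y",                  # overwrite the output file if it exists
        "-i", input_file,
        "-ac", str(channels),
        "-ar", str(sample_rate),
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        output_file,
    ]


def convert_to_wav_checked(input_file, output_file, timeout_seconds=120):
    """Run ffmpeg and raise RuntimeError with its stderr if it fails."""
    command = build_ffmpeg_command(input_file, output_file)
    try:
        completed = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,  # nothing for ffmpeg to wait on
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"ffmpeg timed out after {timeout_seconds}s") from exc
    if completed.returncode != 0:
        raise RuntimeError(f"ffmpeg failed: {completed.stderr.strip()}")
    return output_file
```

Keeping the command builder separate from the function that runs it makes the argument list easy to log or inspect when a conversion behaves differently in the container than on a local machine.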

azure_transcription.py:

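  • A separate script that handles transcription with the Azure Cognitive Services Speech SDK. After the audio has been converted to WAV, this script can be called from app.py.
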
import azure.cognitiveservices.speech as speechsdk

def transcribe_audio_wav(audio_file):
    speech_key = "YourSpeechServiceKey"
    service_region = "YourServiceRegion"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_input = speechsdk.AudioConfig(filename=audio_file)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
    result = speech_recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    else:
        return "Unable to recognize speech"
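
`recognize_once()` returns after a single utterance, so longer recordings are usually handled with continuous recognition, as in the question. The sketch below shows one way to wait for a continuous session without an unbounded polling loop. It is written against any object that exposes `recognized`, `session_stopped`, and `canceled` signals plus `start_continuous_recognition()` and `stop_continuous_recognition()`, which is how the question's code uses `speechsdk.SpeechRecognizer`. The function name and the default timeout are assumptions. The recognizer is stopped from the calling thread after the wait, not from inside a callback.

```python
import threading


def run_continuous_recognition(recognizer, timeout_seconds=600):
    """Collect recognized text until the session ends or the timeout expires.

    Returns (lines, completed); completed is False if the timeout was hit.
    """
    finished = threading.Event()
    lines = []

    def on_recognized(evt):
        lines.append(evt.result.text)

    def on_end(evt):
        # Only signal here; stopping happens in the calling thread below.
        finished.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_end)
    recognizer.canceled.connect(on_end)

    recognizer.start_continuous_recognition()
    try:
        completed = finished.wait(timeout=timeout_seconds)
    finally:
        recognizer.stop_continuous_recognition()
    return lines, completed
```

If `completed` comes back as `False`, neither `session_stopped` nor `canceled` fired within the timeout, which narrows the problem down to the session itself and not the surrounding loop.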
