I need some help. I'm building a web app that takes audio in any format, converts it to a .wav file, and then passes it to `azure.cognitiveservices.speech` for transcription. I'm building the web app from a container Dockerfile because I need ffmpeg installed in order to convert non-.wav audio files to .wav (the Azure speech service only handles wav files). For some strange reason, when I install ffmpeg in the web app, the `speechsdk` class from `azure.cognitiveservices.speech` stops working. When I deploy without ffmpeg, or when I build and run the container on my own machine, the class works perfectly.

I've put debug print statements in the code. I can see the class starting up, but for some reason it doesn't buffer the same way it does when I run it locally on my machine. The routine just stops for no reason.

Has anyone run into a similar conflict between azure.cognitiveservices.speech and ffmpeg?

Here's my Dockerfile:
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Version run marker
RUN echo "Version Run 1..."

# Install ffmpeg, ensure it is executable, and clean the apt cache
# (removing /var/lib/apt/lists saves space)
RUN apt-get update && apt-get install -y ffmpeg \
    && chmod a+rx /usr/bin/ffmpeg \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Define environment variable
ENV NAME World

# Run main.py when the container launches
CMD ["streamlit", "run", "main.py", "--server.port", "8000", "--server.address", "0.0.0.0"]
```

and here's my Python code:
```python
def transcribe_audio_continuous_old(temp_dir, audio_file, language):
    speech_key = azure_speech_key
    service_region = azure_speech_region
    time.sleep(5)
    print("DEBUG TIME BEFORE speechconfig")
    ran = generate_random_string(length=5)
    temp_file = f"transcript_key_{ran}.txt"
    output_text_file = os.path.join(temp_dir, temp_file)
    speech_recognition_language = set_language_to_speech_code(language)
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language = speech_recognition_language
    audio_input = speechsdk.AudioConfig(filename=os.path.join(temp_dir, audio_file))
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input, language=speech_recognition_language)

    done = False
    transcript_contents = ""
    time.sleep(5)
    print("DEBUG TIME AFTER speechconfig")
    print(f"DEBUG File about to be passed {audio_file}")

    try:
        with open(output_text_file, "w", encoding="utf-8") as file:
            def recognized_callback(evt):
                print("Start continuous recognition callback.")
                print(f"Recognized: {evt.result.text}")
                file.write(evt.result.text + "\n")
                nonlocal transcript_contents
                transcript_contents += evt.result.text + "\n"

            def stop_cb(evt):
                print("Stopping continuous recognition callback.")
                print(f"Event type: {evt}")
                speech_recognizer.stop_continuous_recognition()
                nonlocal done
                done = True

            def canceled_cb(evt):
                print(f"Recognition canceled: {evt.reason}")
                if evt.reason == speechsdk.CancellationReason.Error:
                    print(f"Cancellation error: {evt.error_details}")
                nonlocal done
                done = True

            speech_recognizer.recognized.connect(recognized_callback)
            speech_recognizer.session_stopped.connect(stop_cb)
            speech_recognizer.canceled.connect(canceled_cb)

            speech_recognizer.start_continuous_recognition()
            while not done:
                time.sleep(1)
                print("DEBUG LOOPING TRANSCRIPT")
    except Exception as e:
        print(f"An error occurred: {e}")

    print("DEBUG DONE TRANSCRIPT")
    return temp_file, transcript_contents
```
The logging from this callback works fine locally, and it also works in the Linux web app as long as ffmpeg is not installed. I have no idea why it conflicts with ffmpeg when ffmpeg is installed via the container Dockerfile. The failing section of code is marked with the comment #NOTE DEBUG.
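For what it's worth, one way to make the stall visible instead of spinning forever would be to bound the wait loop with a timeout. A minimal sketch (illustrative only, using a `threading.Event` in place of the `done` flag; the names are mine, not from my app):

```python
import threading
import time

def wait_for_done(done_event, timeout_s=300.0, poll_s=1.0):
    """Poll until recognition signals completion or the timeout expires.

    Returns True if done_event was set in time, False if we gave up waiting,
    so a silent SDK stall surfaces as a visible timeout instead of an
    endless loop.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if done_event.is_set():
            return True
        time.sleep(poll_s)
    return done_event.is_set()
```

(`threading.Event.wait(timeout)` does the same thing in a single call; the explicit loop just mirrors the original polling structure and leaves a spot for the per-second debug print.)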
The Dockerfile and Python code you've provided show no direct indication of a conflict between these components. For comparison, here is a minimal setup that installs ffmpeg alongside the Speech SDK:
Dockerfile:
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install ffmpeg
RUN apt-get update && apt-get install -y ffmpeg

# Install any needed dependencies specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Run app.py when the container launches
CMD ["python", "app.py"]
```
requirements.txt:
```text
Flask==2.0.2
azure-cognitiveservices-speech==1.23.0
```
app.py:
```python
from flask import Flask, request, jsonify
import os
import subprocess

app = Flask(__name__)

# Path to store uploaded audio files
UPLOAD_FOLDER = 'uploads'
if not os.path.exists(UPLOAD_FOLDER):
    os.makedirs(UPLOAD_FOLDER)

# Function to convert audio to WAV using ffmpeg
def convert_to_wav(input_file, output_file):
    subprocess.run(['ffmpeg', '-i', input_file, '-ac', '1', '-ar', '16000', output_file])

# Route to handle file upload
@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'})
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'})
    if file:
        filename = os.path.join(UPLOAD_FOLDER, file.filename)
        file.save(filename)
        wav_filename = os.path.splitext(filename)[0] + '.wav'
        convert_to_wav(filename, wav_filename)
        # Code to call Azure Cognitive Services Speech SDK for transcription
        # Replace the below code with your actual Azure Speech SDK code
        transcript = "This is a dummy transcript."
        return jsonify({'transcript': transcript})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
```
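Note that `convert_to_wav` above discards ffmpeg's exit status, so a failed conversion silently hands a bad (or missing) file to the recognizer. A hedged variant (the helper names `build_ffmpeg_cmd` and `convert_to_wav_checked` are mine, not from the post) that builds the same command and raises on failure:

```python
import subprocess

def build_ffmpeg_cmd(input_file, output_file, channels=1, sample_rate=16000):
    # -y overwrites an existing output file; -ac 1 / -ar 16000 produce the
    # mono 16 kHz WAV layout commonly used with the Speech SDK
    return ["ffmpeg", "-y", "-i", input_file,
            "-ac", str(channels), "-ar", str(sample_rate), output_file]

def convert_to_wav_checked(input_file, output_file):
    # check=True raises CalledProcessError on a non-zero ffmpeg exit code,
    # so a failed conversion cannot silently reach the recognizer
    subprocess.run(build_ffmpeg_cmd(input_file, output_file),
                   check=True, capture_output=True)
```

Catching `subprocess.CalledProcessError` in the upload route then lets you return a clear JSON error instead of a dummy transcript for an unconvertible file.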
azure_transcription.py (called from app.py):
```python
import azure.cognitiveservices.speech as speechsdk

def transcribe_audio_wav(audio_file):
    speech_key = "YourSpeechServiceKey"
    service_region = "YourServiceRegion"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_input = speechsdk.AudioConfig(filename=audio_file)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
    result = speech_recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    else:
        return "Unable to recognize speech"
```
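One hedged suggestion beyond the sample above: a stall that appears only inside a `python:*-slim` container may have nothing to do with ffmpeg itself. The Speech SDK's native core on Linux needs system libraries (per Azure's documented Linux prerequisites: OpenSSL, ALSA, and CA certificates) that slim images do not ship, and an `apt-get update` run for ffmpeg can also shift library versions. A Dockerfile fragment along these lines may be worth trying (package names assume a Debian 12 base; Debian 11 images use `libssl1.1` instead of `libssl3` — verify against your base image):

```dockerfile
# Speech SDK native prerequisites on Debian-based slim images
# (assumption: Debian 12 base; adjust libssl package for older releases)
RUN apt-get update && apt-get install -y \
    libssl3 libasound2 ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```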