Processing audio from a byte stream or file without saving to disk with the Azure Speech SDK in Python

Question

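I have a Flask application that receives audio files posted as form data, and we want to process those files with the Azure Speech SDK to extract text from the speech.

To improve performance, I would like to process the audio files without writing them to the server's disk.

However, the Azure Speech SDK only seems to work properly when given a filename. I have not been able to pass the file as an AudioInputStream.

Can someone help me process the file without saving it to disk?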

def process_audio_files():
    file = request.files['audio-file']
    stream = file.stream

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    stream = speechsdk.audio.PushAudioInputStream(stream_format=stream)

    # How to pass the file in the AudioConfig as parameter without saving to the disk?

    audio_config = speechsdk.audio.AudioConfig(stream=stream)

    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "fr-FR", "es-ES"])

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, auto_detect_source_language_config=auto_detect_source_language_config, audio_config=audio_config)
azure azure-cognitive-services azure-speech azure-text-translation
1 Answer

I tried the following Flask application to convert speech to text with the Azure Speech SDK in Python, without saving the audio to disk.

Code:

app.py:

from flask import Flask, render_template, request
import azure.cognitiveservices.speech as speechsdk
 
app = Flask(__name__)
 
speech_key = '<speech_key>'
service_region = '<speech_region>'
 
@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        try:
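            file = request.files['audio-file']
            stream = file.stream
            stream.seek(0)  # rewind in case the upload has already been read

            speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
            # The default AudioStreamFormat is 16 kHz, 16-bit, mono PCM;
            # the uploaded audio is assumed to match it.
            audio_stream = speechsdk.audio.PushAudioInputStream(stream_format=speechsdk.audio.AudioStreamFormat())

            # Push the uploaded bytes from memory, then close the stream to signal end of audio.
            audio_stream.write(stream.read())
            audio_stream.close()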
 
            audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)
            auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
                languages=["en-US", "fr-FR", "es-ES"])
 
            speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                                           auto_detect_source_language_config=auto_detect_source_language_config,
                                                           audio_config=audio_config)
 
            result = speech_recognizer.recognize_once()
 
            return render_template('result.html', text=result.text)
 
        except Exception as e:
            return str(e)  
 
    return render_template('index.html')
 
if __name__ == '__main__':
    app.run(debug=True)
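
The handler above pushes the uploaded bytes with the default stream format, so it relies on the upload being 16 kHz, 16-bit, mono PCM. As a hedged sketch (not part of the original answer), the WAV header could instead be parsed in memory with the standard-library `wave` module and the push stream built to match. The helper names `read_wav_pcm` and `push_stream_from_wav` are hypothetical, and the sketch assumes the upload is an uncompressed PCM WAV file.

```python
import wave


def read_wav_pcm(fileobj):
    """Return (sample_rate, bits_per_sample, channels, pcm_bytes) for an in-memory WAV."""
    with wave.open(fileobj, "rb") as wav:
        sample_rate = wav.getframerate()
        bits_per_sample = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        # Raw PCM frames only; the RIFF header is not included.
        pcm_bytes = wav.readframes(wav.getnframes())
    return sample_rate, bits_per_sample, channels, pcm_bytes


def push_stream_from_wav(fileobj):
    """Build a PushAudioInputStream whose format matches the uploaded WAV (assumed PCM)."""
    # Imported lazily so read_wav_pcm() can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    sample_rate, bits_per_sample, channels, pcm_bytes = read_wav_pcm(fileobj)
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=sample_rate,
        bits_per_sample=bits_per_sample,
        channels=channels,
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm_bytes)
    push_stream.close()  # signals end of audio to the recognizer
    return push_stream
```

In the Flask handler this might be used as `audio_config = speechsdk.audio.AudioConfig(stream=push_stream_from_wav(file.stream))`. Compressed uploads such as MP3 would need a different approach and are not covered here.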

templates/index.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Speech to Text</title>
</head>
<body>
    <h1>Upload Audio File</h1>
    <form action="/" method="post" enctype="multipart/form-data">
        <input type="file" name="audio-file" accept="audio/*" required>
        <input type="submit" value="Transcribe">
    </form>
</body>
</html>

templates/result.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Transcription Result</title>
</head>
<body>
    <h1>Transcription Result</h1>
    <p>{{ text }}</p>
</body>
</html>

Output:

The Flask application started successfully.

The upload page loaded in the browser. I then selected a .wav audio file to convert the speech to text.

The speech was converted to text successfully, without the file being saved to disk.
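
The app renders `result.text` directly, which comes back empty when nothing is recognized or the request is canceled (for example, because of a wrong key or region). A minimal sketch of checking `result.reason` first is shown here; the helper name `describe_result` and its `reasons` parameter are hypothetical, added so the logic can be exercised without the SDK installed.

```python
def describe_result(result, reasons=None):
    """Turn a recognition result into a (text, error) pair for the template."""
    if reasons is None:
        # Lazy import: the real enum is speechsdk.ResultReason in the Azure Speech SDK.
        import azure.cognitiveservices.speech as speechsdk
        reasons = speechsdk.ResultReason

    if result.reason == reasons.RecognizedSpeech:
        return result.text, None
    if result.reason == reasons.NoMatch:
        return "", "No speech could be recognized."
    if result.reason == reasons.Canceled:
        details = result.cancellation_details
        return "", f"Recognition canceled: {details.reason} {details.error_details}"
    return "", f"Unexpected result reason: {result.reason}"
```

In the handler, `text, error = describe_result(speech_recognizer.recognize_once())` could then decide whether to render the transcription or an error message.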
