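I have a Flask application that receives audio files posted as form data, and I want to process them with the Azure Speech SDK to extract text from speech.
To improve performance, I would like to process the audio files without writing them to the server's disk.
However, the Azure Speech SDK only seems to work properly with a file name; I have not been able to pass the file as an AudioInputStream.
Can anyone help me process the file without saving it to disk?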
def process_audio_files():
    file = request.files['audio-file']
    stream = file.stream
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    stream = speechsdk.audio.PushAudioInputStream(stream_format= stream)
    #How to pass the file in the AudioConfig as parameter without saving to the disk?
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "fr-FR", "es-ES"])
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, auto_detect_source_language_config=auto_detect_source_language_config, audio_config=audio_config)
I tried the following Flask application, which converts speech to text with the Azure Speech SDK for Python without saving the audio to disk.
Code:
app.py:
from flask import Flask, render_template, request
import azure.cognitiveservices.speech as speechsdk

app = Flask(__name__)

speech_key = '<speech_key>'
service_region = '<speech_region>'


@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        try:
            file = request.files['audio-file']
            stream = file.stream
            stream.seek(0)
            speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
            # The default AudioStreamFormat describes 16 kHz, 16-bit, mono PCM audio.
            audio_stream = speechsdk.audio.PushAudioInputStream(stream_format=speechsdk.audio.AudioStreamFormat())
            # Copy the uploaded bytes into the push stream, then close it to mark the end of the audio.
            audio_stream.write(stream.read())
            audio_stream.close()
            audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)
            auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
                languages=["en-US", "fr-FR", "es-ES"])
            speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                                           auto_detect_source_language_config=auto_detect_source_language_config,
                                                           audio_config=audio_config)
            # recognize_once() returns after the first recognized utterance.
            result = speech_recognizer.recognize_once()
            return render_template('result.html', text=result.text)
        except Exception as e:
            return str(e)
    return render_template('index.html')


if __name__ == '__main__':
    app.run(debug=True)
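One caveat with the code above: `AudioStreamFormat()` with no arguments describes 16 kHz, 16-bit, mono PCM, and the bytes written to the push stream include the WAV header. If an uploaded file uses a different sample rate or channel count, it may help to read the real format from the header first. The sketch below does that with the standard-library `wave` module. `parse_wav`, `WavInfo`, and `make_silent_wav` are illustrative names, not part of the Azure SDK, and the sketch assumes uncompressed PCM WAV input.

```python
import io
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    samples_per_second: int
    bits_per_sample: int
    channels: int
    pcm: bytes  # raw sample data, without the RIFF/WAVE header


def parse_wav(data: bytes) -> WavInfo:
    """Read the format fields and raw PCM frames from an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return WavInfo(
            samples_per_second=wav.getframerate(),
            bits_per_sample=wav.getsampwidth() * 8,
            channels=wav.getnchannels(),
            pcm=wav.readframes(wav.getnframes()),
        )


def make_silent_wav(rate: int = 8000, frames: int = 800) -> bytes:
    """Build a small mono 16-bit WAV in memory (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * frames)
    return buf.getvalue()


if __name__ == "__main__":
    info = parse_wav(make_silent_wav())
    print(info.samples_per_second, info.bits_per_sample, info.channels, len(info.pcm))
    # In the Flask view, these values could then be handed to the SDK, e.g.:
    #   fmt = speechsdk.audio.AudioStreamFormat(
    #       samples_per_second=info.samples_per_second,
    #       bits_per_sample=info.bits_per_sample,
    #       channels=info.channels)
    #   push = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
    #   push.write(info.pcm)
    #   push.close()
```

In the view, `parse_wav(file.stream.read())` would replace the direct `stream.read()` call, so only the sample data reaches the push stream.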
templates/index.html:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Speech to Text</title>
</head>
<body>
    <h1>Upload Audio File</h1>
    <form action="/" method="post" enctype="multipart/form-data">
        <input type="file" name="audio-file" accept="audio/*" required>
        <input type="submit" value="Transcribe">
    </form>
</body>
</html>
templates/result.html:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Transcription Result</title>
</head>
<body>
    <h1>Transcription Result</h1>
    <p>{{ text }}</p>
</body>
</html>
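Because the upload form accepts `audio/*`, the browser may send formats other than WAV (MP3, for example), which an uncompressed PCM push stream is not set up to handle. A cheap server-side check of the container magic bytes can reject such uploads before they reach the recognizer. This is a minimal sketch; `looks_like_wav` and `peek` are hypothetical helper names, and the check only confirms the RIFF/WAVE container, not the codec inside it.

```python
import io


def looks_like_wav(header: bytes) -> bool:
    """Return True if the first 12 bytes carry the RIFF/WAVE container magic."""
    return len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE"


def peek(stream, size: int = 12) -> bytes:
    """Read `size` bytes from a seekable stream, then restore its position."""
    pos = stream.tell()
    head = stream.read(size)
    stream.seek(pos)
    return head


if __name__ == "__main__":
    # Bytes 4..8 hold the chunk size, which the check ignores.
    upload = io.BytesIO(b"RIFF\x24\x00\x00\x00WAVEfmt ")
    print(looks_like_wav(peek(upload)))       # WAV-like header
    print(looks_like_wav(b"ID3\x03\x00"))     # start of a typical MP3 file
```

In the view, `looks_like_wav(peek(file.stream))` could be called right after fetching `request.files['audio-file']`, returning an error page when it is false.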
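Output:
The Flask application ran successfully.
I opened the page in the browser, then selected a .wav audio file to convert speech to text.
The speech was successfully converted to text without saving the file to disk.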
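For larger uploads, `stream.read()` builds one big Python `bytes` object before anything is written to the push stream. One alternative is to copy the upload in fixed-size chunks. The sketch below assumes only that the sink has `write(bytes)` and `close()` methods, as `PushAudioInputStream` does; `pump`, `AudioSink`, and `ListSink` are illustrative names. The SDK still buffers whatever is written until the recognizer consumes it, so this reduces the Python-side copy rather than total memory use.

```python
import io
from typing import BinaryIO, List, Protocol


class AudioSink(Protocol):
    """Anything with write/close, e.g. speechsdk.audio.PushAudioInputStream."""

    def write(self, buffer: bytes) -> None: ...

    def close(self) -> None: ...


def pump(source: BinaryIO, sink: AudioSink, chunk_size: int = 4096) -> int:
    """Copy `source` into `sink` chunk by chunk, close the sink, return byte count."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    sink.close()  # signals end of audio to the consumer
    return total


class ListSink:
    """Stand-in sink for the demo; records the chunks it receives."""

    def __init__(self) -> None:
        self.chunks: List[bytes] = []
        self.closed = False

    def write(self, buffer: bytes) -> None:
        self.chunks.append(buffer)

    def close(self) -> None:
        self.closed = True


if __name__ == "__main__":
    sink = ListSink()
    copied = pump(io.BytesIO(b"\x00" * 10000), sink, chunk_size=4096)
    print(copied, [len(c) for c in sink.chunks], sink.closed)
```

In the view above, `pump(file.stream, audio_stream)` would stand in for the `audio_stream.write(stream.read())` and `audio_stream.close()` pair.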