I have been trying to build real-time speech-to-text using React.js as the frontend and Python Flask as the backend, connected with sockets to send the live data between them. I have tried many approaches, but either the data is not converted correctly or no result is printed as output. Flask continuously receives the audio data as bytes, uses `pushAudioStream` from the Azure Speech SDK for Python to create a stream of the `AudioInputStream` class, and hands it in the audio config to a `conversation_transcriber`/`SpeechRecognizer`. The results have not been satisfactory; please help with a suitable solution.
I need text as the output. I am already capturing the speech input with React.js on the frontend and using Flask as the backend, trying to build an application that converts live speech to text.
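For reference, the push-stream arrangement described above would look roughly like this; a minimal sketch, assuming a recent SDK version (where ConversationTranscriber accepts a speech config directly) and 16 kHz, 16-bit, mono PCM input (the push stream's default format), with placeholder credentials:

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")

# The Socket.IO handler writes each incoming chunk into this push stream.
push_stream = speechsdk.audio.PushAudioInputStream()
audio_config = speechsdk.audio.AudioConfig(stream=push_stream)

transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config)
transcriber.transcribed.connect(lambda evt: print(evt.result.text))
transcriber.start_transcribing_async().get()

# For every binary chunk received from the frontend:
#     push_stream.write(chunk)
# and when the client disconnects:
#     push_stream.close()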
The code below is for a speech-to-text application using React as the frontend and Flask with Socket.IO as the backend.
from flask import Flask
from flask_socketio import SocketIO
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer, ResultReason
from azure.cognitiveservices.speech.audio import (
    AudioConfig,
    AudioStreamFormat,
    PullAudioInputStream,
    PullAudioInputStreamCallback,
)
import io

app = Flask(__name__)
socketio = SocketIO(app)

# Set up your Speech Config
speech_config = SpeechConfig(subscription="AzureSpeechKey", region="AzureSpeechregion")

# Assumes the frontend sends raw PCM audio: 16 kHz, 16-bit, mono
stream_format = AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)

class StreamBuffer(PullAudioInputStreamCallback):
    """Lets the Speech SDK pull audio out of a file-like object."""

    def __init__(self, stream):
        super().__init__()
        self.stream = stream

    def read(self, buffer: memoryview) -> int:
        # The SDK hands us a buffer to fill and expects the number of
        # bytes actually written (0 signals end of stream)
        data = self.stream.read(buffer.nbytes)
        buffer[:len(data)] = data
        return len(data)

    def close(self):
        self.stream.close()

@socketio.on('audio')
def handle_audio(audio_data):
    audio_stream = io.BytesIO(audio_data)
    stream_buffer = StreamBuffer(audio_stream)

    # Wrap the callback in a pull stream and configure the speech recognizer
    pull_stream = PullAudioInputStream(pull_stream_callback=stream_buffer,
                                       stream_format=stream_format)
    audio_config = AudioConfig(stream=pull_stream)
    speech_recognizer = SpeechRecognizer(speech_config=speech_config,
                                         audio_config=audio_config)

    # Process audio stream
    result = speech_recognizer.recognize_once()

    # Emit the result back to the frontend
    if result.reason == ResultReason.RecognizedSpeech:
        socketio.emit('transcription', result.text)
    elif result.reason == ResultReason.NoMatch:
        socketio.emit('transcription', "No speech could be recognized")
    elif result.reason == ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        socketio.emit('transcription', "Speech Recognition canceled: {}".format(cancellation_details.reason))

if __name__ == '__main__':
    socketio.run(app, debug=True)
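Note that the React component below does not use Socket.IO: it uploads the recording over plain HTTP to /api/transcribe. For the two pieces to talk to each other, the backend also needs a matching route. A minimal sketch, reusing the StreamBuffer class and recognizer setup from above (the endpoint path and the 'audio' field name are taken from the frontend code):

from flask import request, jsonify

@app.route('/api/transcribe', methods=['POST'])
def transcribe():
    # The React component posts the file as multipart form data
    # under the field name 'audio'
    audio_bytes = request.files['audio'].read()
    stream_buffer = StreamBuffer(io.BytesIO(audio_bytes))
    pull_stream = PullAudioInputStream(pull_stream_callback=stream_buffer,
                                       stream_format=stream_format)
    audio_config = AudioConfig(stream=pull_stream)
    recognizer = SpeechRecognizer(speech_config=speech_config,
                                  audio_config=audio_config)
    result = recognizer.recognize_once()
    text = result.text if result.reason == ResultReason.RecognizedSpeech else ""
    return jsonify({'transcription': text})

Since the React dev server and Flask typically run on different ports, you may also need to enable CORS on the Flask side (for example with the flask-cors package).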
import React, { useState } from 'react';
import axios from 'axios';
import Dropzone from 'react-dropzone';
import './App.css';

const App = () => {
  const [transcription, setTranscription] = useState('');
  const [file, setFile] = useState(null);

  const onDrop = (acceptedFiles) => {
    setFile(acceptedFiles[0]);
  };

  const onTranscribe = async () => {
    const formData = new FormData();
    formData.append('audio', file);
    try {
      const response = await axios.post('http://localhost:5000/api/transcribe', formData, {
        headers: {
          'Content-Type': 'multipart/form-data',
        },
      });
      setTranscription(response.data.transcription);
    } catch (error) {
      console.error('Error transcribing audio:', error.message);
    }
  };

  return (
    <div className="App">
      <h1>Azure Speech to Text</h1>
      <Dropzone onDrop={onDrop}>
        {({ getRootProps, getInputProps }) => (
          <div {...getRootProps()} className="dropzone">
            <input {...getInputProps()} />
            <p>Drag & drop an audio file here, or click to select one</p>
          </div>
        )}
      </Dropzone>
      {file && <p>Selected File: {file.name}</p>}
      <button onClick={onTranscribe} disabled={!file}>
        Transcribe
      </button>
      {transcription && (
        <div className="transcription">
          <h2>Transcription:</h2>
          <p>{transcription}</p>
        </div>
      )}
    </div>
  );
};

export default App;
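This component transcribes a finished recording: it uploads the whole file and waits for a single response. For genuinely real-time results, the client would instead capture microphone audio (for example with the MediaRecorder API), emit the binary chunks to the Socket.IO 'audio' event handled above using socket.io-client, and render each 'transcription' event as it arrives.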