Trying to build an application that converts real-time speech to text


I have been trying to build real-time speech-to-text with React js as the frontend and Python Flask as the backend, connected by sockets so the live audio can be sent across. I have tried many approaches, but the audio is either not transcribed correctly or no result is printed as output. The Flask server continuously receives the audio data as bytes, uses pushAudioStream from the Azure Speech Python SDK to create an AudioInputStream, and hands that stream to the conversation_transcriber / SpeechRecognizer configured with the SDK. The results are not satisfactory, so please help me find a suitable solution.

I need the output as text. I am using React js on the frontend to capture the speech input and Flask as the backend.
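Roughly, the backend setup looks like this (a simplified sketch of what I described above, assuming 16 kHz 16-bit mono PCM audio; handle_chunk is a placeholder for my socket handler):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="KEY", region="REGION")

# Push stream that the socket handler writes incoming byte chunks into
stream_format = speechsdk.audio.AudioStreamFormat(samples_per_second=16000,
                                                  bits_per_sample=16, channels=1)
push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
audio_config = speechsdk.audio.AudioConfig(stream=push_stream)

transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config)
transcriber.transcribed.connect(lambda evt: print(evt.result.text))
transcriber.start_transcribing_async().get()

def handle_chunk(chunk: bytes):
    # Called for every audio buffer received over the socket
    push_stream.write(chunk)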

python reactjs azure speech-to-text azure-ai
1 Answer


The code below is for a speech-to-text application that uses React as the frontend, Flask as the backend, and Socket.IO.

  • This example shows how to transcribe audio with Azure Speech to Text.
from flask import Flask
from flask_socketio import SocketIO
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer, ResultReason
from azure.cognitiveservices.speech.audio import (
    AudioConfig,
    AudioStreamFormat,
    PullAudioInputStream,
    PullAudioInputStreamCallback,
)
import io

app = Flask(__name__)
socketio = SocketIO(app)

# Set up your Speech Config
speech_config = SpeechConfig(subscription="AzureSpeechKey", region="AzureSpeechRegion")

class StreamBuffer(PullAudioInputStreamCallback):
    """Serves bytes from an in-memory stream to the Speech SDK on demand."""

    def __init__(self, stream):
        super().__init__()
        self.stream = stream

    def read(self, buffer: memoryview) -> int:
        # The SDK hands us a writable buffer; fill it and return the byte count.
        data = self.stream.read(len(buffer))
        buffer[:len(data)] = data
        return len(data)

    def close(self):
        self.stream.close()

@socketio.on('audio')
def handle_audio(audio_data):
    audio_stream = io.BytesIO(audio_data)
    callback = StreamBuffer(audio_stream)

    # 16 kHz, 16-bit, mono PCM -- must match the format the frontend sends
    stream_format = AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)
    pull_stream = PullAudioInputStream(pull_stream_callback=callback, stream_format=stream_format)
    audio_config = AudioConfig(stream=pull_stream)

    # Configure your speech recognizer against the pull stream
    speech_recognizer = SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # Process the audio buffer. recognize_once() transcribes a single utterance,
    # so each 'audio' event should carry a complete chunk of speech; for true
    # continuous streaming, feed one PushAudioInputStream and use
    # start_continuous_recognition() instead.
    result = speech_recognizer.recognize_once()

    # Emit the result back to the frontend
    if result.reason == ResultReason.RecognizedSpeech:
        socketio.emit('transcription', result.text)
    elif result.reason == ResultReason.NoMatch:
        socketio.emit('transcription', "No speech could be recognized")
    elif result.reason == ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        socketio.emit('transcription', "Speech Recognition canceled: {}".format(cancellation_details.reason))

if __name__ == '__main__':
    socketio.run(app, debug=True)


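The React component below handles the audio input side. Note that as written it uploads a recorded audio file to an HTTP route (/api/transcribe) via axios rather than streaming over the socket; a minimal matching Flask route is sketched after the component.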
import React, { useState } from 'react';
import axios from 'axios';
import Dropzone from 'react-dropzone';
import './App.css';

const App = () => {
  const [transcription, setTranscription] = useState('');
  const [file, setFile] = useState(null);

  const onDrop = (acceptedFiles) => {
    setFile(acceptedFiles[0]);
  };

  const onTranscribe = async () => {
    const formData = new FormData();
    formData.append('audio', file);

    try {
      const response = await axios.post('http://localhost:5000/api/transcribe', formData, {
        headers: {
          'Content-Type': 'multipart/form-data',
        },
      });
      setTranscription(response.data.transcription);
    } catch (error) {
      console.error('Error transcribing audio:', error.message);
    }
  };

  return (
    <div className="App">
      <h1>Azure Speech to Text</h1>
      <Dropzone onDrop={onDrop}>
        {({ getRootProps, getInputProps }) => (
          <div {...getRootProps()} className="dropzone">
            <input {...getInputProps()} />
            <p>Drag & drop an audio file here, or click to select one</p>
          </div>
        )}
      </Dropzone>
      {file && <p>Selected File: {file.name}</p>}
      <button onClick={onTranscribe} disabled={!file}>
        Transcribe
      </button>
      {transcription && (
        <div className="transcription">
          <h2>Transcription:</h2>
          <p>{transcription}</p>
        </div>
      )}
    </div>
  );
};

export default App;
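
Since the component posts to http://localhost:5000/api/transcribe and reads response.data.transcription, the Flask app needs a matching route. A minimal sketch of such a route, reusing the StreamBuffer class, speech_config, and imports from the backend code above, and assuming the uploaded audio is 16 kHz 16-bit mono PCM:

from flask import request, jsonify

@app.route('/api/transcribe', methods=['POST'])
def transcribe():
    # The frontend sends the file under the form field name 'audio'
    audio_bytes = request.files['audio'].read()

    callback = StreamBuffer(io.BytesIO(audio_bytes))
    stream_format = AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)
    pull_stream = PullAudioInputStream(pull_stream_callback=callback, stream_format=stream_format)
    recognizer = SpeechRecognizer(speech_config=speech_config,
                                  audio_config=AudioConfig(stream=pull_stream))

    result = recognizer.recognize_once()
    text = result.text if result.reason == ResultReason.RecognizedSpeech else ""
    return jsonify({'transcription': text})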
