Google Speech API in Python


I am trying to build a system in Python that streams audio from the client's browser over a WebSocket connection and then forwards that stream to Google Cloud for speech recognition.

Here is the client-side code:

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Audio Streaming</title>
</head>
<body>
    <script>
        var app = {
            socket: null,
            mediaTrack: null,
            counter: 0,
            bufferSize: 4096,
            config: null,
            main: function(){
                this.socket = new WebSocket("ws://127.0.0.1:5000");
                this.socket.addEventListener("open",this.onSocketOpen.bind(this));
                this.socket.addEventListener("message",this.onSocketMessage.bind(this));
            },
            onSocketOpen: function(event) {
                this.initRecorder();
                console.log("Socket Open");
            },
            onSocketMessage: function(event){
                    console.log(event.data)
            },
            shimAudioContext: function(){
                try{
                    window.AudioContext = window.AudioContext || window.webkitAudioContext;
                    navigator.getUserMedia = navigator.getUserMedia || 
                        navigator.webkitGetUserMedia ||
                        navigator.mozGetUserMedia ||
                        navigator.msGetUserMedia;
                }
                catch (e) {
                    alert ("Your browser is not supported");
                    return false;
                }
                if(!navigator.getUserMedia || !window.AudioContext){
                    alert("Your browser is not supported");
                    return false;
                }
                return true;
            },
            initRecorder: function(){
                if(!this.shimAudioContext()){
                    return;
                }

                return navigator.mediaDevices.getUserMedia({ "audio": true,"video": false}).then((stream) => {

                    var context = new window.AudioContext();
                    //send metadata on audio stream to backend
                    this.sendContext(context.sampleRate);

                    // Capture mic audio data into a stream
                    var audioInput = context.createMediaStreamSource(stream);
                    // only record mono audio with a 4096-sample buffer per callback
                    var recorder = context.createScriptProcessor(this.bufferSize, 1, 1);
                    // specify the processing function
                    recorder.onaudioprocess = this.audioProcess.bind(this);
                    // connect stream to our recorder
                    audioInput.connect(recorder);
                    // connect recorder to previous destination
                    recorder.connect(context.destination);
                    // store media track
                    this.mediaTrack = stream.getTracks()[0];
                });
            },
            float32To16BitPCM: function(float32Arr) {
                var pcm16bit = new Int16Array(float32Arr.length);
                for(var i = 0; i < float32Arr.length; ++i) {
                // force number in [-1,1]
                var s = Math.max(-1, Math.min(1, float32Arr[i]));
                /**
                * convert 32 bit float to 16 bit int pcm audio
                * -0x8000 = minimum int16 value, 0x7FFF = maximum int16 value
                */
                pcm16bit[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
                }
                return pcm16bit;
            },
            audioProcess: function(event) {
                // only 1 channel, as specified above
                var float32Audio = event.inputBuffer.getChannelData(0) || new Float32Array(this.bufferSize);
                var pcm16Audio = this.float32To16BitPCM(float32Audio);
                this.socket.send(pcm16Audio.buffer);
            },
            sendContext: function(rate){
                this.config = {
                    rate : rate,
                    language : "en-US",
                    format : "Linear 16"
                }
                this.socket.send(JSON.stringify(this.config));
            }
        }

        // app.main()

    </script>

    <input type="button" value="On" onclick="app.main()">

</body>
</html>

I receive the audio as raw bytes.

Now, in my Python code, I don't know how to stream this to Google Cloud for speech recognition. Here is my Python code:

import asyncio
import json

import websockets


async def audioin(websocket, path):
    # The first message must be the JSON config describing the audio stream.
    config = await websocket.recv()
    if not isinstance(config, str):
        print("Error, no config")
        await websocket.send(
            json.dumps({
                "error": "configuration not received as first message"
            })
        )
        return

    config = json.loads(config)

    while True:
        # Each subsequent binary frame is a chunk of raw PCM16 audio.
        data = await websocket.recv()


start_server = websockets.serve(audioin, "127.0.0.1", 5000)

asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()

How do I stream the data received from the WebSocket to speech.SpeechClient().streaming_recognize()?

sockets google-cloud-platform python-asyncio grpc google-speech-api
1 Answer

Take a look at the streaming speech recognition example for Python:

https://cloud.google.com/speech-to-text/docs/streaming-recognize#speech-streaming-recognize-python

You need to configure a SpeechClient first, and then wrap each chunk of websocket audio in a StreamingRecognizeRequest protobuf message; streaming_recognize() consumes those messages from an iterator. A sketch of that bridge is below.
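
For illustration, here is a minimal sketch of that bridge, assuming the google-cloud-speech 2.x client library and the websockets server from the question; the names transcribe and audio_queue are ours, not part of any API. Because client.streaming_recognize() is a blocking gRPC call, it runs in a worker thread and is fed through a queue from the asyncio handler:

import asyncio
import json
import queue
import threading

import websockets
from google.cloud import speech


def transcribe(audio_queue, sample_rate):
    # Runs in a worker thread: streaming_recognize() blocks until the stream ends.
    client = speech.SpeechClient()
    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate,
            language_code="en-US",
        ),
        interim_results=True,
    )

    def requests():
        # Wrap each raw PCM16 chunk in a StreamingRecognizeRequest protobuf.
        while True:
            chunk = audio_queue.get()
            if chunk is None:  # sentinel: the websocket closed
                return
            yield speech.StreamingRecognizeRequest(audio_content=chunk)

    for response in client.streaming_recognize(streaming_config, requests()):
        for result in response.results:
            print(result.alternatives[0].transcript)


async def audioin(websocket, path):
    # First message is the JSON config sent by the browser.
    config = json.loads(await websocket.recv())

    audio_queue = queue.Queue()
    threading.Thread(
        target=transcribe, args=(audio_queue, config["rate"]), daemon=True
    ).start()

    try:
        async for message in websocket:
            audio_queue.put(message)  # binary frames hold the PCM16 bytes
    finally:
        audio_queue.put(None)  # end the request generator


start_server = websockets.serve(audioin, "127.0.0.1", 5000)
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()

Keep in mind that Google caps a single streaming session at roughly five minutes of audio, so a long-lived connection has to restart the stream periodically.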
