I am trying to build a system in Python that streams audio from a client's browser over a WebSocket connection and then forwards the audio stream to Google Cloud for speech recognition.
Here is the client-side code:
<!DOCTYPE html>
<html lang="en">
<head>
<title>Audio Streaming</title>
</head>
<body>
<script>
var app = {
socket: null,
mediaTrack: null,
counter: 0,
bufferSize: 4096,
config: null,
main: function(){
this.socket = new WebSocket("ws://127.0.0.1:5000");
this.socket.addEventListener("open",this.onSocketOpen.bind(this));
this.socket.addEventListener("message",this.onSocketMessage.bind(this));
},
onSocketOpen: function(event) {
this.initRecorder();
console.log("Socket Open");
},
onSocketMessage: function(event){
console.log(event.data)
},
shimAudioContext: function(){
try{
window.AudioContext = window.AudioContext || window.webkitAudioContext;
navigator.getUserMedia = navigator.getUserMedia ||
navigator.webkitGetUserMedia ||
navigator.mozGetUserMedia ||
navigator.msGetUserMedia;
}
catch (e) {
alert ("Your browser is not supported");
return false;
}
if(!navigator.getUserMedia || !window.AudioContext){
alert("Your browser is not supported");
return false;
}
return true;
},
initRecorder: function(){
if(!this.shimAudioContext()){
return;
}
return navigator.mediaDevices.getUserMedia({ "audio": true,"video": false}).then((stream) => {
var context = new window.AudioContext();
//send metadata on audio stream to backend
this.sendContext(context.sampleRate);
// capture mic audio data into a stream
var audioInput = context.createMediaStreamSource(stream);
// record mono audio with a buffer of 4096 samples per callback
var recorder = context.createScriptProcessor(this.bufferSize, 1, 1);
// specify the processing function
recorder.onaudioprocess = this.audioProcess.bind(this);
// connect stream to our recorder
audioInput.connect(recorder);
// connect recorder to previous destination
recorder.connect(context.destination);
// store media track
this.mediaTrack = stream.getTracks()[0];
});
},
float32To16BitPCM: function(float32Arr) {
var pcm16bit = new Int16Array(float32Arr.length);
for(var i = 0; i < float32Arr.length; ++i) {
// force number in [-1,1]
var s = Math.max(-1, Math.min(1, float32Arr[i]));
/**
* convert 32 bit float to 16 bit int pcm audio
* 0x8000 = 32768 (magnitude of the minimum int16 value), 0x7fff = 32767 (maximum int16 value)
*/
pcm16bit[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
}
return pcm16bit;
},
audioProcess: function(event) {
// only 1 channel as specified above.....
var float32Audio = event.inputBuffer.getChannelData(0) || new Float32Array(this.bufferSize);
var pcm16Audio = this.float32To16BitPCM(float32Audio);
this.socket.send(pcm16Audio.buffer);
},
sendContext: function(rate){
this.config = {
rate : rate,
language : "en-US",
format : "Linear 16"
}
this.socket.send(JSON.stringify(this.config));
}
}
// app.main()
</script>
<input type="button" value="On" onClick="app.main()">
</body>
</html>
The audio arrives on the server as raw bytes.
Now, in my Python code, I don't know how to stream it to Google Cloud for speech recognition. Here is my Python code:
import asyncio
import json

import websockets


async def audioin(websocket, path):
    config = await websocket.recv()
    if not isinstance(config, str):
        print("Error, no config")
        await websocket.send(
            json.dumps({
                "error": "configuration not received as first message"
            })
        )
        return
    config = json.loads(config)
    while True:
        data = await websocket.recv()

start_server = websockets.serve(audioin, "127.0.0.1", 5000)
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()
How do I stream the data received from the WebSocket into speech.SpeechClient().streaming_recognize()?
Take a look at the streaming speech recognition example for Python:
https://cloud.google.com/speech-to-text/docs/streaming-recognize#speech-streaming-recognize-python
You need to configure the SpeechClient first. Then you need to wrap each chunk of WebSocket audio data in a StreamingRecognizeRequest protobuf message and feed those messages to streaming_recognize as a generator.
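A minimal sketch of that approach (assuming the `google-cloud-speech` package and valid credentials are set up; the function names `pcm_chunks` and `transcribe` and the queue-based bridge between the asyncio handler and the blocking gRPC stream are my own choices, not part of your code):

```python
import queue


def pcm_chunks(audio_queue):
    """Yield raw 16-bit PCM byte chunks from a thread-safe queue until a
    None sentinel signals that the client disconnected."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            return
        yield chunk


def transcribe(audio_queue, sample_rate, language="en-US"):
    """Run blocking streaming recognition over chunks from audio_queue.

    Meant to run in a worker thread (e.g. via loop.run_in_executor) while
    the websocket handler pushes binary frames into audio_queue.
    """
    from google.cloud import speech  # deferred: needs google-cloud-speech

    client = speech.SpeechClient()
    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate,  # from the client's first JSON message
            language_code=language,
        ),
        interim_results=True,
    )
    # Wrap each audio chunk in a StreamingRecognizeRequest protobuf message.
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in pcm_chunks(audio_queue)
    )
    for response in client.streaming_recognize(
        config=streaming_config, requests=requests
    ):
        for result in response.results:
            print(result.alternatives[0].transcript)
```

In the websocket handler, put each binary frame on the queue (`audio_queue.put(data)` when `data` is `bytes`) and put `None` when the connection closes; running `transcribe` in a separate thread keeps the blocking gRPC iterator from stalling the asyncio event loop.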