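I'm using Twilio Media Streams and want to transcribe an outbound call. To transcribe the live audio, I'm using the Deepgram transcription API. I'm curious what audio format the Twilio stream returns and what audio format the Deepgram transcription API expects.
I decode the stream Twilio returns, convert it to a WAV file, and then send it to the Deepgram API. However, the Deepgram API returns a JSON object with an error.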
const WebSocket = require('ws');
const { WaveFile } = require('wavefile');

const chunks = [];

// deepgram websocket connection initiated
const deepgram = new WebSocket('wss://api.deepgram.com/v1/listen', {
  headers: {
    Authorization: `Token c5a8a4337xxxxxxxxxx38e56456a52557a5`,
  },
});

// Inside the Twilio media-stream 'message' handler, where msg is the parsed Twilio message
// Condition true when message from the twilio is media
if (msg.event === 'media') {
  if (deepgram.readyState === WebSocket.OPEN) {
    const twilioData = msg.media.payload;
    // Build the wav file from scratch since it comes in as raw data
    let wav = new WaveFile();
    // Twilio uses MuLaw so we have to encode for that
    wav.fromScratch(1, 8000, '8m', Buffer.from(twilioData, 'base64'));
    // This library has a handy method to decode MuLaw straight to 16-bit PCM
    wav.fromMuLaw();
    // Get the raw audio data in base64
    const twilio64Encoded = wav.toDataURI().split('base64,')[1];
    // Create our audio buffer
    const twilioAudioBuffer = Buffer.from(twilio64Encoded, 'base64');
    // Send data starting at byte 44 to remove wav headers so our model sees only audio data
    chunks.push(twilioAudioBuffer.slice(44));
    // We have to chunk data b/c twilio sends audio durations of ~20ms and AAI needs a min of 100ms
    const audioBuffer = Buffer.concat(chunks);
    const encodedAudio = audioBuffer.toString('base64');
    deepgram.send(encodedAudio);
  }
}
Response received from the Deepgram API:
{
type: 'Error',
variant: 'SchemaError',
description: 'Could not deserialize last text message: expected value at line 1 column 1',
message: 'KAAoACgAGADo/9j/yP/I/9j/+P8YADgASAA4ABgA+P/I/7j/uP/I/+j/GABIAFgASAAoAOj/uP+o/6j/yP/o/ygAWABoAFgAGADY/6j/iP+I/7j/CABIAHgAeABYACgA2P+o/4j/mP+4/wgASACEAIQAaAAYALj/bP9M/2z/uP8oAHgApACUAGgA+P+Y/1z/TP9s/7j/OACEAMQAxACEABgAmP88/wz/PP+o/zgApADkANQAeADo/2z/HP8c/2z/2P94ANQA5AC0AEgAqP8s//z+HP98/xgAtAAUASQB1ABIAIj/DP/M/uz+fP8oANQANAE0AcQAKABc/9z+rP78/oj/SADkACQBBAGEANj/PP/c/uz+TP/Y/5QABAEkAeQAWACo...
}
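You are getting this error because Deepgram's send() API accepts a string | Buffer. A string is treated as a JSON control message of the form {type: ...}, for example {type: "KeepAlive"}, but you are sending it Base64-encoded audio data as a string. Deepgram tries to parse that string as JSON and fails, hence "expected value at line 1 column 1". You can pass a Buffer decoded directly from the Twilio stream payload to the send() API, without converting it to WAV at all.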
For example:
if (msg.event === 'media') {
  if (deepgram.readyState === WebSocket.OPEN) {
    const twilioData = msg.media.payload;
    // Decode the base64 payload and send the raw mu-law bytes as a binary frame
    deepgram.send(Buffer.from(twilioData, "base64"));
  }
}
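To make the string-versus-Buffer rule explicit, here is a minimal sketch of a helper that turns one parsed Twilio message into the frame to forward. The function name twilioMessageToDeepgramFrame is hypothetical, and sending {type: "CloseStream"} when Twilio reports a stop event assumes Deepgram's documented CloseStream control message; with the ws library, send(Buffer) produces a binary frame and send(string) a text frame.

```javascript
// Hypothetical helper: map one parsed Twilio Media Streams message to the
// frame that should be forwarded to Deepgram.
// Binary frames (Buffer) carry audio; text frames (string) must be JSON control messages.
function twilioMessageToDeepgramFrame(msg) {
  switch (msg.event) {
    case 'media':
      // Twilio's payload is base64 text; decode it to the raw mu-law bytes.
      return Buffer.from(msg.media.payload, 'base64');
    case 'stop':
      // Assumption: Deepgram's CloseStream control message, sent as JSON text.
      return JSON.stringify({ type: 'CloseStream' });
    default:
      // 'connected', 'start', 'mark', ... carry nothing to forward.
      return null;
  }
}

// Usage inside the Twilio 'message' handler (deepgram is the ws client from the question):
// const frame = twilioMessageToDeepgramFrame(JSON.parse(message));
// if (frame !== null && deepgram.readyState === WebSocket.OPEN) deepgram.send(frame);
```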
Make sure the Deepgram instance is configured to accept mu-law (mulaw) audio at 8000 Hz. The code below is taken from the Deepgram website: https://developers.deepgram.com/docs/getting-started-with-live-streaming-audio
const { Deepgram } = require('@deepgram/sdk'); // Deepgram Node SDK v2

// Initialize the Deepgram SDK
const deepgram = new Deepgram(deepgramApiKey);
// Create a websocket connection to Deepgram
// In this example, punctuation is turned on, interim results are turned off, and language is set to US English.
const deepgramLive = deepgram.transcription.live({
  punctuate: true,
  interim_results: false,
  language: "en-US",
  model: "nova",
  encoding: "mulaw", // IMPORTANT: set encoding
  sample_rate: 8000  // IMPORTANT: set sample rate
});
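If you keep the raw ws connection from the question instead of the SDK, the same audio format can be declared with query parameters on the listen URL (encoding, sample_rate, channels, punctuate). The sketch below only builds that URL; the helper name buildDeepgramListenUrl and the apiKey variable in the comment are hypothetical.

```javascript
// Hypothetical helper: build the Deepgram live-streaming URL for a raw
// WebSocket client, describing the audio format with query parameters.
function buildDeepgramListenUrl(options) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const listenUrl = buildDeepgramListenUrl({
  encoding: 'mulaw',  // Twilio Media Streams audio is 8-bit mu-law
  sample_rate: 8000,  // sampled at 8 kHz
  channels: 1,        // mono
  punctuate: true,
});

// Then, with the ws library (apiKey is a hypothetical variable holding your key):
// const deepgram = new WebSocket(listenUrl, { headers: { Authorization: `Token ${apiKey}` } });
```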
Add a listener to the deepgramLive instance and you are on the right track. All of the code is available at the link above.
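Whatever listener you attach, it receives Deepgram's JSON responses. A minimal sketch of pulling the transcript out of one message, assuming the live "Results" shape { type: 'Results', is_final, channel: { alternatives: [{ transcript }] } }; the function name extractTranscript is hypothetical, and the usage comment uses a raw ws client rather than the SDK.

```javascript
// Hypothetical helper: extract the transcript from one Deepgram live message.
// Assumes the "Results" shape described above; other message types return null.
function extractTranscript(rawMessage) {
  const data = typeof rawMessage === 'string' ? JSON.parse(rawMessage) : rawMessage;
  if (data.type !== 'Results') return null; // e.g. Metadata or Error messages
  const alt = data.channel && data.channel.alternatives && data.channel.alternatives[0];
  if (!alt || alt.transcript === '') return null; // silence yields empty transcripts
  return { text: alt.transcript, isFinal: Boolean(data.is_final) };
}

// Usage with a raw ws client:
// deepgram.on('message', (data) => {
//   const result = extractTranscript(data.toString());
//   if (result) console.log(result.isFinal ? 'final:' : 'interim:', result.text);
// });
```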
You are sending a Base64-encoded string; you need to decode it and send the audio as raw bytes.
You can do it like this:
import base64

# msg is the parsed JSON message from the Twilio media stream
media = msg['media']
audio = base64.b64decode(media['payload'])  # raw mu-law bytes
Then send the audio to Deepgram.
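The decoded bytes are 8-bit G.711 mu-law samples (Twilio streams 8 kHz mono mu-law), which is why no WAV conversion is needed once Deepgram is told encoding=mulaw. If you ever do need linear PCM instead, for example to send with encoding=linear16, the conversion that wavefile's fromMuLaw() performs can be sketched as below. The function names are hypothetical; this is the standard G.711 mu-law expansion, not code from either library.

```javascript
// Hypothetical helper: decode one 8-bit G.711 mu-law byte to a signed 16-bit sample.
function mulawToLinear16(byte) {
  const u = ~byte & 0xff;                 // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = ((mantissa << 3) + 0x84) << exponent; // 0x84 (132) is the mu-law bias
  return sign ? 0x84 - magnitude : magnitude - 0x84;
}

// Hypothetical helper: convert a Twilio base64 mu-law payload to little-endian 16-bit PCM.
function twilioPayloadToLinear16(payloadBase64) {
  const mulaw = Buffer.from(payloadBase64, 'base64');
  const pcm = Buffer.alloc(mulaw.length * 2); // two bytes per output sample
  for (let i = 0; i < mulaw.length; i++) {
    pcm.writeInt16LE(mulawToLinear16(mulaw[i]), i * 2);
  }
  return pcm;
}
```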