Real-time phone call transcription with Twilio and Deepgram

Question (votes: 0, answers: 2)

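I want to transcribe outbound calls using Twilio Media Streams, and I'm using the Deepgram transcription API to transcribe the live audio. I'm not sure what audio format the Twilio stream delivers, or what audio format the Deepgram transcription API expects.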

I decode the stream that Twilio returns, convert it to a WAV file, and send that to the Deepgram API. However, Deepgram's API responds with a JSON object containing an error.


// deepgram websocket connection initiated
deepgram = new WebSocket('wss://api.deepgram.com/v1/listen', {
  headers: {
    Authorization: `Token c5a8a4337xxxxxxxxxx38e56456a52557a5`,
  },
});

// Condition true when message from the twilio is media
if (msg.event === 'media') {
  if (deepgram.readyState == WebSocket.OPEN) {
    const twilioData = msg.media.payload;
    // Build the wav file from scratch since it comes in as raw data
    let wav = new WaveFile();

    // Twilio uses MuLaw so we have to encode for that
    wav.fromScratch(1, 8000, '8m', Buffer.from(twilioData, 'base64'));

    // This library has a handy method to decode MuLaw straight to 16-bit PCM
    wav.fromMuLaw();

    // Get the raw audio data in base64
    const twilio64Encoded = wav.toDataURI().split('base64,')[1];

    // Create our audio buffer
    const twilioAudioBuffer = Buffer.from(twilio64Encoded, 'base64');

    // Send data starting at byte 44 to remove wav headers so our model sees only audio data
    chunks.push(twilioAudioBuffer.slice(44));

    // We have to chunk data b/c twilio sends audio durations of ~20ms and AAI needs a min of 100ms
    const audioBuffer = Buffer.concat(chunks);
    const encodedAudio = audioBuffer.toString('base64');
    deepgram.send(encodedAudio);
  }
}
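On the audio-format part of the question: Twilio's media payload is base64-encoded 8-bit G.711 mu-law, sampled at 8000 Hz, mono. The WaveFile calls above expand each mu-law byte into one 16-bit linear PCM sample. The sketch below uses the classic G.711 reference expansion, not WaveFile's own code, to show what that step produces. Both helper names are hypothetical. As the answers explain, Deepgram can take mu-law directly, so this conversion isn't needed.

```javascript
// Classic G.711 mu-law expansion: one 8-bit code -> one 16-bit linear sample.
function muLawToLinear16(code) {
  const u = ~code & 0xff;                 // mu-law codes are stored bit-inverted
  const sign = u & 0x80;                  // top bit: sign
  const exponent = (u >> 4) & 0x07;       // next 3 bits: segment
  const mantissa = u & 0x0f;              // low 4 bits: step within segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84; // 0x84 = bias 132
  return sign ? -magnitude : magnitude;
}

// Decode one Twilio "media" payload (base64 mu-law) into 16-bit PCM samples.
// A 20 ms Twilio frame is 160 mu-law bytes -> 160 samples (320 bytes as linear16).
function decodeTwilioPayload(base64Payload) {
  const muLaw = Buffer.from(base64Payload, "base64");
  const pcm = new Int16Array(muLaw.length);
  for (let i = 0; i < muLaw.length; i++) {
    pcm[i] = muLawToLinear16(muLaw[i]);
  }
  return pcm;
}

const example = Buffer.from([0xff, 0xfe, 0xfa, 0x7a, 0x80, 0x00]).toString("base64");
console.log(Array.from(decodeTwilioPayload(example))); // [ 0, 8, 40, -40, 32124, -32124 ]
```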

Response received from the Deepgram API:

{
  type: 'Error',
  variant: 'SchemaError',
  description: 'Could not deserialize last text message: expected value at line 1 column 1',
  message: 'KAAoACgAGADo/9j/yP/I/9j/+P8YADgASAA4ABgA+P/I/7j/uP/I/+j/GABIAFgASAAoAOj/uP+o/6j/yP/o/ygAWABoAFgAGADY/6j/iP+I/7j/CABIAHgAeABYACgA2P+o/4j/mP+4/wgASACEAIQAaAAYALj/bP9M/2z/uP8oAHgApACUAGgA+P+Y/1z/TP9s/7j/OACEAMQAxACEABgAmP88/wz/PP+o/zgApADkANQAeADo/2z/HP8c/2z/2P94ANQA5AC0AEgAqP8s//z+HP98/xgAtAAUASQB1ABIAIj/DP/M/uz+fP8oANQANAE0AcQAKABc/9z+rP78/oj/SADkACQBBAGEANj/PP/c/uz+TP/Y/5QABAEkAeQAWACo...
}
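The message field in this error appears to echo the text frame that Deepgram failed to parse. One quick check, which was not part of the original question, is to decode its first few base64 characters as 16-bit little-endian integers. They come out as small PCM amplitudes, which suggests that what went over the wire was PCM audio encoded as base64 text, not a JSON control message.

```javascript
// Decode the first 16 base64 characters of the echoed "message" field
// and read the bytes as 16-bit little-endian PCM samples.
const echoedPrefix = "KAAoACgAGADo/9j/";
const bytes = Buffer.from(echoedPrefix, "base64"); // 12 bytes
const samples = [];
for (let i = 0; i + 1 < bytes.length; i += 2) {
  samples.push(bytes.readInt16LE(i));
}
console.log(samples); // [ 40, 40, 40, 24, -24, -40 ]
```

Small values that are multiples of 8 are what mu-law expansion produces for quiet audio. That fits the WaveFile conversion shown in the code above.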
Tags: real-time, speech-to-text, twilio-api, ivr, transcription
2 Answers

1 vote

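You get this error because the send() method on your Deepgram WebSocket accepts string | Buffer, and Deepgram treats every string (text) frame as a JSON control message of the form {"type": ...}, for example {"type": "KeepAlive"}. You are sending Base64-encoded audio as a string, which is not valid JSON. You can instead pass a Buffer decoded directly from the Twilio stream payload to send(), without converting it to WAV.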
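"Expected value at line 1 column 1" is a JSON parsing error: the first character of the frame is not the start of a JSON value. The sketch below imitates that check with JSON.parse in a hypothetical helper, parseControlMessage. It shows why a KeepAlive message is accepted and a base64 audio string is not. With the ws package, a string passed to send() goes out as a text frame and a Buffer goes out as a binary frame. That is why decoding the payload to a Buffer avoids the error.

```javascript
// Hypothetical helper: accept a text frame only if it is a JSON object with a string "type",
// roughly how a server treats text frames as control messages.
function parseControlMessage(text) {
  try {
    const msg = JSON.parse(text);
    const isControl = typeof msg === "object" && msg !== null && typeof msg.type === "string";
    return isControl ? msg : null;
  } catch {
    return null; // not JSON at all, e.g. base64-encoded audio
  }
}

console.log(parseControlMessage(JSON.stringify({ type: "KeepAlive" }))); // { type: 'KeepAlive' }
console.log(parseControlMessage("KAAoACgAGADo/9j/")); // null
```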

See below:

if (msg.event === 'media') {
  if (deepgram.readyState == WebSocket.OPEN) {
    const twilioData = msg.media.payload;
    deepgram.send(Buffer.from(twilioData, "base64"));
  }
}
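The snippet above handles only the media event. The sketch below extends it to the other Twilio Media Streams events. The function name handleTwilioMessage is hypothetical, and sendToDeepgram stands for any function that sends one frame on an open Deepgram socket, for example (data) => deepgram.send(data) after the readyState check above. On Twilio's stop event, it sends Deepgram's CloseStream control message as JSON text so the remaining audio is flushed.

```javascript
// Sketch: route one raw Twilio Media Streams message to Deepgram.
function handleTwilioMessage(raw, sendToDeepgram) {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "media":
      // Base64 mu-law payload -> raw bytes -> sent as a binary frame
      sendToDeepgram(Buffer.from(msg.media.payload, "base64"));
      break;
    case "stop":
      // Control message: must be JSON text, not audio
      sendToDeepgram(JSON.stringify({ type: "CloseStream" }));
      break;
    default:
      // "connected", "start", "mark": nothing to forward
      break;
  }
}

// Usage with a fake sender that just records frames
const sent = [];
const media = { event: "media", media: { payload: Buffer.alloc(160, 0xff).toString("base64") } };
handleTwilioMessage(JSON.stringify(media), (frame) => sent.push(frame));
handleTwilioMessage(JSON.stringify({ event: "stop" }), (frame) => sent.push(frame));
console.log(sent[0].length, sent[1]); // 160 {"type":"CloseStream"}
```

Passing the sender in as a function keeps the routing logic independent of the WebSocket library, so it can be tested without a network connection.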

Make sure your Deepgram connection is configured to accept 8000 Hz mu-law data. The code below is taken from the Deepgram website: https://developers.deepgram.com/docs/getting-started-with-live-streaming-audio

// Deepgram Node SDK v2 API
const { Deepgram } = require("@deepgram/sdk");

// Assumed: API key kept in an environment variable
const deepgramApiKey = process.env.DEEPGRAM_API_KEY;

// Initialize the Deepgram SDK
const deepgram = new Deepgram(deepgramApiKey);

// Create a websocket connection to Deepgram
// In this example, punctuation is turned on, interim results are turned off, and language is set to US English.
const deepgramLive = deepgram.transcription.live({
  punctuate: true,
  interim_results: false,
  language: "en-US",
  model: "nova",
  encoding: "mulaw", // IMPORTANT: set encoding (Twilio sends mu-law)
  sample_rate: 8000, // IMPORTANT: set sample rate (Twilio sends 8000 Hz)
});
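If you keep the raw WebSocket from the question instead of the SDK, the same settings can be passed as query parameters on the listen URL. Raw mu-law has no container header, so encoding and sample_rate must be given explicitly. The sketch below is a minimal version under that assumption. The helper name deepgramListenUrl is hypothetical, and channels: 1 is an addition, not in the original, because Twilio streams are mono.

```javascript
// Sketch: build the Deepgram live-streaming URL with its options as query parameters.
function deepgramListenUrl(params) {
  const url = new URL("wss://api.deepgram.com/v1/listen");
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const url = deepgramListenUrl({
  encoding: "mulaw",
  sample_rate: 8000,
  channels: 1,
  punctuate: true,
  model: "nova",
});
console.log(url);
// wss://api.deepgram.com/v1/listen?encoding=mulaw&sample_rate=8000&channels=1&punctuate=true&model=nova

// With the ws package this would then be used as, e.g.:
// new WebSocket(url, { headers: { Authorization: `Token ${apiKey}` } });
```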

Add a listener to the deepgramLive instance and you're on your way. All of the code is available at the link above.
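Whatever listener you attach, each transcript result from Deepgram's live API is a JSON object whose text sits at channel.alternatives[0].transcript, with an is_final flag. The sketch below pulls that out. The function name extractTranscript is hypothetical. The sketch assumes the listener receives either the JSON string or the parsed object, and it ignores messages without a channel field, such as metadata.

```javascript
// Sketch: extract the transcript text from one Deepgram live result message.
function extractTranscript(message) {
  const data = typeof message === "string" ? JSON.parse(message) : message;
  const alternatives = data.channel && data.channel.alternatives;
  const best = alternatives && alternatives[0];
  if (!best || !best.transcript) return null; // no result, or silence
  return { text: best.transcript, isFinal: Boolean(data.is_final) };
}

const results = JSON.stringify({
  type: "Results",
  is_final: true,
  channel: { alternatives: [{ transcript: "hello world", confidence: 0.98 }] },
});
console.log(extractTranscript(results)); // { text: 'hello world', isFinal: true }
console.log(extractTranscript({ type: "Metadata" })); // null
```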


0 votes

You are sending a Base64-encoded string. You need to decode it and send the audio as raw bytes instead.

You can do it like this:

import base64

media = msg['media']
audio = base64.b64decode(media['payload'])  # raw mu-law bytes from Twilio

Then send the audio to Deepgram.
