I have a Python program that accepts a Twilio phone call, upgrades the connection to a WebSocket, and saves the call audio to a WAV file on the file system. It works: the connection switches to a WebSocket, I buffer the incoming audio into a byte array, and I write that array out as a WAV file. The problem is that when I play the file back, it is full of static and sounds very low quality. I'm not sure whether that reflects the actual quality of the audio stream on the WebSocket, or whether I'm doing something wrong when receiving or saving the audio. The program is included below.
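For reference, each "media" event the program handles below arrives as a JSON text frame whose payload field is base64-encoded mu-law audio. A minimal sketch of unpacking one such frame (the payload bytes here are invented for illustration, not real Twilio data):

```python
import base64
import json

# A made-up media frame in the shape Twilio Media Streams sends
frame = json.dumps({
    "event": "media",
    "media": {"payload": base64.b64encode(b"\xff\x7f\xfe").decode("ascii")},
})

msg = json.loads(frame)
if msg.get("event") == "media":
    # The decoded bytes are still mu-law encoded, not linear PCM
    media_bytes = base64.b64decode(msg["media"]["payload"])
    print(media_bytes)
```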
# Program to accept a phone call via Twilio and save what the speaker says to a WAVE file
# Need to have ngrok, FastAPI and Twilio all setup properly
import os
import json
import base64
import wave
from fastapi import FastAPI, WebSocket, Request, WebSocketDisconnect
from fastapi.responses import HTMLResponse, Response
from jinja2 import Template
app = FastAPI()
# Global variables
wsserver = []
# Set the filename for writing the media stream
output_filename = "output.wav"
# Our buffer where we will queue up all the streamed audio
pcmu_data = bytearray()
# Default values, but will be overridden when we get our "start" message
channels = 1 # Mono
sample_width = 1 # 8-bit samples for PCMU
frame_rate = 8000 # Typical frame rate for PCMU
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    global wsserver
    await websocket.accept()
    wsserver.append(websocket)
    while True:
        message = await websocket.receive_text()
        await on_message(websocket, message)

async def on_message(websocket, message):
    global wsserver
    global frame_rate
    global channels
    global pcmu_data
    global sample_width
    try:
        msg = json.loads(message)
        event = msg.get("event")
        if event == "connected":
            print("A new call has connected.")
        elif event == "start":
            print(f"Starting Media Stream {msg.get('streamSid')}")
            print(msg)
            # Override our default values with what our start message tells us
            channels = msg['start']['mediaFormat']['channels']
            frame_rate = msg['start']['mediaFormat']['sampleRate']
        # The event that carries our audio stream
        elif event == "media":
            payload = msg['media']['payload']
            if payload:
                # Decode base64-encoded media data
                media_bytes = base64.b64decode(payload)
                # Add the data onto the end of our byte array
                pcmu_data.extend(media_bytes)
        elif event == "stop":
            print("Call Has Ended")
            # Frame count: one byte per 8-bit mono sample in our mulaw stream
            frames = len(pcmu_data)
            # Write the buffered bytes out as a WAV file
            with wave.open(output_filename, 'w') as wav_file:
                wav_file.setnchannels(channels)
                wav_file.setsampwidth(sample_width)
                wav_file.setframerate(frame_rate)
                wav_file.setnframes(frames)
                wav_file.writeframes(pcmu_data)
    except WebSocketDisconnect as e:
        print(f"WebSocket disconnected: {e}")
        wsserver.remove(websocket)
@app.post("/")
async def post(request: Request):
    host = request.client.host
    print("Post call - host=" + host)
    xml = Template("""
    <Response>
        <Start>
            <Stream url="wss://83b4-73-70-107-57.ngrok-free.app/ws"/>
        </Start>
        <Say>Please state your message</Say>
        <Pause length="60" />
    </Response>
    """).render(host=host)
    return Response(content=xml, media_type="text/xml")

if __name__ == "__main__":
    print("Listening at Port 8080")
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8080)
This runs successfully, but the audio stream saved to the file system is full of static.
I switched from the wave module to the pywav library, changed the file-writing code to the following, and the difference was night and day:
import pywav

# pcmu_data is the bytearray of raw mu-law bytes collected above;
# b"".join(pcmu_data) raises TypeError because iterating a bytearray yields ints
data_bytes = bytes(pcmu_data)
wave_write = pywav.WavWrite(output_filename, 1, 8000, 8, 7)  # 1 = mono, 8000 Hz sample rate, 8-bit, 7 = MULAW encoding
wave_write.write(data_bytes)
wave_write.close()
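For completeness: the static in the original version comes from writing raw mu-law bytes into a WAV header that declares plain 8-bit linear PCM, so players misinterpret every sample. pywav fixes this by writing a header that declares format 7 (mu-law). If you would rather stay on the standard library, you can instead decode G.711 mu-law to 16-bit linear PCM yourself and write an ordinary WAV. A sketch, with helper names of my own invention:

```python
import wave

def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

def write_mulaw_as_pcm_wav(filename: str, mulaw_bytes: bytes, rate: int = 8000) -> None:
    """Convert raw mu-law bytes to 16-bit PCM and save as a standard WAV."""
    pcm = bytearray()
    for b in mulaw_bytes:
        pcm += ulaw_to_linear(b).to_bytes(2, "little", signed=True)
    with wave.open(filename, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 16-bit linear PCM
        wav_file.setframerate(rate)
        wav_file.writeframes(bytes(pcm))
```

On Python versions before 3.13, the standard-library call audioop.ulaw2lin(data, 2) performs the same conversion; audioop was removed in 3.13, which is why the decoder is spelled out here.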