I built a test implementation that receives an audio stream over a WebSocket connection. I buffer the incoming data and, every 2 seconds' worth, I want to save the audio to a file.
This works fine for the first chunk. But for the second and all subsequent chunks, the audio files are silent. I suspect it's the way I split the audio stream into chunks, but I'm not sure; maybe it's something else entirely. Here is the test code I used:
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
app = FastAPI()
async def transcribe_chunk(audio_chunk, filename: str):
    print("filename:", filename)
    # Write the chunk bytes straight to disk
    with open('audio_chunk' + filename + '.wav', 'wb') as f:
        print("audio_chunk size:", len(audio_chunk))
        f.write(audio_chunk)
@app.websocket("/audio")
async def websocket_endpoint(websocket: WebSocket, token: str):
    await websocket.accept()
    audio_buffer = bytearray()
    chunk_size = 16000 * 2  # 2 seconds of audio in bytes
    tasks = []
    i = 0
    try:
        while True:
            data = await websocket.receive_bytes()
            print("received data")
            audio_buffer += data
            # Check if the buffer has enough data to approximate 2 seconds of audio
            if len(audio_buffer) >= chunk_size:
                print("buffer is full")
                i += 1
                # Slice the buffer to get a 2-second chunk
                audio_chunk = audio_buffer[:chunk_size]
                print(len(audio_chunk))
                # Process the 2-second audio chunk asynchronously
                task = asyncio.create_task(transcribe_chunk(audio_chunk, token + str(i)))
                tasks.append(task)
                audio_buffer = audio_buffer[chunk_size:]
    except WebSocketDisconnect:
        print("Client disconnected")
        if audio_buffer:
            # Process any remaining audio data (the original call was
            # missing the filename argument, which would raise a TypeError)
            task = asyncio.create_task(transcribe_chunk(audio_buffer, token + str(i + 1)))
            tasks.append(task)
        # Wait for all tasks to complete before ending the function
        await asyncio.gather(*tasks)
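As an aside, the `chunk_size` comment above may not match the math. If the stream were raw PCM, the byte count per chunk would depend on sample rate, sample width, and channel count; a small helper (my addition, not part of the original code) makes the arithmetic explicit:

```python
def chunk_bytes(seconds: float, sample_rate: int, sample_width: int, channels: int) -> int:
    """Bytes of raw PCM audio covering `seconds` of playback."""
    return int(seconds * sample_rate * sample_width * channels)

# At 16 kHz, 16-bit (2-byte) mono, 2 seconds is 64000 bytes --
# twice the 16000 * 2 used above, which only covers 1 second.
two_second_chunk = chunk_bytes(2, 16000, 2, 1)  # -> 64000
```

So `16000 * 2` is 2 seconds only at a 8 kHz / 16-bit mono (or 16 kHz / 8-bit mono) stream; it's worth double-checking what the browser actually sends.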
I found the solution to my own problem. It isn't in the Python backend, though, and it adds a little transfer overhead.
The problem I noticed is caused by the browser recording audio as WebM instead of WAV. Only the first chunk a MediaRecorder emits contains the WebM container header; the later chunks are headerless continuation fragments that can't be played on their own.
The solution is simple, if a bit dumb: after each chunk, I now stop and restart the MediaRecorder on the frontend. That way it sends complete files rather than mere fragments of one.
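One way to confirm this diagnosis on the backend (a sketch of my own, not part of the code above): every standalone WebM file opens with the 4-byte EBML magic, so checking the first bytes of each received chunk shows which ones could actually be played as files:

```python
# WebM is a Matroska container; standalone WebM files begin with
# the 4-byte EBML magic 0x1A 0x45 0xDF 0xA3.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"

def looks_like_standalone_webm(chunk: bytes) -> bool:
    """True if the chunk begins with a WebM/EBML header and could
    therefore be played as a file on its own."""
    return chunk[:4] == EBML_MAGIC
```

With the original slicing approach, only chunk 1 passes this check; with the stop/restart workaround, every chunk should.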