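I'm receiving data from a source (Twitch IRC) that doesn't announce the length of its data up front and never sends a consistent amount. The source uses "\r\n" as its delimiter. I want to receive until that delimiter is found, stop receiving, process what arrived, and then continue receiving. I've tried a few solutions of my own: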
delimiter = "\r\n"
buffer = ""
while True:
    received = socket.recv(1).decode("utf-8", "ignore")
    buffer += received
    if buffer.endswith(delimiter):
        process_data(buffer)
        buffer = ""
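This solution isn't ideal: when receiving only one byte at a time, `received` is often an empty string, which triggers the error handler in my application (Python doesn't raise an exception when the connection drops during a recv() call, it just returns an empty string).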
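One possible explanation, offered as an assumption rather than something confirmed in this thread: with recv(1), each byte of a multi-byte UTF-8 character is decoded on its own, and errors="ignore" silently turns it into an empty string. A real disconnect shows up as empty bytes before any decoding. A minimal sketch of telling the two apart (the helper name classify_chunk is hypothetical):

```python
def classify_chunk(raw: bytes) -> str:
    """Tell a closed connection apart from bytes that decode to nothing."""
    if raw == b"":
        return "closed"        # recv() returned no bytes: the peer closed
    text = raw.decode("utf-8", "ignore")
    if text == "":
        return "undecodable"   # e.g. one byte of a multi-byte character
    return "text"

encoded = "é".encode("utf-8")  # two bytes: b'\xc3\xa9'
# Simulate recv(1): classify each byte on its own, then both together.
results = [classify_chunk(encoded[i:i + 1]) for i in range(len(encoded))]
whole = classify_chunk(encoded)
```

Checking the raw bytes before decoding, and decoding only complete messages, avoids the ambiguity.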
delimiter = "\r\n"
buffer = ""
while True:
    received = socket.recv(2048).decode("utf-8", "ignore")
    received_messages = received.split(delimiter)
    for message in received_messages[:-1]:
        process_data(message)
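This isn't a good solution either, because split() removes the delimiter, so I have no way of knowing whether the last element of the list is a complete message.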
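A quick check of what split() returns may help here: when the data ends with the delimiter, the last element is an empty string; otherwise it is the unfinished remainder. Either way it can be carried over as the new buffer. A sketch under that assumption (split_messages is a hypothetical helper, not part of any library, and the sample IRC lines are made up):

```python
DELIMITER = "\r\n"

def split_messages(buffer: str, received: str):
    """Return (complete_messages, new_buffer) after appending received text."""
    parts = (buffer + received).split(DELIMITER)
    # The last part is "" if the data ended on a delimiter,
    # otherwise it is a partial message to keep for the next call.
    return parts[:-1], parts[-1]

# First chunk ends in the middle of a message.
messages, rest = split_messages("", "PING :a\r\nPRIV")
# Second chunk completes it.
more, rest2 = split_messages(rest, "MSG #chan :hi\r\n")
```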
What is the best way to receive data until a delimiter is found on a Python TCP socket? The functionality I'm looking for is similar to Boost's boost::asio::read_until().
You can buffer the data and extract a whole message once the delimiter is found. Example:
server.py
from socket import *

class Buffer:
    def __init__(self,sock):
        self.sock = sock
        self.buffer = b''

    def get_line(self):
        # Keep receiving until the buffer holds at least one full line.
        while b'\r\n' not in self.buffer:
            data = self.sock.recv(1024)
            if not data: # socket closed
                return None
            self.buffer += data
        # Split off the first line; the remainder stays buffered.
        line,sep,self.buffer = self.buffer.partition(b'\r\n')
        return line.decode()

s = socket()
s.bind(('',5000))
s.listen()

while True:
    c,a = s.accept()
    with c:
        print('Connected:',a)
        b = Buffer(c)
        while True:
            line = b.get_line()
            if line is None:
                break
            print('line:',line)
    print('Disconnected:',a)
client.py
from socket import *
s = socket()
s.connect(('localhost',5000))
s.sendall(b'a partial')
s.sendall(b' line\r\nand another')
s.sendall(b' line\r\n')
s.close()
Output:
Connected: ('127.0.0.1', 59552)
line: a partial line
line: and another line
Disconnected: ('127.0.0.1', 59552)
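To see why the partial sends still come out as whole lines, the buffer can be traced by hand without any sockets. This sketch assumes the three sendall() calls arrive as three separate chunks, which TCP does not guarantee; the extraction logic gives the same lines however the bytes are split.

```python
# Chunks as sent by client.py (assumed to arrive unchanged, one per recv()).
chunks = [b'a partial', b' line\r\nand another', b' line\r\n']

buffer = b''
lines = []
for chunk in chunks:
    buffer += chunk
    # Extract every complete line currently in the buffer.
    while b'\r\n' in buffer:
        line, _, buffer = buffer.partition(b'\r\n')
        lines.append(line.decode())
```

After the loop, lines holds the two complete lines and buffer is empty, matching the server output.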
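The above works for other delimiters as well. When the delimiter is a newline, there is also a built-in method, socket.makefile, that wraps the socket in a file-like object so that .readline() can be used: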
import socket

s = socket.socket()
s.bind(('', 5000))
s.listen()

while True:
    c, a = s.accept()
    with c, c.makefile('r', encoding='utf8') as infile:
        print(f'{a}: connected')
        while True:
            line = infile.readline()
            if not line:
                break
            print('line:', line.rstrip())
    print(f'{a}: disconnected')
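For a delimiter that is not a newline, the buffering approach can be wrapped in a generator that takes the delimiter as a parameter. This is only a sketch: iter_messages and its parameters are hypothetical names, and socket.socketpair() is used just to exercise it in a single process.

```python
import socket

def iter_messages(sock, delimiter=b'\r\n', bufsize=1024):
    """Yield each delimiter-terminated message from sock as bytes."""
    buffer = b''
    while True:
        # Hand out every complete message already buffered.
        while delimiter in buffer:
            message, _, buffer = buffer.partition(delimiter)
            yield message
        data = sock.recv(bufsize)
        if not data:  # peer closed; an unterminated remainder is dropped
            return
        buffer += data

left, right = socket.socketpair()
with left, right:
    left.sendall(b'first|second|thi')
    left.sendall(b'rd|')
    left.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
    messages = list(iter_messages(right, delimiter=b'|'))
```

Yielding bytes leaves decoding to the caller, so a message is decoded only once it is complete.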