I'm trying to create a Python proxy server that streams large request bodies from clients to some internal storage (possibly Amazon S3, Swift, FTP, or something similar). Before streaming, the server should query an internal API server to determine the parameters for the upload to internal storage. The main constraint is that it must happen in a single HTTP operation using the PUT method. It also has to work asynchronously, because there will be a lot of file uploads.

What solution would let me read chunks from the uploaded content and start transferring those chunks to internal storage before the user has finished uploading the whole file? Every Python web application I know of waits for the full request body to arrive before handing the request to the WSGI application / Python web server.

One solution I've found is the Tornado fork at https://github.com/nephics/tornado, but it's unofficial and the Tornado developers are in no hurry to merge it into the main branch. So, do you know of any existing solutions to my problem? Tornado? Twisted? Eventlet?
Here is an example of a server that does streaming upload handling, written with Twisted:
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString
from twisted.web.server import Request, Site
from twisted.web.resource import Resource
from twisted.application.service import Application
from twisted.application.internet import StreamServerEndpointService

# Define a Resource class that doesn't really care what requests are made
# of it.  This simplifies things since it lets us mostly ignore Twisted
# Web's resource traversal features.
class StubResource(Resource):
    isLeaf = True

    def render(self, request):
        return b""

class StreamingRequestHandler(Request):
    def handleContentChunk(self, chunk):
        # `chunk` is part of the request body.
        # This method is called as the chunks are received.
        Request.handleContentChunk(self, chunk)

        # Unfortunately you have to use a private attribute to learn where
        # the content is being sent.
        path = self.channel._path

        print("Server received %d more bytes for %s" % (len(chunk), path))

class StreamingSite(Site):
    requestFactory = StreamingRequestHandler

application = Application("Streaming Upload Server")
factory = StreamingSite(StubResource())
endpoint = serverFromString(reactor, b"tcp:8080")
StreamServerEndpointService(endpoint, factory).setServiceParent(application)
This is a tac file (put it in streamingserver.tac and run twistd -ny streamingserver.tac).
Because of the need to use self.channel._path, this is not a completely supported approach. The API overall is pretty clunky as well, so this is more an example that it's possible than that it's good. There has long been an intent to make this sort of thing easier (http://tm.tl/288), but it will probably be quite a while before that happens.
You can use Tremolo, an HTTP server framework designed with streaming in mind:
#!/usr/bin/env python3
from tremolo import Tremolo

app = Tremolo()

@app.route('/upload')
async def upload(**server):
    request = server['request']

    with open('/save/to/image_uploaded.png', 'wb') as f:
        # read the body chunk by chunk
        async for data in request.read():
            # write each chunk to the file as it arrives
            f.write(data)

    return 'Done.'

if __name__ == '__main__':
    app.run('0.0.0.0', 8000, debug=True, reload=True)
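The chunked-read pattern in the handler above can be exercised without running Tremolo by driving the same async for loop with a stand-in async generator (fake_chunks below is a hypothetical stub, not part of Tremolo's API):

```python
import asyncio

async def fake_chunks():
    # Stand-in for request.read(): yields the request body in pieces.
    for piece in (b"header", b"payload-bytes", b"tail"):
        yield piece

async def consume(chunks):
    # Same shape as the upload handler: act on each chunk as it arrives,
    # instead of waiting for the whole body to be buffered.
    received = []
    async for data in chunks:
        received.append(data)
    return b"".join(received)

body = asyncio.run(consume(fake_chunks()))
print(body)  # → b'headerpayload-bytestail'
```

In the real handler the append would be replaced by whatever forwards the chunk onward (a file write, or a transfer to internal storage).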
Another option is gevent: monkey-patch the standard library, then read the WSGI input stream in chunks from a plain WSGI application:

from gevent.monkey import patch_all
patch_all()

from gevent.pywsgi import WSGIServer

def stream_to_internal_storage(data):
    pass

def simple_app(environ, start_response):
    bytes_to_read = 1024
    while True:
        readbuffer = environ["wsgi.input"].read(bytes_to_read)
        if not len(readbuffer) > 0:
            break
        stream_to_internal_storage(readbuffer)
    start_response("200 OK", [("Content-type", "text/html")])
    return [b"hello world"]

def run():
    config = {'host': '127.0.0.1', 'port': 45000}
    server = WSGIServer((config['host'], config['port']), application=simple_app)
    server.serve_forever()

if __name__ == '__main__':
    run()
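The heart of simple_app is the read-until-empty loop. Factored out (as a hypothetical read_in_chunks helper, not part of gevent), it works against any file-like object, which makes the chunking behaviour easy to check with an in-memory stream:

```python
import io

def read_in_chunks(stream, chunk_size=1024):
    """Yield successive chunks from a file-like object until EOF,
    mirroring the wsgi.input loop in simple_app."""
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        yield buf

payload = b"x" * 2500
chunks = list(read_in_chunks(io.BytesIO(payload), chunk_size=1024))
print([len(c) for c in chunks])  # → [1024, 1024, 452]
```

Each chunk can be handed to stream_to_internal_storage as soon as it is read, so the upload is forwarded while the client is still sending.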
It works well when I try to upload a huge file:
curl -i -X PUT --progress-bar --verbose --data-binary @/path/to/huge/file "http://127.0.0.1:45000"
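Since the question mentions Amazon S3 as one possible backend: boto3's upload_fileobj accepts any object with a read() method, so a small adapter can expose an iterator of incoming chunks as a file-like object and hand it to S3 without buffering the whole body first. ChunkReader below is a hypothetical sketch, not part of boto3:

```python
import io

class ChunkReader(io.RawIOBase):
    """File-like wrapper over an iterator of byte chunks, so a streaming
    consumer (e.g. boto3's upload_fileobj) can call read() on it."""
    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the internal buffer from the chunk iterator as needed.
        while not self._buf:
            try:
                self._buf = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n

reader = ChunkReader([b"abc", b"defgh"])
data = reader.read()  # read(-1) drains the stream via readinto()
print(data)  # → b'abcdefgh'
```

With this, something like boto3.client('s3').upload_fileobj(ChunkReader(chunks), bucket, key) could stream the body onward as it arrives; bucket and key here are placeholders.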