Python elasticsearch-dsl Django pagination

Question (votes: 0, answers: 5)

How can I use Django pagination with elasticsearch-dsl? My code:

query = MultiMatch(query=q, fields=['title', 'body'], fuzziness='AUTO')

s = Search(using=elastic_client, index='post').query(query).sort('-created_at')
response = s.execute()

# this always returns a page count of 1
paginator = Paginator(response, 100)
page = request.GET.get('page')
try:
    posts = paginator.page(page)
except PageNotAnInteger:
    posts = paginator.page(1)
except EmptyPage:
    posts = paginator.page(paginator.num_pages)

Is there a solution?

python django pagination elasticsearch-dsl
5 answers
14
votes

I found this paginator at this link:

from django.core.paginator import Paginator, Page

class DSEPaginator(Paginator):
    """
    Override Django's built-in Paginator class to take in a count/total number of items;
    Elasticsearch provides the total as a part of the query results, so we can minimize hits.
    """
    def __init__(self, *args, **kwargs):
        super(DSEPaginator, self).__init__(*args, **kwargs)
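        # Works on old Django releases that cache the total in `_count`; newer
        # releases expose `count` as a cached property instead (see the last answer).
        self._count = self.object_list.hits.total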

    def page(self, number):
        # this is overridden to prevent any slicing of the object_list - Elasticsearch has
        # returned the sliced data already.
        number = self.validate_number(number)
        return Page(self.object_list, number, self)

Then in the view I use:

    q = request.GET.get('q', None)
    page = int(request.GET.get('page', '1'))
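    # Use the same page size for the slice and for the paginator below.
    start = (page - 1) * settings.POSTS_PER_PAGE
    end = start + settings.POSTS_PER_PAGE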

    query = MultiMatch(query=q, fields=['title', 'body'], fuzziness='AUTO')
    s = Search(using=elastic_client, index='post').query(query)[start:end]
    response = s.execute()

    paginator = DSEPaginator(response, settings.POSTS_PER_PAGE)
    try:
        posts = paginator.page(page)
    except PageNotAnInteger:
        posts = paginator.page(1)
    except EmptyPage:
        posts = paginator.page(paginator.num_pages)

This works perfectly.
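For context, the original code most likely reported a single page because an executed response only holds the hits that were actually fetched (10 by default), so a paginator with 100 items per page sees at most 10 objects. The arithmetic this answer relies on can be sketched in plain Python. The helper names `slice_bounds` and `page_count` are made up for illustration and are not part of Django or elasticsearch-dsl.

```python
import math


def slice_bounds(page, per_page):
    """Return (start, end) offsets for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * per_page
    return start, start + per_page


def page_count(total_hits, per_page):
    """Pages needed for total_hits; never below 1, like Django's default Paginator."""
    return max(1, math.ceil(total_hits / per_page))


# Page 3 with 10 items per page maps to the slice [20:30].
start, end = slice_bounds(3, 10)

# 95 matching documents at 10 per page need 10 pages, while 10 fetched
# hits at 100 per page collapse into a single page.
pages_from_total = page_count(95, 10)
pages_from_fetched_hits = page_count(10, 100)
```

The point is that the page count has to come from the total reported by Elasticsearch, not from the number of hits in the current response.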


1
votes

Following Danielle Madeley's suggestion, I also created a search-results proxy that works well with the latest version, django-elasticsearch-dsl==0.4.4.

from django.utils.functional import LazyObject

class SearchResults(LazyObject):
    def __init__(self, search_object):
        self._wrapped = search_object

    def __len__(self):
        return self._wrapped.count()

    def __getitem__(self, index):
        search_results = self._wrapped[index]
        if isinstance(index, slice):
            search_results = list(search_results)
        return search_results

You can then use it in your search view like this:

paginate_by = 20
search = MyModelDocument.search()
# ... do some filtering ...
search_results = SearchResults(search)

paginator = Paginator(search_results, paginate_by)
page_number = request.GET.get("page")
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    # If page parameter is not an integer, show first page.
    page = paginator.page(1)
except EmptyPage:
    # If page parameter is out of range, show last existing page.
    page = paginator.page(paginator.num_pages)

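Django's LazyObject proxies all attributes and methods of the object assigned to the _wrapped attribute. I overrode the couple of methods that Django's paginator needs but that don't work out of the box with a Search() instance.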


1
votes

A very simple solution is to use MultipleObjectMixin and fetch the Elastic results by overriding get_queryset(). In that case, if you add the paginate_by attribute, Django handles pagination on its own.

It should look something like this:

from django.views.generic import ListView

# ListView already inherits from MultipleObjectMixin; listing the mixin again
# before ListView raises an MRO TypeError.
class MyView(ListView):
    paginate_by = 10

    def get_queryset(self):
        # Query Elastic here and return the response data in `object_list`.
        # If you wish to add filters when querying Elastic,
        # you can use self.request.GET params here.
        object_list = []
        return object_list

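Note: the code above is generic and differs from my own setup, so I can't guarantee it works. I use a similar solution by inheriting from other mixins, overriding get_queryset(), and relying on Django's built-in pagination, and it works very well for me. Since this is a simple fix, I decided to post it here with a comparable example.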


0
votes

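Another approach is to create a proxy between Paginator and the Elasticsearch query. Paginator needs two things: __len__ (or count) and __getitem__ (which takes a slice). A rough version of the proxy works like this: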

class ResultsProxy(object):
    """
    A proxy object for returning Elasticsearch results that is able to be
    passed to a Paginator.
    """

    def __init__(self, es, index=None, body=None):
        self.es = es
        self.index = index
        self.body = body

    def __len__(self):
        result = self.es.count(index=self.index,
                               body=self.body)
        return result['count']

    def __getitem__(self, item):
        assert isinstance(item, slice)

        results = self.es.search(
            index=self.index,
            body=self.body,
            from_=item.start,
            size=item.stop - item.start,
        )

        return results['hits']['hits']

The proxy instance can be passed to Paginator, and it will make requests to ES as needed.
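To see which requests such a proxy would issue, here is a small sketch that runs without a cluster. `FakeES` is a hypothetical in-memory stand-in that only imitates the `count` and `search` calls used above; it is not a real client class. The proxy is a slightly hardened copy of the one in this answer (it defaults a missing slice start to 0).

```python
class FakeES:
    """Hypothetical stand-in for an Elasticsearch client (illustration only)."""

    def __init__(self, docs):
        self.docs = docs
        self.calls = []  # records every request the proxy makes

    def count(self, index=None, body=None):
        self.calls.append(("count", None, None))
        return {"count": len(self.docs)}

    def search(self, index=None, body=None, from_=0, size=10):
        self.calls.append(("search", from_, size))
        window = self.docs[from_:from_ + size]
        return {"hits": {"hits": [{"_source": doc} for doc in window]}}


class ResultsProxy:
    """len() issues a count request; slicing issues a windowed search."""

    def __init__(self, es, index=None, body=None):
        self.es = es
        self.index = index
        self.body = body

    def __len__(self):
        return self.es.count(index=self.index, body=self.body)["count"]

    def __getitem__(self, item):
        if not isinstance(item, slice):
            raise TypeError("ResultsProxy only supports slicing")
        start = item.start or 0
        results = self.es.search(
            index=self.index,
            body=self.body,
            from_=start,
            size=item.stop - start,
        )
        return results["hits"]["hits"]


es = FakeES([{"id": n} for n in range(25)])
proxy = ResultsProxy(es, index="post", body={"query": {"match_all": {}}})

total = len(proxy)          # what a paginator asks for to compute page counts
second_page = proxy[10:20]  # the slice a paginator would take for page 2 of 10
```

After these two operations `es.calls` holds one count request and one search with `from_=10, size=10`, so only the requested window is fetched.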


0
votes

For Django 4+ with DRF, this is a by-the-book solution, but I'm putting it here because it took me about 2 hours to find.

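from django.core.paginator import Page, Paginator
from rest_framework.pagination import PageNumberPagination


class DSLPaginator(Paginator):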
    """
    Override Django's built-in Paginator class to take in a count/total number of items;
    Elasticsearch provides the total as a part of the query results, so we can minimize hits.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.count = self.object_list.hits.total["value"]

    def page(self, number):
        # this is overridden to prevent any slicing of the object_list - Elasticsearch has
        # returned the sliced data already.
        number = self.validate_number(number)
        return Page(self.object_list, number, self)


class ESPageNumberPagination(PageNumberPagination):
    django_paginator_class = DSLPaginator
    page_size = 12

With DRF, use it like this:

class VideoListESView(generics.ListAPIView):
    serializer_class = VideoListESSerializer
    model = serializer_class.Meta.model
    pagination_class = ESPageNumberPagination

    def get_queryset(self):
        page = int(self.request.GET.get("page", 1))
        page_size = self.pagination_class.page_size
        start = (page - 1) * page_size
        search = VideoDocument.search().sort({"id": {"order": "desc"}})
        return search[start : start + page_size].execute()
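The paginators on this page read the total in two ways (`hits.total` and `hits.total["value"]`). As far as I know, this is because Elasticsearch 7 changed the total from a plain integer to an object with `value` and `relation` fields. A small helper can accept either shape. `total_hits` is a made-up name for this sketch, not a library function.

```python
from collections.abc import Mapping


def total_hits(total):
    """Normalize a hits total that may be an int, a mapping, or an attribute object."""
    if isinstance(total, int):
        # Older clusters report the total as a bare integer.
        return total
    if isinstance(total, Mapping):
        # Raw response dicts look like {"value": 42, "relation": "eq"}.
        return int(total["value"])
    # Attribute-style wrappers expose the same field as `.value`.
    return int(total.value)


old_style = total_hits(42)
new_style = total_hits({"value": 42, "relation": "eq"})
```

When `relation` is `"gte"` the value is only a lower bound, so the reported page count may be capped for very large result sets.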