How to cache a paginated Django queryset

Question (votes: 0, answers: 4)

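How can I cache a paginated Django queryset, specifically in a ListView?

I noticed that one query was taking a long time to run, so I am trying to cache it. The queryset is huge (over 100k records), so I am trying to cache only the paginated subsections of it. I can't cache the entire view or template because some parts are user/session specific and need to change constantly.

ListView has a couple of standard methods for retrieving the queryset: get_queryset(), which returns the non-paginated data, and paginate_queryset(), which filters it by the current page.

I first tried caching the query in get_queryset(), but quickly realized that calling cache.set(my_query_key, super(MyView, self).get_queryset()) caused the entire query to be serialized.

So then I tried overriding paginate_queryset() like this: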

import time
from django.core.cache import cache
from django.views.generic import ListView

class MyView(ListView):

    ...

    def paginate_queryset(self, queryset, page_size):
        # self.page is not a standard ListView attribute; it is assumed
        # to be set in the elided code above.
        cache_key = 'myview-queryset-%s-%s' % (self.page, page_size)
        print('paginate_queryset.cache_key:', cache_key)
        t0 = time.time()
        ret = cache.get(cache_key)
        if ret is None:
            print('re-caching')
            ret = super(MyView, self).paginate_queryset(queryset, page_size)
            # Cache the whole return value for one hour.
            cache.set(cache_key, ret, 60*60)
        td = time.time() - t0
        print('paginate_queryset.time.seconds:', td)
        (paginator, page, object_list, other_pages) = ret
        print('total objects:', len(object_list))
        return ret

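However, this takes almost a minute to run, even though only 10 objects are retrieved, and every request prints "re-caching", which implies that nothing is being saved to the cache.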

My settings.CACHES looks like:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

service memcached status shows that memcached is running, and tail -f /var/log/memcached.log shows absolutely nothing.

What am I doing wrong? What is the proper way to cache a paginated query so that the entire queryset isn't retrieved?

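Edit: I think there may be a bug in either memcached or the Python wrapper. Django appears to support two different memcached backends, one using python-memcached and one using pylibmc. python-memcached seems to silently hide the error that occurs when caching the paginate_queryset() value. When I switched to the pylibmc backend, I got an explicit error message, "error 10 from memcached_set: SERVER ERROR", traced back to line 78 of django/core/cache/backends/memcached.py, in set.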

python django django-models memcached django-views
4 Answers

4 votes

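You can extend Paginator to support caching via a provided cache_key.

A blog post about the usage and implementation of such a CachedPaginator was linked in the original answer. The source code was posted at djangosnippets.org (a web-archive link was given because the original no longer works).

However, I will post a slightly modified version of the original, which caches not only the objects per page but also the total count (sometimes even the count can be an expensive operation).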

from django.core.cache import cache
from django.utils.functional import cached_property
from django.core.paginator import Paginator, Page, PageNotAnInteger


class CachedPaginator(Paginator):
    """A paginator that caches the results on a page by page basis."""
    def __init__(self, object_list, per_page, orphans=0, allow_empty_first_page=True, cache_key=None, cache_timeout=300):
        super(CachedPaginator, self).__init__(object_list, per_page, orphans, allow_empty_first_page)
        self.cache_key = cache_key
        self.cache_timeout = cache_timeout

    @cached_property
    def count(self):
        """
            The original Paginator.count attribute in Django 1.8
            is not writable and cannot be set manually, but we would like
            to override it when loading data from the cache (instead of recalculating it).
            So we make it writable via @cached_property.
        """
        return super(CachedPaginator, self).count

    def set_count(self, count):
        """
            Override the paginator.count value (to prevent recalculation)
            and clear num_pages and page_range which values depend on it.
        """
        self.count = count
        # if somehow we have stored .num_pages or .page_range (which are cached properties)
        # this can lead to wrong page calculations (because they depend on paginator.count value)
        # so we clear their values to force recalculations on next calls
        try:
            del self.num_pages
        except AttributeError:
            pass
        try:
            del self.page_range
        except AttributeError:
            pass

    @cached_property
    def num_pages(self):
        """This is not writable in Django1.8. We want to make it writable"""
        return super(CachedPaginator, self).num_pages

    @cached_property
    def page_range(self):
        """This is not writable in Django1.8. We want to make it writable"""
        return super(CachedPaginator, self).page_range

    def page(self, number):
        """
        Returns a Page object for the given 1-based page number.

        This will attempt to pull the results out of the cache first, based on
        the requested page number. If not found in the cache,
        it will pull a fresh list and then cache that result + the total result count.
        """
        if self.cache_key is None:
            return super(CachedPaginator, self).page(number)

        # In order to prevent counting the queryset
        # we only validate that the provided number is integer
        # The rest of the validation will happen when we fetch fresh data.
        # so if the number is invalid, nothing will be cached
        # number = self.validate_number(number)
        try:
            number = int(number)
        except (TypeError, ValueError):
            raise PageNotAnInteger('That page number is not an integer')

        page_cache_key = "%s:%s:%s" % (self.cache_key, self.per_page, number)
        page_data = cache.get(page_cache_key)

        if page_data is None:
            page = super(CachedPaginator, self).page(number)
            # Cache not only the objects, but the total count too.
            page_data = (page.object_list, self.count)
            cache.set(page_cache_key, page_data, self.cache_timeout)
        else:
            cached_object_list, cached_total_count = page_data
            self.set_count(cached_total_count)
            page = Page(cached_object_list, number, self)

        return page

2 votes

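The problem turned out to be a combination of factors. Mainly, the result returned by paginate_queryset() contains a reference to the unlimited queryset, which means it is essentially uncacheable. When I called cache.set(mykey, (paginator, page, object_list, other_pages)), it tried to serialize thousands of records instead of just the page_size records I was expecting, so the cached item exceeded memcached's limits and the write failed.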
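To get a feel for the sizes involved, here is a rough sketch that uses only the standard library. The list of dicts is a stand-in for database rows (an assumption for illustration; a real queryset pickles model instances, which are larger), and 1 MB is memcached's default per-item limit, which can be raised with the -I option.

```python
import pickle

# memcached's default per-item size limit (1 MB).
MEMCACHED_DEFAULT_ITEM_LIMIT = 1024 * 1024

# Stand-in for 100k database rows (illustration only).
rows = [{"id": i, "title": "record number %d" % i} for i in range(100000)]

page_size = 10
first_page = rows[:page_size]

# Size of the serialized payload that a cache backend would have to store.
whole_size = len(pickle.dumps(rows))
page_size_bytes = len(pickle.dumps(first_page))

print("whole result:", whole_size, "bytes")
print("one page:", page_size_bytes, "bytes")
print("whole result fits in one item:", whole_size <= MEMCACHED_DEFAULT_ITEM_LIMIT)
print("one page fits in one item:", page_size_bytes <= MEMCACHED_DEFAULT_ITEM_LIMIT)
```

Running the same len(pickle.dumps(...)) check on the value you are about to pass to cache.set() is a quick way to confirm whether it is the page or the whole result set that is being serialized.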

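The other factor was the terrible default error reporting in memcached/python-memcached, which silently hides all errors and turns cache.set() into a no-op if anything goes wrong, making the problem very time-consuming to track down.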
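One way to surface such silent failures while debugging is to read the value back right after writing it. The sketch below is only an illustration: set_and_verify, CacheWriteError and DroppingCache are hypothetical names, and DroppingCache merely imitates a backend that drops oversized values. In a Django project you would pass django.core.cache.cache instead.

```python
class CacheWriteError(Exception):
    """Raised when a value cannot be read back after cache.set()."""


def set_and_verify(cache, key, value, timeout):
    """Store value, then read it back to detect silently dropped writes.

    `cache` is any object with Django-style get(key) and
    set(key, value, timeout) methods. A stored value of None cannot be
    told apart from a miss, so this check is not meant for None values.
    """
    cache.set(key, value, timeout)
    if cache.get(key) is None:
        raise CacheWriteError("cache.set() did not store %r" % key)
    return value


class DroppingCache:
    """Stand-in backend that silently ignores oversized values."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.data = {}

    def set(self, key, value, timeout=None):
        if len(value) <= self.max_len:
            self.data[key] = value
        # Oversized values are dropped without any error.

    def get(self, key, default=None):
        return self.data.get(key, default)


cache = DroppingCache(max_len=10)
set_and_verify(cache, "small", list(range(5)), 60)  # stored normally
try:
    set_and_verify(cache, "big", list(range(1000)), 60)
except CacheWriteError as exc:
    print("detected:", exc)
```

The extra round trip costs a little, so a check like this is probably best kept to development or to the first write of a new kind of value.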

I fixed this by essentially rewriting paginate_queryset() to ditch Django's built-in paginator functionality altogether and calculate the queryset slice myself:

object_list = queryset[page_size*(page-1):page_size*(page-1)+page_size]

I then cache that object_list.
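A minimal sketch of that idea, under some assumptions: get_cached_page and DictCache are hypothetical names, the key format is made up for the example, and DictCache is an in-memory stand-in for django.core.cache.cache so the snippet runs on its own. The important detail is the list() call, which evaluates the slice so that only one page of objects is handed to the cache.

```python
def get_cached_page(cache, queryset, page, page_size, key_prefix, timeout=300):
    """Return the objects for a 1-based page, caching only that page's list."""
    cache_key = "%s:%s:%s" % (key_prefix, page_size, page)
    object_list = cache.get(cache_key)
    if object_list is None:
        start = page_size * (page - 1)
        # list() evaluates the slice, so only page_size objects are stored,
        # not a lazy queryset or a paginator that references every row.
        object_list = list(queryset[start:start + page_size])
        cache.set(cache_key, object_list, timeout)
    return object_list


class DictCache:
    """In-memory stand-in for django.core.cache.cache (ignores timeout)."""

    def __init__(self):
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value, timeout=None):
        self.data[key] = value


cache = DictCache()
records = list(range(1, 101))  # stand-in for a queryset; supports slicing
page_3 = get_cached_page(cache, records, page=3, page_size=10, key_prefix="myview")
print(page_3)
```

With a real queryset the slice becomes a LIMIT/OFFSET query, so the database only returns one page. The total count is not cached here; if the template needs it, it would have to be cached separately.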


0 votes

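I wanted to paginate the infinite-scroll view on my home page, and this is the solution I came up with. It is a mix of Django CCBV and the asker's original solution.

The response time didn't improve as much as I had hoped, but that is probably because I am testing locally with just 6 posts and 2 users, haha.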

# Imports
from django.core.cache import cache
from django.core.paginator import InvalidPage
from django.http import Http404
from django.utils.translation import gettext as _
from django.views.generic.list import ListView


class MyListView(ListView):
    template_name = 'MY TEMPLATE NAME'
    model = MY_POST_MODEL  # placeholder: replace with your post model class
    paginate_by = 10

    def paginate_queryset(self, queryset, page_size):
        """Paginate the queryset, caching the result per page."""
        paginator = self.get_paginator(
            queryset, page_size, orphans=self.get_paginate_orphans(),
            allow_empty_first_page=self.get_allow_empty())

        page_kwarg = self.page_kwarg
        page = self.kwargs.get(page_kwarg) or self.request.GET.get(page_kwarg) or 1

        try:
            page_number = int(page)
        except ValueError:
            if page == 'last':
                page_number = paginator.num_pages
            else:
                raise Http404(_("Page is not 'last', nor can it be converted to an int."))

        try:
            # Validate the page number; raises InvalidPage if it is out of range.
            page = paginator.page(page_number)
            cache_key = 'mylistview-%s-%s' % (page_number, page_size)
            retrieve_cache = cache.get(cache_key)

            if retrieve_cache is None:
                print('re-caching')
                retrieve_cache = super(MyListView, self).paginate_queryset(queryset, page_size)

                # Cache for 1 day
                cache.set(cache_key, retrieve_cache, 86400)

            return retrieve_cache
        except InvalidPage as e:
            raise Http404(_('Invalid page (%(page_number)s): %(message)s') % {
                'page_number': page_number,
                'message': str(e)
            })


0 votes

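Here is how to use Todor's great answer to cache pagination in a ListView. Suppose you have several ListViews in your app. Each of them needs its own unique cache_key. You set paginator_class = CachedPaginator and override the parent class's get_paginator method.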

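from django.views.generic import ListView

from myapp.utils import CachedPaginator
# ModelA is assumed to be imported from your app's models module.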

class ModelAView(ListView):
    model = ModelA
    template_name = "model_a.html"
    paginator_class = CachedPaginator  # instead of default Paginator
    paginate_by = 20

    def get_paginator(
        self, queryset, per_page, orphans=0, allow_empty_first_page=True, **kwargs
    ):
        paginator_cache_key = "model_a_" + str(self.kwargs["model_a_pk"])
        return self.paginator_class(
            queryset,
            per_page,
            orphans=orphans,
            allow_empty_first_page=allow_empty_first_page,
            cache_key=paginator_cache_key,
            **kwargs,
        )
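If several views need such keys, a small helper can keep them unique and safe for memcached, which rejects keys longer than 250 bytes or containing whitespace. The sketch below is one possible approach, not part of the answers above: make_paginator_cache_key is a hypothetical name, and hashing the parameters is a design choice made for this example.

```python
import hashlib

MEMCACHED_MAX_KEY_LENGTH = 250  # memcached rejects longer keys


def make_paginator_cache_key(view_name, **params):
    """Build a per-view cache key from the view name and its URL/filter parameters.

    view_name is assumed to be a short identifier without whitespace.
    """
    # Sort the parameters so the same values always give the same key.
    parts = ["%s=%s" % (name, params[name]) for name in sorted(params)]
    raw = ":".join([view_name] + parts)
    # Hash the variable part so the key contains no spaces or control
    # characters and has a fixed length.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return "%s:%s" % (view_name, digest)


key = make_paginator_cache_key("model_a", model_a_pk=7)
print(key)
```

In the view above, the result could be passed as cache_key in get_paginator(). CachedPaginator then appends per_page and the page number, so each page still gets its own entry.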