The correct way to LRU-cache a read call for data that may or may not have been written at the time of the call

Question

I have a class that manages data. Data is written for each date. I have a reader function

read(self, for_date)

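This is called frequently, and often several times for the same date, so I added an LRU cache decorator with a cache size of 100 (the rolling window of requested dates is about 100, so this size is appropriate). However, this caching scheme does not quite fit my use case.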

The problem is that the following sequence can occur:

# input_date has not yet been written and this function will return an empty list []
instance.read(for_date=input_date)

# write data for input date
instance.write(for_date=input_date)

# input_date has now been written and the result is no longer an empty list, but if I cached the first call, it would return an empty list here
instance.read(for_date=input_date)
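The sequence above can be reproduced with a small, self-contained sketch. The Store class, its dict-backed storage, and the string dates are assumptions made purely for illustration; only read, write, and the cache size of 100 come from the description above.

```python
from functools import lru_cache


class Store:
    """Hypothetical stand-in for the data-managing class described above."""

    def __init__(self):
        self._data = {}  # maps date -> list of records

    @lru_cache(maxsize=100)
    def read(self, for_date):
        # Return a tuple so callers cannot mutate the cached value.
        return tuple(self._data.get(for_date, ()))

    def write(self, for_date, records):
        self._data[for_date] = list(records)


store = Store()
first = store.read("2024-01-01")      # nothing written yet, result is empty
store.write("2024-01-01", [1, 2, 3])
second = store.read("2024-01-01")     # served from the cache, so still empty
```

Here second is stale: the empty result of the first call was cached and the write did nothing to invalidate it.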

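One approach I thought of is to keep track of the last date that has been written and to check whether the requested date has been written before calling read. In other words, create a wrapper function around read and put the LRU cache decorator on the read function rather than on the wrapper. I don't really like this approach, because it requires keeping track of which dates have been written.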
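A rough sketch of that wrapper idea, under a few assumptions: the class name TrackedStore, the _written set, and the _read_cached helper are all hypothetical, and the sketch assumes each date is written only once (a second write to the same date would still leave a stale cache entry).

```python
from functools import lru_cache


class TrackedStore:
    """Hypothetical class; all names here are illustrative only."""

    def __init__(self):
        self._data = {}
        self._written = set()  # dates known to have been written

    def read(self, for_date):
        # Uncached wrapper: bypass the cache until the date has been written.
        if for_date not in self._written:
            return ()
        return self._read_cached(for_date)

    @lru_cache(maxsize=100)
    def _read_cached(self, for_date):
        return tuple(self._data[for_date])

    def write(self, for_date, records):
        self._data[for_date] = list(records)
        self._written.add(for_date)
```

The bookkeeping in _written is exactly the extra state this approach requires.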

What other viable approaches are there?

The implementation is in Python, so I need a working Python solution, but I think the problem has some language-agnostic elements, hence the tag.

Answer 1

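You will probably have to write your own caching algorithm, because there is no way to remove individual items from the cache of @lru_cache or @cache from the outside.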
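For comparison, functools.lru_cache does expose cache_clear(), which empties the whole cache at once. A minimal sketch, assuming a hypothetical ClearingStore class and assuming it is acceptable to discard every cached date on each write:

```python
from functools import lru_cache


class ClearingStore:
    """Hypothetical class used only to illustrate cache_clear()."""

    def __init__(self):
        self._data = {}

    @lru_cache(maxsize=100)
    def read(self, for_date):
        return tuple(self._data.get(for_date, ()))

    def write(self, for_date, records):
        self._data[for_date] = list(records)
        # cache_clear() drops every entry, not only the one for for_date.
        self.read.cache_clear()
```

This is coarse: the cache belongs to the decorated function, so it is shared by all instances, and a write to one date evicts the cached results for every other date as well.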

Here is an example of an LRU caching algorithm you can use:

def custom_lru(func):
    """
    Based directly on https://pastebin.com/LDwMwtp8

    NOTE: I turned this into a function purely out of personal preference.
          Using a callable class is just as effective.
    """

    from collections import OrderedDict
    cache = OrderedDict()
    maxsize = 100
    hasher = lambda args: args  # maps the call arguments to a cache key

    def wrapper(*args):
        key = hasher(args)
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            return cache[key]
        result = func(*args)
        cache[key] = result
        if len(cache) > maxsize:
            cache.popitem(last=False)  # evict the least recently used entry
        return result

    def setmax(new_size):
        nonlocal maxsize
        if not isinstance(new_size, int): raise TypeError()
        maxsize = new_size if new_size > 0 else 0
        return wrapper

    def sethasher(new_hasher):
        nonlocal hasher
        if not callable(new_hasher): raise TypeError()
        hasher = new_hasher
        return wrapper

    wrapper.cache_set_maxsize = setmax # set maxsize
    wrapper.cache_set_hasher = sethasher # set hasher function

    wrapper.cache_clear = cache.clear # clear cache
    # remove one entry from the cache; pass the same arguments as the cached call
    wrapper.cache_remove = lambda *args: cache.pop(hasher(args), None)
    return wrapper

Then, whenever you call write, just add the line

self.read.cache_remove(self, for_date)

at the end. Here is an example of how to implement it:

class Manager:
    @custom_lru
    def read(self, for_date):
        ...
    # optional customization; with this hasher the key ignores self,
    # so all instances share entries for the same date
    read.cache_set_maxsize(100).cache_set_hasher(lambda args: args[1:])
                                                #lambda args: f"{id(args[0])}{args[1:]}"

    def write(self, for_date):
        # do something
        self.read.cache_remove(self, for_date)
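If a decorator is not required, the same per-key invalidation can be sketched with the cache stored on the instance itself. This is only an illustration under assumptions: CachedStore, its dict-backed _data storage, and the tuple return value are hypothetical and not part of the code above.

```python
from collections import OrderedDict


class CachedStore:
    """Hypothetical class keeping its own small LRU cache per instance."""

    def __init__(self, maxsize=100):
        self._data = {}
        self._cache = OrderedDict()
        self._maxsize = maxsize

    def read(self, for_date):
        if for_date in self._cache:
            self._cache.move_to_end(for_date)  # mark as most recently used
            return self._cache[for_date]
        result = tuple(self._data.get(for_date, ()))
        self._cache[for_date] = result
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # evict the least recently used
        return result

    def write(self, for_date, records):
        self._data[for_date] = list(records)
        self._cache.pop(for_date, None)  # invalidate only this date
```

Because the cache lives on the instance, entries are not shared between instances and are released together with the object.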

For a more general way to create an LRU cache, I recommend reading this code, because it is simple and easy to copy or reverse-engineer:

import collections
 
class LRU:
 
    def __init__(self, func, maxsize=128):
        self.cache = collections.OrderedDict()
        self.func = func
        self.maxsize = maxsize
 
    def __call__(self, *args):
        cache = self.cache
        if args in cache:
            cache.move_to_end(args)
            return cache[args]
        result = self.func(*args)
        cache[args] = result
        if len(cache) > self.maxsize:
            cache.popitem(last=False)
        return result
 
if __name__ == '__main__':
 
    f = LRU(ord, maxsize=3)
    for c in 'ABCDCE':
        print('%s -> %s\t\t%r' % (c, f(c), f.cache))
    f.cache.pop(('C',), None)          # invalidate 'C'
    print(f.cache)
    f.cache.clear()
