A memoized function that takes a tuple of strings and returns an integer?

Question (votes: 0, answers: 2)

Suppose I have arrays of tuples like this:

a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]

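I am trying to convert these arrays into numeric vectors, with each dimension representing one feature.

So the expected output would be something like: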

amod = [1, 0, 1]  # or [1, 1, 1]
bmod = [1, 1, 2]  # or [1, 2, 2]

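So the vector that gets created depends on what has been seen before (i.e. 'rectangle' is still encoded as 1, but the new value 'large' is encoded as the next value, 2).

I think some combination of yield and memoization could help me here. This is what I have tried so far: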

def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
            return memo[x]
        return helper

@memoize
def verbal_to_value(tup):
    u = 1
    if tup[0] == 'shape':
        yield u
        u += 1
    if tup[0] == 'fill':
        yield u
        u += 1
    if tup[0] == 'size':
        yield u
        u += 1

But I still get this error:

TypeError: 'NoneType' object is not callable

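Is there a way to write this function so that it remembers what it has already seen? Bonus points if it can add keys dynamically, so that I don't have to hard-code things like 'shape' or 'fill'.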

python arrays yield memoization
2 Answers
1 vote

First: this is my preferred implementation of a memoize decorator, mostly because of speed...

def memoize(f):
    class memodict(dict):
        __slots__ = ()
        def __missing__(self, key):
            self[key] = ret = f(key)
            return ret
    return memodict().__getitem__

Apart from a few edge cases, it has the same effect as yours:

def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        #else:
        #    pass
        return memo[x]
    return helper

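But it is faster, because the `if x not in memo:` check happens in native code rather than in Python code. To understand it, you only need to know that, under normal circumstances, Python evaluates adict[key] by calling adict.__getitem__(key), and if adict does not contain the key, __getitem__() calls adict.__missing__(key) (when the dict subclass defines it). So we can use Python's magic-method protocol to our advantage...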
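As a small illustration of that protocol, here is a minimal sketch. The class name `Loud` and its `misses` list are made up for this example; the point is only that `__missing__` runs on a failed lookup and is skipped once the key is stored.

```python
class Loud(dict):
    """Hypothetical dict subclass that records which keys were missing."""

    def __init__(self):
        super().__init__()
        self.misses = []

    def __missing__(self, key):
        # dict.__getitem__ calls this only when `key` is absent.
        self.misses.append(key)
        self[key] = value = key.upper()
        return value


d = Loud()
first = d['a']    # key absent: __missing__ runs and stores the result
second = d['a']   # key present: plain lookup, __missing__ is not called
```

After both lookups, `d.misses` holds a single entry, which is the caching behaviour the memodict version relies on.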

#This is the first idea I had for how I would implement your
#verbal_to_value() using memoization:
from collections import defaultdict

work=defaultdict(set)

@memoize 
def verbal_to_value(kv):
    k, v = kv
    aset = work[k]  #work creates a new set, if not already created.
    aset.add(v)     #add value if not already added
    return len(aset)

Including the memoize decorator, that is 15 lines of code...

#test suite:

def vectorize(alist):
    return [verbal_to_value(kv) for kv in alist]

a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]

print (vectorize(a)) #shows [1,1,1]
print (vectorize(b)) #shows [1,2,2]

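defaultdict is a powerful object with almost the same logic as memoize: it is a standard dictionary in every way, except that when a lookup fails it runs a callback function to create the missing value. In our case, that callback is set().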
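A quick sketch of that behaviour, using a throwaway dictionary named `seen` (a name chosen for this example only):

```python
from collections import defaultdict

seen = defaultdict(set)      # missing keys get a fresh, empty set()

seen['fill'].add('no')       # 'fill' is created on first access
seen['fill'].add('yes')
seen['fill'].add('no')       # duplicate value, the set ignores it

n_fill = len(seen['fill'])
n_size = len(seen['size'])   # merely looking up 'size' creates an empty set
```

Note that a plain lookup is enough to create the entry, which is why `'size'` ends up as a key even though nothing was added to it.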

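Unfortunately, this problem needs access either to the tuple being used as the key or to the state of the dictionary itself. As a result, we cannot simply write a plain function for .default_factory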
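To make the limitation concrete, here is a minimal sketch with hypothetical names (`factory`, `KeyAware`): `default_factory` is called with no arguments, whereas `__missing__` receives the key and can inspect the dictionary through `self`.

```python
from collections import defaultdict


def factory():
    # default_factory is called with no arguments, so neither the
    # missing key nor the dictionary is visible in here.
    return 'made without knowing the key'


dd = defaultdict(factory)


class KeyAware(dict):
    # Hypothetical class for illustration: __missing__ gets the key,
    # and `self` exposes the current state of the dictionary.
    def __missing__(self, key):
        self[key] = value = (key, len(self))
        return value


ka = KeyAware()
r1 = dd['shape']
r2 = ka['shape']   # len(self) was 0 when this entry was created
r3 = ka['fill']    # len(self) was 1 when this entry was created
```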

But we can write a new object based on the memoize / defaultdict pattern:

#This is how I would implement your verbal_to_value without
#memoization, though the worker class is so similar to @memoize,
#that it's easy to see why memoize is a good pattern to work from:
class sloter(dict):
    __slots__ = ()
    def __missing__(self,key):
        self[key] = ret = len(self) + 1
        #this + 1 bothers me, why can't these vectors be 0 based? ;)
        return ret

from collections import defaultdict
work2 = defaultdict(sloter)
def verbal_to_value2(kv):
    k, v = kv
    return work2[k][v]
#~10 lines of code?




#test suite2:

def vectorize2(alist):
    return [verbal_to_value2(kv) for kv in alist]

print (vectorize2(a)) #shows [1,1,1]
print (vectorize2(b)) #shows [1,2,2]

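You may have seen something like sloter before, because it is sometimes used for exactly this kind of situation: converting member names to numbers and back again. That gives us the advantage of being able to reverse the mapping, like this: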

def unvectorize2(a_vector, pattern=('shape','fill','size')):
    reverser = [{v:k2 for k2,v in work2[k].items()} for k in pattern]
    for index, vect in enumerate(a_vector):
        yield pattern[index], reverser[index][vect]

print (list(unvectorize2(vectorize2(a))))
print (list(unvectorize2(vectorize2(b))))

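But I saw those yields in your original post, and they got me thinking... what if there were a memoize / defaultdict-like object that could take a generator instead of a function, and knew to advance the generator instead of calling it? Then I realized... yes, generators come with a callable named __next__(), which means we don't need a new defaultdict implementation, just a careful extraction of the right member function...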

def count(start=0): #same as: from itertools import count
    while True:
        yield start
        start += 1

#so we could get the exact same behavior as above, (except faster)
#by saying:
sloter3=lambda :defaultdict(count(1).__next__)
#and then
work3 = defaultdict(sloter3)
#or just:
work3 = defaultdict(lambda :defaultdict(count(1).__next__))
#which yes, is a bit of a mindwarp if you've never needed to do that
#before.

#the outer defaultdict interprets the first item. Every time a new
#first item is received, the lambda is called, which creates a new
#count() generator (starting from 1), and passes its .__next__ method
#to a new inner defaultdict.

def verbal_to_value3(kv):
    k, v = kv
    return work3[k][v]
#you *could* call that 8 lines of code, but we managed to use
#defaultdict twice, and didn't need to define it, so I wouldn't call
#it 'less complex' or anything.



#test suite3:
def vectorize3(alist):
    return [verbal_to_value3(kv) for kv in alist]

print (vectorize3(a)) #shows [1,1,1]
print (vectorize3(b)) #shows [1,2,2]

#so yes, that can also work.

#and since the internal state in `work3` is stored in the exact same
#format, it can be accessed the same way as `work2` to reconstruct input
#from output.
def unvectorize3(a_vector, pattern=('shape','fill','size')):
    reverser = [{v:k2 for k2,v in work3[k].items()} for k in pattern]
    for index, vect in enumerate(a_vector):
        yield pattern[index], reverser[index][vect]

print (list(unvectorize3(vectorize3(a))))
print (list(unvectorize3(vectorize3(b))))

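Final comments:

Each of these implementations stores its state in global variables. I find that unaesthetic, but depending on what you plan to do with the vectors later, it might be a feature, as I demonstrated.

Edit: After mulling this over for another day, and thinking about the various situations in which I might need it, I think I would encapsulate this functionality like this: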

from collections import defaultdict
from itertools import count
class slotter4:
    def __init__(self):
        #keep track what order we expect to see keys
        self.pattern = defaultdict(count(1).__next__)
        #keep track of what values we've seen and what number we've assigned to mean them.
        self.work = defaultdict(lambda :defaultdict(count(1).__next__))
    def slot(self, kv, i=False):
        """used to be named verbal_to_value"""
        k, v = kv
        if i and i != self.pattern[k]:# keep track of order we saw initial keys
            raise ValueError("Input fields out of order")
            #in theory we could ignore this error, and just know
            #that we're going to default to the field order we saw
            #first. Or we could just not keep track, which might be
            #required, if our code runs too slow, but then we cannot
            #make pattern optional in .unvectorize()
        return self.work[k][v]
    def vectorize(self, alist):
        return [self.slot(kv, i) for i, kv in enumerate(alist,1)]
        #if we're not keeping track of field pattern, we could do this instead
        #return [self.work[k][v] for k, v in alist]
    def unvectorize(self, a_vector, pattern=None):
        if pattern is None:
            pattern = [k for k,v in sorted(self.pattern.items(), key=lambda a:a[1])]
        #use this instance's own state, not the module-level work3
        reverser = [{v:k2 for k2,v in self.work[k].items()} for k in pattern]
        return [(pattern[index], reverser[index][vect]) 
                for index, vect in enumerate(a_vector)]

#test suite4:
s = slotter4()
if __name__=='__main__':
    Av = s.vectorize(a)
    Bv = s.vectorize(b)
    print (Av) #shows [1,1,1]
    print (Bv) #shows [1,2,2]
    print (s.unvectorize(Av))#shows a
    print (s.unvectorize(Bv))#shows b
else:
    #run the test silently, and only complain if something has broken
    assert s.unvectorize(s.vectorize(a))==a
    assert s.unvectorize(s.vectorize(b))==b

Good luck!


0 votes

Not the best approach, but it may help you find a better solution:

class Shape:
    counter = {}
    def to_tuple(self, tuples):
        self.tuples = tuples
        self._add()
        l = []
        for i,v in self.tuples:
            l.append(self.counter[i][v])
        return l


    def _add(self):
        for i,v in self.tuples:
            if i in self.counter.keys():
                if v not in self.counter[i]:
                    self.counter[i][v] = max(self.counter[i].values()) +1
            else:
                self.counter[i] = {v: 0}

a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]

b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]   

s = Shape()
s.to_tuple(a)
s.to_tuple(b)
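
For comparison, the same idea can be sketched more compactly with dict.setdefault. The function name `encode` and the module-level `codes` dictionary are hypothetical, not part of the answer above. Like the class above, this numbers values from 0 in order of first appearance.

```python
codes = {}  # feature name -> {value: integer code}


def encode(tuples):
    """Return one integer per (feature, value) pair, assigned on first sight."""
    out = []
    for key, value in tuples:
        per_key = codes.setdefault(key, {})
        # len(per_key) is the next unused code for this feature; it is
        # only stored if `value` has not been seen before.
        out.append(per_key.setdefault(value, len(per_key)))
    return out


a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]

va = encode(a)
vb = encode(b)
print(va)
print(vb)
```

Using len() instead of max() + 1 avoids scanning the existing codes, at the cost of assuming codes are never removed.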