Zipped Python generators where the second one is shorter: how to retrieve the element that is silently consumed

Question Votes: 8 Answers: 4

I want to parse two generators of (potentially) different lengths with zip:

for el1, el2 in zip(gen1, gen2):
    print(el1, el2)

However, if gen2 has fewer elements, one extra element of gen1 is "consumed".

For example,

def my_gen(n:int):
    for i in range(n):
        yield i

gen1 = my_gen(10)
gen2 = my_gen(8)

list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen1))  # printed value is "9" => 8 is missing

gen1 = my_gen(8)
gen2 = my_gen(10)

list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen2))  # printed value is "8" => OK

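Apparently, a value is missing (8 in my previous example) because gen1 is read (thus producing the value 8) before zip realizes that gen2 has no more elements. But this value then vanishes into thin air. When gen2 is "longer", there is no such "problem".

QUESTION: Is there a way to retrieve this missing value (i.e. 8 in my previous example)?

Note: I have currently implemented this another way using itertools.zip_longest, but this question still puzzles me.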

python python-3.x zip generator itertools
4 Answers
7
votes

One way is to implement an iterator wrapper that caches the last value it produced:

class cache_last:
    """
    Wraps an iterable in an iterator that can retrieve the last value.

    .. attribute:: obj

       A reference to the wrapped iterable. Provided for convenience
       of one-line initializations.
    """
    def __init__(self, iterable):
        self.obj = iterable
        self.iter = iter(iterable)
        self.sentinel = object()

    @property
    def last(self):
        """
        The last object yielded by the wrapped iterator.

        Uninitialized iterators raise a `ValueError`. Exhausted
        iterators raise a `StopIteration`.
        """
        if not hasattr(self, 'prev'):
            raise ValueError('Not started yet!')
        if self.prev is self.sentinel:
            raise StopIteration
        return self.prev

    def __next__(self):
        """
        Retrieve, record, and return the next value of the iteration.
        """
        try:
            self.prev = next(self.iter)
        except StopIteration:
            self.prev = self.sentinel
            raise
        return self.prev

    def __iter__(self):
        """
        This object is already an iterator.
        """
        return self

To use this, wrap the input(s) to zip:

gen1 = cache_last(range(10))
gen2 = iter(range(8))
list(zip(gen1, gen2))
print(gen1.last)
print(next(gen1)) 

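It is important to make gen2 an iterator rather than a plain iterable, so that you can tell which one was exhausted. If gen2 is exhausted, you don't need to check gen1.last.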

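Another approach is to rewrite zip to accept a mutable sequence of iterables instead of separate iterables. That lets you replace each iterable with a chained version that includes the "peeked" item: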

import itertools

def myzip(iterables):
    iterators = [iter(it) for it in iterables]
    while True:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                # Put the items already pulled this round back in front
                for i, peeked in enumerate(items):
                    iterables[i] = itertools.chain([peeked], iterators[i])
                return
        else:
            # for/else: every iterator produced an item this round
            yield tuple(items)

gens = [range(10), range(8)]
list(myzip(gens))
print(next(gens[0]))

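This approach is problematic for many reasons. Not only does it lose the original iterable, it also loses any useful properties the original object may have had, because the object is replaced with a chain object.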
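One way to sidestep the in-place replacement, sketched here on the assumption that handing back fresh iterators is acceptable: a hypothetical helper zip_with_rest (not part of the answer above or of itertools) collects the pairs and returns, for each input, an iterator over what is left, with any peeked item chained back in front.

```python
import itertools


def zip_with_rest(*iterables):
    """Hypothetical helper: zip eagerly, then return the pairs together
    with iterators over whatever each input still holds."""
    iterators = [iter(it) for it in iterables]
    pairs = []
    while iterators:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                # Put the items already pulled in this round back in front.
                rest = [
                    itertools.chain([peeked], iterators[i])
                    for i, peeked in enumerate(items)
                ]
                # Inputs not reached in this round are returned untouched.
                rest.extend(iterators[len(items):])
                return pairs, rest
        pairs.append(tuple(items))
    return pairs, []


pairs, (rest1, rest2) = zip_with_rest(range(10), range(8))
print(pairs[-1])    # (7, 7)
print(list(rest1))  # [8, 9]
print(list(rest2))  # []
```

The caller's list is never mutated, but the pairs are collected eagerly, so this sketch needs at least one finite input.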


6
votes

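This is the equivalent zip implementation given in the docs:

def zip(*iterables):
    # zip('ABCD', 'xy') --> Ax By
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                return
            result.append(elem)
        yield tuple(result)

In your first example, gen1 = my_gen(10) and gen2 = my_gen(8). Both generators are consumed normally up to the iteration that yields 7. In the next iteration, elem = next(it, sentinel) is called on gen1 and returns 8, but when elem = next(it, sentinel) is called on gen2 it returns sentinel (because gen2 is exhausted), so if elem is sentinel is satisfied and the function executes return and stops. The 8 is discarded together with result, and next(gen1) now returns 9.

In your second example, gen1 = my_gen(8) and gen2 = my_gen(10). Both generators are again consumed up to the iteration that yields 7. In the next iteration, elem = next(it, sentinel) is called on gen1 and returns sentinel (because gen1 is exhausted at this point), so if elem is sentinel is satisfied and the function executes return and stops before gen2 is touched. Now next(gen2) returns 8.

Inspired by Mad Physicist's answer, you could use this cache_last wrapper to counter it:

In [97]: class cache_last:
    ...:     def __init__(self, iter):
    ...:         self.iter = iter
    ...:     def __next__(self):
    ...:         return self.prev
    ...:     def __iter__(self):
    ...:         for item in self.iter:
    ...:             self.prev = item
    ...:             yield item
    ...:

In [98]: gen1 = cache_last(range(10))
    ...: gen2 = range(8)
    ...: list(zip(gen1, gen2))
Out[98]: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]

In [99]: next(gen1)
Out[99]: 8

1
votes

When zip is used with two generators of different lengths, it basically works as follows: on each step it calls next() on its arguments in order and stops at the first one that is exhausted. Passing the shorter generator first therefore avoids consuming an extra element from the longer one:

def my_gen(n: int):
    for i in range(n):
        yield i

gen1 = my_gen(10)
gen2 = my_gen(8)
list(zip(gen1, gen2))
print(next(gen1))  # printed value is "9"

gen1 = my_gen(10)
gen2 = my_gen(8)
list(zip(gen2, gen1))  # switched gen2 and gen1
print(next(gen1))  # printed value is "8"


0
votes

I can see you have already found this answer and that it was brought up in the comments, but I figured I would make an answer out of it. You want to use itertools.zip_longest(), which replaces the missing values of the shorter generator with None.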
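A minimal sketch of that suggestion, assuming the goal is still to stop at the shorter input while keeping the element that zip would have dropped. The names MISSING and zip_keep_leftover are hypothetical. A private sentinel object is passed as fillvalue instead of relying on the default None, so that None can still appear as a real element.

```python
from itertools import zip_longest

MISSING = object()  # hypothetical sentinel, distinct from any real element


def zip_keep_leftover(gen1, gen2):
    """Pair items like zip(); return (pairs, leftover), where leftover is the
    element read from the longer input in the round the shorter one ran out,
    or MISSING if both ended together."""
    pairs = []
    for el1, el2 in zip_longest(gen1, gen2, fillvalue=MISSING):
        if el1 is MISSING:
            return pairs, el2
        if el2 is MISSING:
            return pairs, el1
        pairs.append((el1, el2))
    return pairs, MISSING


def my_gen(n: int):
    for i in range(n):
        yield i


gen1, gen2 = my_gen(10), my_gen(8)
pairs, leftover = zip_keep_leftover(gen1, gen2)
print(pairs[-1])   # (7, 7)
print(leftover)    # 8
print(next(gen1))  # 9
```

Unlike zip, zip_longest also reads one element from the second input when the first runs out, so the sketch returns a leftover in both directions.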
