Best way to convert a generator into an iterator class

Question · votes: 0 · answers: 2

Consider the following dummy example:

def common_divisors_generator(n, m):

    # Init code
    factors_n = [i for i in range(1, n + 1) if n%i == 0]
    factors_m = [i for i in range(1, m + 1) if m%i == 0]

    # Iterative code
    for fn in factors_n:
        for fm in factors_m:
            if fn == fm:
                yield fn

# The next line is fast because no code is executed yet
cdg = common_divisors_generator(1537745, 373625435)
# Next line is slow because init code is executed on first iteration call
for g in cdg:
    print(g)

The init code takes a long time to compute, and it executes when the generator is iterated over for the first time (not when the generator is initialized). I would prefer the init code to execute when the generator is initialized.
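To see why this happens, here is a tiny self-contained demonstration (my own example, not from the question) of how a generator function's body is deferred until the first iteration:

```python
def gen():
    print("setup running")  # executes on the first next(), not when gen() is called
    yield 1

g = gen()        # nothing is printed here: the body has not started yet
value = next(g)  # "setup running" is printed here
print(value)     # 1
```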

To that end, I converted the generator into an iterator class as follows:

class CommonDivisorsIterator(object):

    def __init__(self, n, m):
        # Init code
        self.factors_n = [i for i in range(1, n + 1) if n%i == 0]
        self.factors_m = [i for i in range(1, m + 1) if m%i == 0]

    def __iter__(self):
        return self

    def __next__(self):
        # Some Pythonic implementation of the iterative code above
        # ...
        return next_common_divisor

Every way I can think of to implement the __next__ method above is very cumbersome compared to the simplicity of writing the iterative code with the yield keyword in a generator.

What is the most Pythonic way to implement the __next__ method in an iterator class?

Alternatively, how can I modify the generator so that the init code executes at initialization time?

python iterator generator
2 Answers

5 votes

In both cases (whether you use a function or a class), the solution is to split the implementation into two functions: a setup function and a generator function.

Using yield in a function turns it into a generator function, meaning it returns a generator when called. But even without using yield, nothing stops you from creating a generator and returning it, like this:

def common_divisors_generator(n, m):
    factors_n = [i for i in range(1, n + 1) if n%i == 0]
    factors_m = [i for i in range(1, m + 1) if m%i == 0]

    def gen():
        for fn in factors_n:
            for fm in factors_m:
                if fn == fm:
                    yield fn

    return gen()
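A quick usage sketch (restating the answer's function, with small inputs of my own choosing so the effect is easy to see):

```python
def common_divisors_generator(n, m):
    # Setup cost is paid here, at call time...
    factors_n = [i for i in range(1, n + 1) if n % i == 0]
    factors_m = [i for i in range(1, m + 1) if m % i == 0]

    def gen():
        # ...while this inner generator runs only during iteration.
        for fn in factors_n:
            for fm in factors_m:
                if fn == fm:
                    yield fn

    return gen()

cdg = common_divisors_generator(12, 18)  # factor lists are built on this line
print(list(cdg))  # [1, 2, 3, 6]
```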

If you use a class, you don't need to implement the __next__ method at all. You can use yield in the __iter__ method:
class CommonDivisorsIterator(object):
    def __init__(self, n, m):
        self.factors_n = [i for i in range(1, n + 1) if n%i == 0]
        self.factors_m = [i for i in range(1, m + 1) if m%i == 0]

    def __iter__(self):
        for fn in self.factors_n:
            for fm in self.factors_m:
                if fn == fm:
                    yield fn
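Usage is then the same as with any iterable. One design note worth spelling out (my own sketch): unlike a __next__-based iterator, this object can be iterated more than once, because each call to __iter__ returns a fresh generator.

```python
class CommonDivisorsIterator:
    def __init__(self, n, m):
        # Setup runs once, at construction time
        self.factors_n = [i for i in range(1, n + 1) if n % i == 0]
        self.factors_m = [i for i in range(1, m + 1) if m % i == 0]

    def __iter__(self):
        # Each call returns a brand-new generator, so the object is re-iterable
        for fn in self.factors_n:
            for fm in self.factors_m:
                if fn == fm:
                    yield fn

cdi = CommonDivisorsIterator(12, 18)
print(list(cdi))  # [1, 2, 3, 6]
print(list(cdi))  # [1, 2, 3, 6] again: iteration can restart
```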

0 votes

The other answer to your specific question is correct, but if the main concern is initialization time, I'd suggest that, generally speaking, the code can be optimized quite a bit.

For example, in the "iterative" part you compare the factors of n against the factors of m. That nested loop runs in O(n*m) time. I can certainly see why you do it: one value might have [2, 3, 5] as factors while the other has only [2, 5], and you don't want to miss the 5 just because the 3 has no match. But there are faster ways to achieve this, and I've written some of them below.

First, you'll notice I changed slightly how the factors are found. You don't need to go beyond the square root of a number to find all of its factors, because every factor of a number has a "complement" that is also a factor. So, for example, every number x has the factor 1, with x as its complement. Therefore, once you know that y is a factor of x, you also know that x / y is a factor of x. This way you can find all the factors of a number x in O(sqrt(x)) time, which is a big speedup for large x.
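The complement idea can be illustrated in a few lines (my own sketch, using math.isqrt for an exact integer square root):

```python
from math import isqrt

def factor_pairs(x):
    # Every divisor v <= sqrt(x) pairs with its complement x // v
    return [(v, x // v) for v in range(1, isqrt(x) + 1) if x % v == 0]

print(factor_pairs(36))  # [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]
```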

Second, instead of storing the factors in lists, you should store them in sets. There is no risk of duplicates, so a set is ideal. Sets can be sorted, and they have O(1) lookup time (like a hash map). That way, you only need to iterate over the factors of x to find out whether they are also factors of y. This alone changes your runtime from O(n*m) to O(min(n, m)), where n and m are the sizes of the factor sets.
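With sets, the whole matching step collapses to an intersection (my own sketch; sorted() is applied because set iteration order is arbitrary):

```python
def common_divisors(n, m):
    factors_n = {i for i in range(1, n + 1) if n % i == 0}
    factors_m = {i for i in range(1, m + 1) if m % i == 0}
    # The & operator replaces the O(n*m) nested comparison loop
    return sorted(factors_n & factors_m)

print(common_divisors(12, 18))  # [1, 2, 3, 6]
```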

Finally, if I were implementing this, I would probably do it lazily, looking for new common factors only when needed. That removes the cost of the initialization step entirely.

I've included implementations of both approaches below.

from math import sqrt
from typing import Iterator

def find_factors(x: int) -> Iterator[int]:
    """Here's the deal with the sqrt thing. For any value x with factor a, there's a complement b.
    i.e., if x = 8, then we know 2 * 4 = 8. 2 is the first factor we'll find counting from 1 upward,
    but once we find 2, we know that 4 (the complement) is also a factor of 8. The largest
    unique factor whose complement can't be known earlier is the sqrt(x). So to save time,
    we just put aside the complements until later. If we didn't do this, we'd have to iterate
    all the way to x (which could be quite large) even though we'd already know that there are no
    factors between (for example), x / 2 and x.
    This keeps the search loop at O(sqrt(x)) iterations instead of O(x), while still
    yielding every factor of x in increasing order.

    Args:
        x (int): The value to factor
    """
    complements = []
    for v in range(1, int(sqrt(x)) + 1):
        if x % v == 0:
            yield v
            if v != x // v:  # don't yield a perfect square's root twice
                complements.append(x // v)

    for v in reversed(complements):
        yield v
   
def common_factors_greedy(n, m):
    """
    This will run in O(min(N, M)) time instead of the O(N*M) time of the nested loop.
    Note that N, M are that size of the factor set, not the size of n or m.
    """
    
    # I'd recommend creating these as sets to begin with so you don't have to waste cycles converting them
    factors_n, factors_m = set(find_factors(n)), set(find_factors(m))
    common_factors = factors_n & factors_m
    for c in common_factors:
        yield c

def common_factors_lazy(n, m):
    """
     Generates common factors of n and m lazily, which means there's no initialization cost up front. You only
     spend compute when you actually look for the next common factor. Overall this might be overkill, since with
     the find_factors approach above it's pretty fast even for large n or m. But it's still worth thinking about
     for other kinds of problems.
    """
    # Note: Factors_n/m are not lists of factors. They are generator objects. They don't actually "know"
    # anything about the factors of n or m until you call next(factors_n) or next(factors_m).
    factors_n, factors_m = find_factors(n), find_factors(m)
    x, y = next(factors_n), next(factors_m)
    x_empty = y_empty = False
    while not (x_empty and y_empty) and not ((x_empty and x < y) or (y_empty and y < x)):
        if x == y:
            yield x
            try:
                x = next(factors_n)
                y = next(factors_m)
            except StopIteration:
                return
        elif x > y:
            try:    
                y = next(factors_m)
            except StopIteration:
                y_empty = True
        else:
            try:
                x = next(factors_n)
            except StopIteration:
                x_empty = True


def main():
    N = 1_220_142
    M = 837_462
    
    for fact in find_factors(N):
        print(f'Factor of N: {fact}')
    for fact in find_factors(M):
        print(f'Factor of M: {fact}')
    
    for com in common_factors_greedy(N, M):
        print(f'Greedy factor: {com}')
    
    for com in common_factors_lazy(N, M):
        print(f'Lazy factor: {com}')