Expressive way to compose generators in Python

Question (votes: 4, answers: 4)

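I really like Python generators. In particular, I find that they are just the right tool for connecting to REST endpoints: my client code only has to iterate over the generator that is connected to the endpoint. However, I find that Python's generators are not as expressive as I would like. Typically, I need to filter the data I get from the endpoint. In my current code, I pass a predicate function into the generator, which applies the predicate to the data it is handling and only yields the data when the predicate is True.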

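I would like to move towards composing generators, something like data_filter(datasource()). Here is some demo code showing what I have tried. It is pretty clear why it does not work; what I am trying to figure out is the most expressive way of arriving at a solution: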

# Mock of Rest Endpoint: In actual code, generator is 
# connected to a Rest endpoint which returns dictionary(from JSON).
def mock_datasource ():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external" 
def data_filter (d):
    if len(d) < 8:
        yield d

# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try 
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out, 
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive 
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)
python generator function-composition
4 Answers
4 votes

data_filter should apply len to the elements of d rather than to d itself, like this:

def data_filter (d):
    for x in d:
        if len(x) < 8:
            yield x

Now your code:

for w in data_filter(mock_datasource()):
    print(w)

prints:

liberty
seminar
formula
comedy
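
Because this version of data_filter takes an iterable and yields from it, stages written in the same style can be nested directly. A minimal sketch of that idea follows; starts_with is a hypothetical second stage added only for illustration and is not part of the original answer:

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter(d):
    # d is an iterable of words; yield only those shorter than 8 characters.
    for x in d:
        if len(x) < 8:
            yield x

def starts_with(d, prefix):
    # Hypothetical second stage: keep only words that begin with prefix.
    for x in d:
        if x.startswith(prefix):
            yield x

# Each stage consumes the previous one lazily, one item at a time.
result = list(starts_with(data_filter(mock_datasource()), "s"))
print(result)
```

Nothing is materialized until list() drives the outermost generator, so the same nesting should also work against a real endpoint generator.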

1 vote

More concisely, you can do this directly with a generator expression:

def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)

Apply the filter to your generator just like a regular function:

for element in length_filter(endpoint_data()):
    ...

If your predicate is really simple, the built-in function filter may also meet your needs.
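
As a rough illustration of the built-in filter route, here is a sketch that reuses the mock data source from the question. In Python 3, filter returns a lazy iterator, so it can wrap a generator without consuming it up front:

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# filter(predicate, iterable) lazily yields the items for which predicate is truthy.
short_words = filter(lambda w: len(w) < 8, mock_datasource())

result = list(short_words)
print(result)
```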


0 votes

You can pass in a filter function that is applied to each item:

def mock_datasource(filter_function):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]

    for d in mock_data:
        # Yield only the items that the filter function accepts.
        if filter_function(d):
            yield d

def filter_function(d):
    # Example predicate: keep words shorter than 8 characters.
    return len(d) < 8

0 votes

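What I would do is define filter(data_filter) so that it takes a generator as input and returns a generator whose values are filtered by the data_filter predicate (an ordinary predicate that knows nothing about the generator interface).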

The code is:

def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    def generator(coll):
        for x in coll:
            if pred(x):
                yield x
    return generator

def mock_datasource ():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter (d):
    # Plain predicate: True for words shorter than 8 characters, False otherwise.
    return len(d) < 8


gen1 = mock_datasource()
filtering = filter(data_filter)
gen2 = filtering(gen1) # or filter(data_filter)(mock_datasource())

print(list(gen2)) 

If you want to take it further, you can use compose, which I think was the whole intent:

from functools import reduce

def compose(*fns):
    """Compose functions left to right - allows generators to compose with same
    order as Clojure style transducers in first argument to transduce."""
    return reduce(lambda f,g: lambda *x, **kw: g(f(*x, **kw)), fns)

gen_factory = compose(mock_datasource, 
                      filter(data_filter))
gen = gen_factory()

print(list(gen))

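PS: I used some code found here, where Clojure folks express the composition of generators, inspired by the way they usually compose with transducers.

PS2: filter could be written in a more pythonic way: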

def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    return lambda coll: (x for x in coll if pred(x))
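
To show how this style might extend to more than one stage, here is a sketch that reuses the compose function from this answer. The names keep and transform are hypothetical helpers introduced for illustration: keep mirrors the filter(pred) factory above under a name that does not shadow the built-in, and transform is an assumed mapping stage that is not part of the original answer.

```python
from functools import reduce

def compose(*fns):
    """Compose functions left to right (same definition as in the answer)."""
    return reduce(lambda f, g: lambda *x, **kw: g(f(*x, **kw)), fns)

def keep(pred):
    # Hypothetical name: returns a stage that lazily keeps items matching pred.
    return lambda coll: (x for x in coll if pred(x))

def transform(fn):
    # Hypothetical name: returns a stage that lazily applies fn to each item.
    return lambda coll: (fn(x) for x in coll)

def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# The first function produces the generator; each later stage wraps the previous one.
pipeline = compose(mock_datasource,
                   keep(lambda w: len(w) < 8),
                   transform(str.upper))

result = list(pipeline())
print(result)
```

Calling pipeline() again builds a fresh generator each time, since the data source is the first function in the chain rather than an already-created generator object.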