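I really like Python generators. In particular, I find that they are just the right tool for connecting to REST endpoints: my client code only has to iterate over the generator that is connected to the endpoint. However, I am finding that Python's generators are not as expressive as I would like. Typically, I need to filter the data I get from the endpoint. In my current code, I pass a predicate function to the generator, which applies the predicate to the data it is handling and yields the data only if the predicate is True.

I would like to move toward composition of generators, like data_filter(datasource()). Here is some demonstration code that shows what I have tried. It is pretty clear why it does not work; what I am trying to figure out is the most expressive way of arriving at a solution: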

# Mock of Rest Endpoint: In actual code, generator is
# connected to a Rest endpoint which returns dictionary(from JSON).
def mock_datasource ():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external"
def data_filter (d):
    if len(d) < 8:
        yield d

# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out,
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)

data_filter should apply len to the elements of d rather than to d itself, like this:

def data_filter (d):
    for x in d:
        if len(x) < 8:
            yield x

Now your code:

for w in data_filter(mock_datasource()):
    print(w)

prints:

liberty
seminar
formula
comedy
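More concisely, you can do this directly with a generator expression:

def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)

Apply the filter to your generator just like a regular function:

for element in length_filter(endpoint_data()):
    ...

If your predicate is really simple, the built-in function filter may also meet your needs.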
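
As a minimal sketch of the built-in filter approach mentioned above: this assumes Python 3, where filter returns a lazy iterator rather than a list. The name is_short is a hypothetical predicate introduced for illustration, and mock_datasource repeats the mock from the question.

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def is_short(word):
    # Hypothetical predicate: keep words shorter than 8 characters
    return len(word) < 8

# In Python 3, filter() is lazy: items are pulled from the
# generator only as the result is iterated
short_words = filter(is_short, mock_datasource())
result = list(short_words)
print(result)
```

Because filter takes the predicate first and the iterable second, it reads as "filter the data source by is_short", which may be close enough to the composition style the question asks for.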

You can pass in the filter function that is applied to each item:

def mock_datasource(filter_function):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        # Yield only the items accepted by the filter function
        if filter_function(d):
            yield d

def filter_function(d):
    # Example predicate: keep words shorter than 8 characters
    return len(d) < 8

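What I would do is define filter(data_filter) so that it receives a generator as input and returns a generator whose values are filtered by the data_filter predicate (a regular predicate that is not aware of the generator interface).

The code is:
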
# Note: this definition shadows Python's built-in filter
def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    def generator(coll):
        for x in coll:
            if pred(x):
                yield x
    return generator

def mock_datasource ():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter (d):
    # Plain predicate: True for words shorter than 8 characters
    return len(d) < 8

gen1 = mock_datasource()
filtering = filter(data_filter)
gen2 = filtering(gen1) # or filter(data_filter)(mock_datasource())
print(list(gen2))

If you want to improve this further, you can use compose, which I think was the whole intent:

from functools import reduce

def compose(*fns):
    """Compose functions left to right - allows generators to compose with same
    order as Clojure style transducers in first argument to transduce."""
    return reduce(lambda f,g: lambda *x, **kw: g(f(*x, **kw)), fns)

gen_factory = compose(mock_datasource,
                      filter(data_filter))
gen = gen_factory()
print(list(gen))

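PS: I used some code found here, where Clojure folks expressed composition of generators inspired by the way they usually do composition with transducers.

PS2: filter may be written in a more pythonic way:
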
def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    return lambda coll: (x for x in coll if pred(x))
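
To illustrate how this style extends to more than one stage, here is a small sketch that chains two filters behind the data source. It is only an illustration under stated assumptions: keep is a hypothetical name for the same idea as the filter(pred) above (renamed so the built-in filter is not shadowed), and the two lambda predicates are made up for the example.

```python
from functools import reduce

def compose(*fns):
    """Compose left to right: compose(f, g)(x) == g(f(x))."""
    return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), fns)

def keep(pred):
    # Hypothetical name; same idea as filter(pred) above, but it
    # leaves Python's built-in filter untouched
    return lambda coll: (x for x in coll if pred(x))

def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Each stage takes an iterable and returns a lazy generator,
# so stages can be stacked in reading order
pipeline = compose(
    mock_datasource,
    keep(lambda w: len(w) < 8),   # illustrative predicate: short words
    keep(lambda w: "m" in w),     # illustrative predicate: contains "m"
)

result = list(pipeline())
print(result)
```

Every stage is a generator expression, so no word is pulled from the source until the final list call consumes the pipeline.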