在GCP中我们可以看到管道执行图。通过 DirectRunner 在本地运行时是否可以实现同样的效果?
您可以使用
pipeline_graph
和 InteractiveRunner
在本地获取管道的 graphviz 表示。
Beam 文档中使用的字数统计管道示例:
import apache_beam as beam
from apache_beam.runners.interactive.display import pipeline_graph
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
import re
pipeline = beam.Pipeline(InteractiveRunner())
lines = pipeline | beam.Create([f"some_file_{i}.txt" for i in range(10)])
# Count the occurrences of each word.
counts = (
lines
| 'Split' >> (
beam.FlatMap(
lambda x: re.findall(r'[A-Za-z\']+', x)).with_output_types(str))
| 'PairWithOne' >> beam.Map(lambda x: (x, 1))
| 'GroupAndSum' >> beam.CombinePerKey(sum))
# Format the counts into a PCollection of strings.
def format_result(word_count):
(word, count) = word_count
return f'{word}: {count}'
output = counts | 'Format' >> beam.Map(format_result)
# Write the output using a "Write" transform that has side effects.
# pylint: disable=expression-not-assigned
output | beam.io.WriteToText("some_file.txt")
print(pipeline_graph.PipelineGraph(pipeline).get_dot())
此打印
digraph G {
node [color=blue, fontcolor=blue, shape=box];
"Create";
lines [shape=circle];
"Split";
pcoll4978 [label="", shape=circle];
"PairWithOne";
pcoll8859 [label="", shape=circle];
"GroupAndSum";
counts [shape=circle];
"Format";
output [shape=circle];
"WriteToText";
pcoll6409 [label="", shape=circle];
"Create" -> lines;
lines -> "Split";
"Split" -> pcoll4978;
pcoll4978 -> "PairWithOne";
"PairWithOne" -> pcoll8859;
pcoll8859 -> "GroupAndSum";
"GroupAndSum" -> counts;
counts -> "Format";
"Format" -> output;
output -> "WriteToText";
"WriteToText" -> pcoll6409;
}
将其放入 https://edotor.net 结果:
如果需要,您可以在 Python 中使用 GraphViz 来生成更漂亮的输出(例如 graphviz)。
您还可以使用Python的RenderRunner,例如
python -m apache_beam.examples.wordcount --output out.txt \
--runner=apache_beam.runners.render.RenderRunner \
--render_output=pipeline.svg
这还有一个交互模式,通过传递
--port=N
(其中 0 可用于选择未使用的端口)来触发,该模式将图表作为本地 Web 服务出售。这允许人们展开/折叠复合材料以方便探索。当您编辑图表时,传递的任何 --render_output
参数都将重新渲染。 (它在底层使用 graphviz,因此可以渲染任何受支持的格式。)
为了渲染非 Python 管道,可以将其作为本地便携式“运行器”启动。
python -m apache_beam.runners.render
然后通过便携式运行程序通过提供的作业 API 端点从其他 SDK“提交”此作业以查看它。