Apache Beam: Map, DoFn, and Composite Transform


I would like to understand how the use cases differ for a Map function, a DoFn invoked through ParDo, and a composite transform.

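I get the same result from each of the approaches in the code below when applying the list of transformations my pipeline needs. I put together a sample that shows what I mean by multiple stages.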

import apache_beam as beam

def myTransform(line):
    line = line * 10
    line = line + 5
    line = line - 2
    return line

class myPTransform(beam.PTransform):
    def expand(self, pcoll):
        # return pcoll | beam.Map(myTransform)
        pcol_output = (pcoll
                       | beam.Map(lambda line: line * 10)
                       | beam.Map(lambda line: line + 5)
                       | beam.Map(lambda line: line - 2))
        return pcol_output

class mydofunc(beam.DoFn):
    def process(self, element):
        element = element * 10
        element = element + 5
        element = element - 2
        yield element
   
with beam.Pipeline() as p:
    lines = p | beam.Create([1, 2, 3, 4, 5])

    ### Map Function
    manual = (lines
              | "Map function" >> beam.Map(myTransform)
              | "Print map" >> beam.Map(print))

    ### Composite PTransform
    ptrans = (lines
              | "ptransform call" >> myPTransform()
              | "Print ptransform" >> beam.Map(print))

    ### Do Function
    dofnpcol = (lines
                | "Dofn call" >> beam.ParDo(mydofunc())
                | "Print dofnpcol" >> beam.Map(print))

In which scenarios should I use a DoFn or a composite transform? I may be missing the bigger picture about how these three options differ. Any insight would be very helpful.

I did find a related question: Apache Beam: DoFn vs PTransform

python python-3.x google-cloud-dataflow apache-beam