NullPointerException when outputting a TableRow with a null value

Question (votes: 0, answers: 3)

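I am trying to build a TableRow object that will eventually be written to a BigQuery table, but if I include a null value in the row, I get a NullPointerException. Here is the full stack trace: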

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NullPointerException
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:349)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:319)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:210)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at dataflowsandbox.StarterPipeline.runTest(StarterPipeline.java:224)
    at dataflowsandbox.StarterPipeline.main(StarterPipeline.java:83)
Caused by: java.lang.NullPointerException
    at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
    at java.util.AbstractMap.hashCode(AbstractMap.java:530)
    at java.util.Arrays.hashCode(Arrays.java:4146)
    at java.util.Objects.hash(Objects.java:128)
    at org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow.hashCode(WindowedValue.java:245)
    at java.util.HashMap.hash(HashMap.java:339)
    at java.util.HashMap.get(HashMap.java:557)
    at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:191)
    at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:130)
    at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.HashMultimap.put(HashMultimap.java:48)
    at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:111)
    at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output(ParDoEvaluator.java:242)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:219)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:69)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:517)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:505)
    at dataflowsandbox.StarterPipeline$6.procesElement(StarterPipeline.java:202)

Process finished with exit code 1

Here is the code that triggers the NullPointerException:

  Pipeline p = Pipeline.create( options );

  p.apply( "kicker", Create.of( "Kick!" ) )
  .apply( "Read values", ParDo.of( new DoFn<String, TableRow>() {
     @ProcessElement
     public void procesElement( ProcessContext c ) {

        TableRow row = new TableRow();

        row.set( "ev_id",       "2323423423" );
        row.set( "customer_id", "111111"     );
        row.set( "org_id",      null         ); // Without this line, no NPE
        c.output( row );  


     } }) )
     .apply( BigQueryIO.writeTableRows()
        .to( DATA_TABLE_OUT )
        .withCreateDisposition( CREATE_NEVER )
        .withWriteDisposition( WRITE_APPEND ) );

  PipelineResult result = p.run();

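My actual code is a bit more complicated. I should be able to catch the null value and simply not set it in the row, but maybe there is something about TableRows that I don't understand.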

java google-cloud-dataflow apache-beam
3 Answers
1 vote

For example, you can supply a table schema and simply not set a value for the field.

Table schema in which org_id is NULLABLE:

List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("ev_id").setType("STRING"));
fields.add(new TableFieldSchema().setName("customer_id").setType("STRING"));
fields.add(new TableFieldSchema().setName("org_id").setType("STRING").setMode("NULLABLE"));
TableSchema schema = new TableSchema().setFields(fields);

Then just don't set any value for that field (comment out the line):

row.set( "ev_id",       "2323423423" );
row.set( "customer_id", "111111"     );
// row.set( "org_id",     null         ); // Without this line, no NPE
c.output( row );  

Pass the table schema in the write step:

.apply( BigQueryIO.writeTableRows()
   .to( DATA_TABLE_OUT )
   .withSchema(schema)
   .withCreateDisposition( CREATE_NEVER )
   .withWriteDisposition( WRITE_APPEND ) );

With the schema in place, the omitted org_id field is written to BigQuery as NULL.
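The "just omit the field" advice in this answer can be wrapped in a small helper so that real pipeline code does not need an if-check around every nullable column. The sketch below is only an illustration: the class name NullSafeRows and the method setIfPresent are hypothetical, and it uses a plain Map<String, Object> so that it runs without any Beam or BigQuery dependency. It assumes that TableRow can be passed wherever a Map<String, Object> is expected, which should be checked against the client library version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Beam or the BigQuery client library).
final class NullSafeRows {
    private NullSafeRows() {}

    /**
     * Stores value under field only when value is non-null, so the field is
     * simply absent from the row instead of being present with a null value.
     * Returns the same row instance to allow chaining.
     */
    static <M extends Map<String, Object>> M setIfPresent(M row, String field, Object value) {
        if (value != null) {
            row.put(field, value);
        }
        return row;
    }

    public static void main(String[] args) {
        // A LinkedHashMap stands in for the TableRow from the question.
        Map<String, Object> row = new LinkedHashMap<>();
        setIfPresent(row, "ev_id", "2323423423");
        setIfPresent(row, "customer_id", "111111");
        setIfPresent(row, "org_id", null); // skipped: no entry is created
        System.out.println(row.keySet()); // prints [ev_id, customer_id]
    }
}
```

Inside the DoFn from the question, the same call would replace row.set( "org_id", null ), provided the target column is NULLABLE as described in this answer.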


2 votes

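If you are using the DirectRunner, pass the option --enforceImmutability=false. It worked for me. The Dataflow runner already handles this case, but with the DirectRunner you hit an NPE whenever null is passed to tableRow.set(). Turning off the DirectRunner's immutability enforcement check by setting the --enforceImmutability=false pipeline option makes the error go away.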

Reference: https://issues.apache.org/jira/browse/BEAM-1714
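The stack trace in the question suggests why the immutability check matters: the DirectRunner's ImmutabilityEnforcingBundle puts each output element into a hash-based collection, which calls hashCode() on the row, and the entry's hashCode() then fails on the null value. The sketch below reproduces that pattern with standard-library classes only. StrictEntryMap and StrictEntry are hypothetical stand-ins written for this illustration, not the real ArrayMap code, and the assumption that the real entry dereferences its value without a null check is inferred from the trace, not from the library source.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a map whose entries hash their value without a null check.
final class StrictEntryMap extends AbstractMap<String, Object> {
    private final List<Map.Entry<String, Object>> entries = new ArrayList<>();

    // Simplified put: appends an entry and does not replace existing keys.
    @Override
    public Object put(String key, Object value) {
        entries.add(new StrictEntry(key, value));
        return null;
    }

    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        return new AbstractSet<Map.Entry<String, Object>>() {
            @Override
            public Iterator<Map.Entry<String, Object>> iterator() {
                return entries.iterator();
            }

            @Override
            public int size() {
                return entries.size();
            }
        };
    }

    static final class StrictEntry extends AbstractMap.SimpleEntry<String, Object> {
        StrictEntry(String key, Object value) {
            super(key, value);
        }

        // Assumed behaviour: getValue() is dereferenced even when it is null.
        @Override
        public int hashCode() {
            return getKey().hashCode() ^ getValue().hashCode();
        }
    }

    public static void main(String[] args) {
        StrictEntryMap row = new StrictEntryMap();
        row.put("ev_id", "2323423423");
        row.hashCode();          // fine: every value is non-null
        row.put("org_id", null); // storing the null succeeds
        try {
            row.hashCode();      // AbstractMap.hashCode() visits every entry
        } catch (NullPointerException e) {
            System.out.println("NPE raised while hashing the row");
        }
    }
}
```

Storing the null is harmless on its own; the failure only appears once something hashes the row, which is consistent with the error disappearing when the DirectRunner's immutability check is switched off.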


0 votes

Put in a temporary placeholder value instead of null or an empty string. As far as I know, the table does not accept null values.
