Google Cloud Dataflow BigQuery to Bigtable transfer - throttle write speed?

Question

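I have a number of Dataflow templates that copy data from BigQuery into Bigtable tables.

The largest of them is about 9 million rows, or 22 GB of data.

There are no complex mutations; it is just a copy.

I've noticed that while the Dataflow templates run, the Bigtable instance's CPU spikes to 100% and read/write latency becomes very high. This happens even with only 1 worker and no custom thread settings.

I've tried adjusting the number of workers and limiting numberOfWorkerHarnessThreads, but I haven't found a combination that still loads the data in a reasonable time without spiking the Bigtable instance.

Pipeline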
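The worker and thread settings cap concurrency rather than rows per second. As a rough illustration of what an explicit cap could look like, the sketch below spaces out permits at a fixed rate and splits a cluster-wide budget across writer threads. `WritePacer`, `reserve`, `acquire`, and `perThreadRate` are hypothetical names, not part of Beam or the Bigtable client, and the sketch assumes something (for example the Transform step) would call `acquire()` once per row before the row reaches the write step.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Beam or Bigtable API): grants at most ratePerSecond permits per second. */
final class WritePacer {
    private final long intervalNanos;   // spacing between two consecutive permits
    private long nextFreeNanos;         // earliest time the next permit may be used

    WritePacer(double ratePerSecond, long startNanos) {
        if (ratePerSecond <= 0) {
            throw new IllegalArgumentException("ratePerSecond must be positive");
        }
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / ratePerSecond);
        this.nextFreeNanos = startNanos;
    }

    /** Reserves the next slot and returns how many nanoseconds the caller should wait for it. */
    synchronized long reserve(long nowNanos) {
        long wait = Math.max(0L, nextFreeNanos - nowNanos);
        nextFreeNanos = Math.max(nextFreeNanos, nowNanos) + intervalNanos;
        return wait;
    }

    /** Blocks the calling thread until its reserved slot is due. */
    void acquire() throws InterruptedException {
        long wait = reserve(System.nanoTime());
        if (wait > 0) {
            TimeUnit.NANOSECONDS.sleep(wait);
        }
    }

    /** Splits a cluster-wide rows-per-second budget evenly across all writer threads. */
    static double perThreadRate(double clusterRowsPerSecond, int workers, int threadsPerWorker) {
        return clusterRowsPerSecond / ((double) workers * threadsPerWorker);
    }
}
```

Each worker JVM would hold its own pacer, which is why the budget has to be divided by the number of workers and threads; with autoscaling workers the division would no longer hold.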

        // Parse the command-line arguments into the template's custom options.
        BigQueryBigtableTransferOptions options =
            PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(BigQueryBigtableTransferOptions.class);

        // Describe the Bigtable table that the pipeline writes to.
        CloudBigtableTableConfiguration config =
            new CloudBigtableTableConfiguration.Builder()
                .withProjectId(options.getBigtableProjectId())
                .withInstanceId(options.getBigtableInstanceId())
                .withTableId(options.getBigtableTableId())
                .build();

        Pipeline p = Pipeline.create(options);

        // Read rows from BigQuery, convert each one to a Bigtable mutation, and write it.
        p.apply(BigQueryIO.readTableRows().withoutValidation().fromQuery(options.getBqQuery())
                .usingStandardSql())
            .apply(ParDo.of(new Transform(options.getBigtableRowKey())))
            .apply(CloudBigtableIO.writeToTable(config));

        p.run();

The BigQuery query is just a select * query, and the Transform step just adds a column to the Bigtable row for each column in BQ, with no additional logic.
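To make the shape of that mapping concrete, here is a library-free sketch of a one-column-per-field conversion. It is not the actual Transform class: the real step works on Beam and Bigtable types, whereas this version uses plain maps, and `RowToCells`, `toCells`, and `rowKey` are hypothetical names. Skipping null values and encoding everything as UTF-8 strings are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, library-free stand-in for a one-column-per-field row conversion. */
final class RowToCells {
    /** Returns the bytes of the field used as the row key. */
    static byte[] rowKey(Map<String, Object> bqRow, String rowKeyColumn) {
        Object key = bqRow.get(rowKeyColumn);
        if (key == null) {
            throw new IllegalArgumentException("row has no value for key column " + rowKeyColumn);
        }
        return String.valueOf(key).getBytes(StandardCharsets.UTF_8);
    }

    /** Maps every non-null field to one cell: column qualifier -> UTF-8 value bytes. */
    static Map<String, byte[]> toCells(Map<String, Object> bqRow) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : bqRow.entrySet()) {
            if (field.getValue() == null) {
                continue; // assumption: null fields produce no cell
            }
            cells.put(field.getKey(),
                String.valueOf(field.getValue()).getBytes(StandardCharsets.UTF_8));
        }
        return cells;
    }
}
```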

Answer 1

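How is node scaling configured on the Bigtable cluster? What is the current minimum node count?

Scale the cluster for the workload by adding nodes. When a batch workload arrives suddenly, it can take up to 20 minutes under load before cluster performance improves significantly.

If the cluster is set to autoscale, set a minimum node count so that the cluster doesn't scale down too far.

If it is scaled manually, add more nodes at least 20 minutes before the workload increases.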
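For a rough idea of how many nodes to provision ahead of the load, the arithmetic can be sketched as below. The per-node write rate is a placeholder assumption that has to be measured or looked up for the actual cluster and row size (the question's figures work out to roughly 2.4 KB per row), and `LoadEstimate` is a hypothetical helper name.

```java
/** Hypothetical back-of-the-envelope sizing helper; the per-node rate is an assumed input. */
final class LoadEstimate {
    /** Seconds needed to write totalRows at a sustained rate of nodes * rowsPerSecondPerNode. */
    static double loadSeconds(long totalRows, int nodes, double rowsPerSecondPerNode) {
        return totalRows / (nodes * rowsPerSecondPerNode);
    }

    /** Smallest node count that can finish the load within targetSeconds. */
    static int nodesNeeded(long totalRows, double targetSeconds, double rowsPerSecondPerNode) {
        return (int) Math.ceil(totalRows / (targetSeconds * rowsPerSecondPerNode));
    }
}
```

With an assumed 5,000 rows per second per node, `LoadEstimate.nodesNeeded(9_000_000L, 600, 5000)` gives 3 nodes for a ten-minute load of the 9 million rows. This ignores any read traffic the instance serves at the same time, so real headroom would need to be larger.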
