Flink - keyBy() 函数是否在内部对 flink 运算符进行分区?

问题描述 投票:0回答:1

以下代码取自文档

package spendreport;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.walkthrough.common.sink.AlertSink;
import org.apache.flink.walkthrough.common.entity.Alert;
import org.apache.flink.walkthrough.common.entity.Transaction;
import org.apache.flink.walkthrough.common.source.TransactionSource;

public class FraudDetectionJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Transaction> transactions = env
            .addSource(new TransactionSource())
            .name("transactions");
        
        DataStream<Alert> alerts = transactions
            .keyBy(Transaction::getAccountId)
            .process(new FraudDetector())
            .name("fraud-detector");

        alerts
            .addSink(new AlertSink())
            .name("send-alerts");

        env.execute("Fraud Detection");
    }
}

.keyBy(Transaction::getAccountId)
函数应用于算子(
alerts
),是否在flink算子(
alerts
)处对传入数据进行分区(基于账户ID),如下图所示?

2) 如果是,如何编写映射器来处理从每个分区流式传输的数据?使用

.keyBy(Transaction::getAccountId)


java stream apache-flink flink-streaming
1个回答
0
投票

keyBy
应用于数据流
transactions

应用

keyBy
后,
transactions
中具有相同
account ID
的记录将位于同一分区中,并且您可以应用
KeyedStream
中的功能,例如
process
(不推荐,因为它被标记为已弃用),
 window
reduce
min/max/sum

有很多使用 keyBy 的例子,例如这个链接

© www.soinside.com 2019 - 2024. All rights reserved.