如何等待有限的流批量结果

问题描述 投票:0回答:1

我有一个使用Spring Cloud Streams和Kafka Streams构建的流处理应用程序,该系统从应用程序中获取日志,并将其与另一个流处理器的观察结果进行比较并产生一个分数,然后将日志流除以该分数(高于和低于某个阈值)。

拓扑:

enter image description here

问题:

所以我的问题是如何正确实施“对数最佳观察选择器处理器”,在处理日志时,观测值有限,但是可能有很多观测值。

所以我想出了两种解决方案...

  1. Group&Window通过日志ID获得日志评分的主题,然后减少以获取最高分。 (问题:对所有观察结果评分可能要花费比窗口更长的时间)

  2. 在每次计分后发送一个计分完成的消息,与日志相关观察合并,使用日志计分观察全局表和交互式查询来检查所有观察ID是否都在全局表存储中(当所有ID都在其中时)商店会映射到得分最高的观察值。 (问题:仅用于交互式查询时,全局表似乎不起作用)

实现我正在尝试的最佳方法是什么?

  • 我希望不创建任何分区,磁盘或内存瓶颈。

  • 从对数和观察值中加入该值时,每个事物都有唯一的ID和相关ID的元组。

((编辑:带有图和更改标题的拓扑的切换文本描述)

apache-kafka apache-kafka-streams spring-cloud-stream
1个回答
0
投票

解决方案2似乎可以工作,但是它发出警告,因为交互式查询需要一些时间才能准备好-所以我用Transformer实现了相同的解决方案:

@Slf4j
@Configuration
@RequiredArgsConstructor
@SuppressWarnings("unchecked")
public class LogBestObservationsSelectorProcessorConfig {
    private String logScoredObservationsStore = "log-scored-observations-store";

    private final Serde<LogEntryRelevantObservationIdTuple> logEntryRelevantObservationIdTupleSerde;
    private final Serde<LogRelevantObservationIdsTuple> logRelevantObservationIdsTupleSerde;
    private final Serde<LogEntryObservationMatchTuple> logEntryObservationMatchTupleSerde;
    private final Serde<LogEntryObservationMatchIdsRelevantObservationsTuple> logEntryObservationMatchIdsRelevantObservationsTupleSerde;

    @Bean
    public Function<
            GlobalKTable<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple>,
                Function<
                    KStream<LogEntryRelevantObservationIdTuple, LogEntryRelevantObservationIdTuple>,
                    Function<
                            KTable<String, LogRelevantObservationIds>,
                            KStream<String, LogEntryObservationMatchTuple>
                    >
                >
            >
    logBestObservationSelectorProcessor() {
        return (GlobalKTable<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple> logScoredObservationsTable) ->
                (KStream<LogEntryRelevantObservationIdTuple, LogEntryRelevantObservationIdTuple> logScoredObservationProcessedStream) ->
                        (KTable<String, LogRelevantObservationIdsTuple> logRelevantObservationIdsTable) -> {
            return logScoredObservationProcessedStream
                    .selectKey((k, v) -> k.getLogId())
                    .leftJoin(
                            logRelevantObservationIdsTable,
                            LogEntryObservationMatchIdsRelevantObservationsTuple::new,
                            Joined.with(
                                    Serdes.String(),
                                    logEntryRelevantObservationIdTupleSerde,
                                    logRelevantObservationIdsTupleSerde
                            )
                    )
                    .transform(() -> new LogEntryObservationMatchTransformer(logScoredObservationsStore))
                    .groupByKey(
                            Grouped.with(
                                Serdes.String(),
                                logEntryObservationMatchTupleSerde
                            )
                    )
                    .reduce(
                            (match1, match2) -> Double.compare(match1.getScore(), match2.getScore()) != -1 ? match1 : match2,
                            Materialized.with(
                                    Serdes.String(),
                                    logEntryObservationMatchTupleSerde
                            )
                    )
                    .toStream()
                    ;
        };
    }

    @RequiredArgsConstructor
    private static class LogEntryObservationMatchTransformer implements Transformer<String, LogEntryObservationMatchIdsRelevantObservationsTuple, KeyValue<String, LogEntryObservationMatchTuple>> {
        private final String stateStoreName;
        private ProcessorContext context;
        private TimestampedKeyValueStore<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple> kvStore;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
            this.kvStore = (TimestampedKeyValueStore<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple>) context.getStateStore(stateStoreName);
        }

        @Override
        public KeyValue<String, LogEntryObservationMatchTuple> transform(String logId, LogEntryObservationMatchIdsRelevantObservationsTuple value) {
            val observationIds = value.getLogEntryRelevantObservationsTuple().getRelevantObservations().getObservationIds();
            val allObservationsProcessed = observationIds.stream()
                    .allMatch((observationId) -> {
                        val key = LogEntryRelevantObservationIdTuple.newBuilder()
                                .setLogId(logId)
                                .setRelevantObservationId(observationId)
                                .build();
                        return kvStore.get(key) != null;
                    });
            if (!allObservationsProcessed) {
                return null;
            }

            val observationId = value.getLogEntryRelevantObservationIdTuple().getObservationId();
            val key = LogEntryRelevantObservationIdTuple.newBuilder()
                    .setLogId(logId)
                    .setRelevantObservationId(observationId)
                    .build();
            ValueAndTimestamp<LogEntryObservationMatchTuple> observationMatchValueAndTimestamp = kvStore.get(key);
            return new KeyValue<>(logId, observationMatchValueAndTimestamp.value());
        }

        @Override
        public void close() {

        }
    }
}
© www.soinside.com 2019 - 2024. All rights reserved.