Esper EPL: match 5 pairs of events with distinct field values within 1 minute

Question

I have an event stream defined as:

create schema Event(id string, username string, <additionalFields>)

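(<additionalFields> are used to construct a context but do not otherwise participate in the pattern-matching EPL; all EPL statements will execute within that context.)

The desired matching behavior is to match:

Five pairs of events, each pair sharing a username and the five usernames being distinct, all occurring within one minute or less.
If more than two events occur for a given username within the one-minute period, the extra events are ignored for matching purposes.
If duplicate events occur (events with the same id field value), they should be ignored for matching purposes.
Ideally, a match would consume its events so that the same events cannot participate in a later match. However, if it makes the EPL easier to understand, we can post-process the output to remove such overlaps.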
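To make the rules above concrete, here is a minimal reference model in plain Python (not EPL). It is only a sketch of one reading of the requirements: the function name match_pairs, the (timestamp, id, username) tuple layout, and the choice to keep the first two events per username are assumptions made for illustration, not part of the original question.

```python
from collections import Counter

WINDOW_SECONDS = 60   # "one minute or less"
PAIRS_NEEDED = 5      # five distinct usernames, two events each


def match_pairs(events):
    """Reference model of the matching rules (hypothetical helper).

    events: time-ordered iterable of (timestamp_seconds, event_id, username).
    Returns a list of matches; each match lists its event ids in arrival order.
    """
    seen_ids = set()   # every id ever seen, used to drop duplicates
    window = []        # kept events that are still inside the time window
    matches = []
    for ts, event_id, username in events:
        # Expire kept events that are more than one minute old.
        window = [e for e in window if ts - e[0] <= WINDOW_SECONDS]
        # Rule: duplicate ids are ignored.
        if event_id in seen_ids:
            continue
        seen_ids.add(event_id)
        # Rule: a third (or later) event for a username is ignored.
        if sum(1 for e in window if e[2] == username) >= 2:
            continue
        window.append((ts, event_id, username))
        # A match needs five usernames that each have exactly two kept events.
        counts = Counter(e[2] for e in window)
        paired = [u for u, n in counts.items() if n == 2]
        if len(paired) >= PAIRS_NEEDED:
            chosen = set(paired[:PAIRS_NEEDED])
            matches.append([e[1] for e in window if e[2] in chosen])
            # Consume the matched events so they cannot match again.
            window = [e for e in window if e[2] not in chosen]
    return matches


# The example input from the question: e1..e12, five seconds apart.
usernames = ["user1", "user2", "user4", "user3", "user1", "user1",
             "user1", "user5", "user5", "user4", "user2", "user3"]
events = [(5 * i, f"e{i + 1}", u) for i, u in enumerate(usernames)]
print(match_pairs(events))
```

Under these assumptions the match fires as soon as e12 arrives, without any further trigger event, and e6 and e7 are skipped as surplus user1 events.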

Example input events:

Event={id='e1', username='user1'}
t=t.plus(5 seconds)
Event={id='e2', username='user2'}
t=t.plus(5 seconds)
Event={id='e3', username='user4'}
t=t.plus(5 seconds)
Event={id='e4', username='user3'}
t=t.plus(5 seconds)
Event={id='e5', username='user1'}
t=t.plus(5 seconds)
Event={id='e6', username='user1'}
t=t.plus(5 seconds)
Event={id='e7', username='user1'}
t=t.plus(5 seconds)
Event={id='e8', username='user5'}
t=t.plus(5 seconds)
Event={id='e9', username='user5'}
t=t.plus(5 seconds)
Event={id='e10', username='user4'}
t=t.plus(5 seconds)
Event={id='e11', username='user2'}
t=t.plus(5 seconds)
Event={id='e12', username='user3'}

Desired output events:

Event={id='e1', username='user1'}
Event={id='e2', username='user2'}
Event={id='e3', username='user4'}
Event={id='e4', username='user3'}
Event={id='e5', username='user1'}
Event={id='e8', username='user5'}
Event={id='e9', username='user5'}
Event={id='e10', username='user4'}
Event={id='e11', username='user2'}
Event={id='e12', username='user3'}

The following output events would also be acceptable:

Event={id='e1', username='user1'}
Event={id='e5', username='user1'}
Event={id='e8', username='user5'}
Event={id='e9', username='user5'}
Event={id='e3', username='user4'}
Event={id='e10', username='user4'}
Event={id='e2', username='user2'}
Event={id='e11', username='user2'}
Event={id='e4', username='user3'}
Event={id='e12', username='user3'}

I tried using named windows:

create window AtMostTwoEventsPerUsername#time(1 minute) as Event;

on Event as e
merge AtMostTwoEventsPerUsername as w
where w.id = e.id
   or (select count(*) from AtMostTwoEventsPerUsername where username = e.username) > 1
when not matched then insert select *;

on Event
insert into FivePairsOfTwoEventsPerUsername
select w.* from AtMostTwoEventsPerUsername as w
where w.username in (select username from AtMostTwoEventsPerUsername group by username having count(*) = 2)
having count(*) = 10;

on FivePairsOfTwoEventsPerUsername as m
delete from AtMostTwoEventsPerUsername as w
where w.id = m.id;

@Name("Out")
select * from FivePairsOfTwoEventsPerUsername#time(1 minute)#length_batch(10);

This seems close, but it requires one additional event after the matching events, which is not desired:

Event={id='e1', username='user1'}
t=t.plus(5 seconds)
Event={id='e2', username='user2'}
t=t.plus(5 seconds)
Event={id='e3', username='user4'}
t=t.plus(5 seconds)
Event={id='e4', username='user3'}
t=t.plus(5 seconds)
Event={id='e5', username='user1'}
t=t.plus(5 seconds)
Event={id='e6', username='user1'}
t=t.plus(5 seconds)
Event={id='e7', username='user1'}
t=t.plus(5 seconds)
Event={id='e8', username='user5'}
t=t.plus(5 seconds)
Event={id='e9', username='user5'}
t=t.plus(5 seconds)
Event={id='e10', username='user4'}
t=t.plus(5 seconds)
Event={id='e11', username='user2'}
t=t.plus(5 seconds)
Event={id='e12', username='user3'}
t=t.plus(1 seconds)
Event={id='e13', username='user999'} // this shouldn't be needed to trigger a match

This results in the desired output events:

FivePairsOfTwoEventsPerUsername={id='e1', username='user1'}
FivePairsOfTwoEventsPerUsername={id='e2', username='user2'}
FivePairsOfTwoEventsPerUsername={id='e3', username='user4'}
FivePairsOfTwoEventsPerUsername={id='e4', username='user3'}
FivePairsOfTwoEventsPerUsername={id='e5', username='user1'}
FivePairsOfTwoEventsPerUsername={id='e8', username='user5'}
FivePairsOfTwoEventsPerUsername={id='e9', username='user5'}
FivePairsOfTwoEventsPerUsername={id='e10', username='user4'}
FivePairsOfTwoEventsPerUsername={id='e11', username='user2'}
FivePairsOfTwoEventsPerUsername={id='e12', username='user3'}

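If the last event (Event={id='e13', username='user999'}) is removed from the input event stream, the "Out" stream unexpectedly has no matching events.

I would like to understand why the extra event at the end is needed to trigger the match, and whether there is a simpler set of EPL statements that achieves the desired pattern matching.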

Answer 1

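Ignoring all events that have the same id field value would mean remembering every id field value that has ever occurred... right?

I would approach this with an intermediate stream that carries a flag indicating whether a username is "entering" or "remaining in" the current set of distinct usernames. I would use that intermediate stream to add usernames (and their additional information) to a named window and to remove them from it. The final output is selected with something like "insert into ResultStream select window(*) from NamedWindow having count(*) > x". Then use "ResultStream" as a trigger to delete the named window contents, so that those usernames are gone and overlaps are avoided (the named window is populated by a merge from the intermediate stream).

This way your solution becomes a two-step design. The first step produces the intermediate stream. The second step holds the intermediate stream's events and outputs them once 5 are found.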
