Kusto (KQL): how to pivot event rows grouped into a single row (per user and day), including event start and end times?


I hope someone can help me with a tricky Kusto query.

I have the following dataset of events raised by users, where each event is identified by an event code:

datatable(CreatedDate:datetime, User:string, EventCode:string)
[
    datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
    datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
    datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
    datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
    datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
    datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
    datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
    datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
    datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
    datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
    datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
    datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
    datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
    datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
    datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
    datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
]
| order by User asc, CreatedDate asc

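I need to pivot this data so that each user's day is represented as a single row that shows all completed events together with their start and end times.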

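Event codes:

  • TD - event start code
  • TC, SP - event end codes
  • TS and all other codes - can be ignored

Intermediate events can occur between TD and (TC, SP) without affecting anything, and sequences such as TD, TD, TC are possible, in which case only the last sequence should be considered.

Events in the output table should be represented by their event end code (TC or SP).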

So the expected output for the data above would look like this:

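| CreatedDay | User | Total elapsed time | Event 1 | Event 1 - From time | Event 1 - To time | Event 1 - Elapsed | Event 2 | Event 2 - From time | Event 2 - To time | Event 2 - Elapsed |
|---|---|---|---|---|---|---|---|---|---|---|
| 2024-04-23 | user1 | 00:02:31 | SP | 4:09:30.551 AM | 4:09:59.067 AM | 00:00:28 | TC | 7:10:02.052 AM | 7:12:05.357 AM | 00:02:03 |
| 2024-04-25 | user1 | 00:01:54 | TC | 5:11:02.649 AM | 5:12:56.672 AM | 00:01:54 | | | | |
| 2024-04-25 | user2 | 00:02:13 | TC | 4:36:33.656 AM | 4:38:46.922 AM | 00:02:13 | | | | |
| 2024-04-23 | user3 | 00:01:14 | TC | 4:13:09.379 AM | 4:14:23.724 AM | 00:01:14 | | | | |
| 2024-04-25 | user3 | 00:06:48 | TC | 4:34:18.966 AM | 4:41:07.381 AM | 00:06:48 | | | | |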

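A user can raise more events per day (possibly up to 10-15), which stretches the row horizontally. Also, please don't mind the date format; it can be anything, I just used the first one available.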

azure kql azure-data-explorer kusto-explorer
1 Answer

The following KQL comes close to what you want to achieve; other solutions may solve your problem better. Several steps are needed:

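  1. Identify the events. KQL offers several options for this; row_window_session and the scan operator are commonly used.
  2. Put the event information into a property bag with the bag_pack function.
  3. Pivot the events with the pivot plugin.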
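For the scan operator mentioned in step 1, here is a minimal sketch of how the TD to TC/SP pairing from the question could be expressed. It is an illustration, not a tested solution: the sample rows are shortened and partly made up (the second TD and the TS for user1 are hypothetical), and the column names MatchId, FromTime, ToTime and Elapsed are my own. Every TD restarts the sequence, so in TD, TD, TC only the last TD is kept, and codes other than TD, TC and SP match no step and are skipped.

```kql
datatable(CreatedDate:datetime, User:string, EventCode:string)
[
    datetime(2024-04-23 04:09:30.551), "user1", "TD",
    datetime(2024-04-23 04:09:40.000), "user1", "TD",   // hypothetical repeated start
    datetime(2024-04-23 04:09:50.000), "user1", "TS",   // hypothetical intermediate event
    datetime(2024-04-23 04:09:59.067), "user1", "SP",
    datetime(2024-04-23 04:13:09.379), "user3", "TD",
    datetime(2024-04-23 04:14:23.724), "user3", "TC",
]
// scan needs a serialized (ordered) input
| sort by User asc, CreatedDate asc
| scan with_match_id=MatchId declare (FromTime:datetime) with
(
    // each TD starts a new sequence and replaces a pending, unclosed TD
    step s1 output=none: EventCode == "TD";
    // an end code of the same user closes the sequence
    step s2: EventCode in ("TC", "SP") and User == s1.User => FromTime = s1.CreatedDate;
)
// keep only the first end code per started sequence
| summarize arg_min(CreatedDate, EventCode, FromTime) by User, MatchId
| project User, EventCode, FromTime, ToTime = CreatedDate, Elapsed = CreatedDate - FromTime
```

The trailing summarize is there because a second end code without a new TD would otherwise be reported against the same start; whether that is the desired behavior depends on your data.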

Put together, a query based on row_window_session looks like this:

datatable(CreatedDate:datetime, User:string, EventCode:string)
[
    datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
    datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
    datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
    datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
    datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
    datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
    datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
    datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
    datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
    datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
    datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
    datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
    datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
    datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
    datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
    datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
]
// only some codes are of interest
| where  EventCode in ("TD", "TC", "SP")
// TD indicates the start of an event
| extend Event = iff(EventCode =="TD", "start", "end")
| sort by User asc, CreatedDate asc
// 1. calculate the sessions, alternatively take a look at the scan operator
| extend EventStarted = row_window_session(CreatedDate, 1h, 5m, User != prev(User))
| extend  day=bin(EventStarted, 1d)
| summarize EventEnded=max(CreatedDate) by day, User,  EventStarted
| sort by day asc, User asc, EventEnded asc
// count events per user and day: restart the counter whenever the user or the day changes
| extend event=row_number(1, prev(User) != User or prev(day) != day)
| extend Duration= EventEnded-EventStarted, event=strcat("Event", tostring(event))
// 2. create a dynamic property bag for the events
| extend bag=bag_pack("Event", event, "From", tostring(EventStarted), "To", tostring(EventEnded), "Duration", tostring(Duration)) 
| summarize take_any(bag) by User, day, event
// 3. pivot the events
| evaluate pivot(event, take_any(bag))

This gives you the columns Event1, Event2, ..., each holding the event details in a property bag.


You can use the bag_unpack plugin to turn the contents into columns, but this only works if you know the number of events in advance:

| evaluate bag_unpack(Event1, "Event1") 
| evaluate bag_unpack(Event2, "Event2")
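
If the number of events per day is not known in advance, one possible workaround is to flatten each property bag into key/value rows before pivoting, so that pivot creates one column per event and property. The following is only a sketch under assumptions: the input is a hand-written stand-in for the rows that exist just before the pivot step, the bag keys Code, From and To are illustrative, and the column names ColumnName and Value are hypothetical.

```kql
datatable(User:string, day:datetime, event:string, bag:dynamic)
[
    "user1", datetime(2024-04-23), "Event1", dynamic({"Code":"SP","From":"04:09:30.551","To":"04:09:59.067"}),
    "user1", datetime(2024-04-23), "Event2", dynamic({"Code":"TC","From":"07:10:02.052","To":"07:12:05.357"}),
    "user2", datetime(2024-04-25), "Event1", dynamic({"Code":"TC","From":"04:36:33.656","To":"04:38:46.922"}),
]
// expand every bag property into a [key, value] pair, one row per property
| mv-expand kind=array kv = bag
// build a column name such as Event1_From from the event label and the property key
| extend ColumnName = strcat(event, "_", tostring(kv[0])), Value = tostring(kv[1])
| project User, day, ColumnName, Value
// one output row per User and day, one column per event property
| evaluate pivot(ColumnName, take_any(Value))
```

All values come out as strings here, so convert them with totimespan or todatetime afterwards if you need to calculate with them.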