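I hope someone can help me with a tricky Kusto query.
I have the following dataset of events raised by users, where each event is identified by an event code: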
datatable(CreatedDate:datetime, User:string, EventCode:string)
[
datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
]
| order by User asc, CreatedDate asc
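I need to pivot this data so that each day per user becomes one row, and that row shows every completed event with its start and end time.
Event codes:
Between a TD and a (TC or SP) there can be intermediate events, which do not affect anything. There can also be a sequence such as TD, TD, TC, in which case only the last sequence (the last TD and the TC) should be considered.
In the output table, each event should be represented by its end code (TC or SP).
So the expected output for the data above would look like this: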
Created day | User | Total elapsed | Event 1 | Event 1 - From | Event 1 - To | Event 1 - Elapsed | Event 2 | Event 2 - From | Event 2 - To | Event 2 - Elapsed |
---|---|---|---|---|---|---|---|---|---|---|
2024-04-23 | user1 | 00:02:31 | SP | 4:09:30.551 AM | 4:09:59.067 AM | 00:00:28 | TC | 7:10:02.052 AM | 7:12:05.357 AM | 00:02:03 |
2024-04-25 | user1 | 00:01:54 | TC | 5:11:02.649 AM | 5:12:56.672 AM | 00:01:54 | | | | |
2024-04-25 | user2 | 00:02:13 | TC | 4:36:33.656 AM | 4:38:46.922 AM | 00:02:13 | | | | |
2024-04-23 | user3 | 00:01:14 | TC | 4:13:09.379 AM | 4:14:23.724 AM | 00:01:14 | | | | |
2024-04-25 | user3 | 00:06:48 | TC | 4:34:18.966 AM | 4:41:07.381 AM | 00:06:48 | | | | |
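A user can raise more events per day (possibly up to 10-15), which stretches the table horizontally. Also, please don't mind the date format; it can be anything, I just used the first one available.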
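The following KQL comes close to what you want to achieve; other solutions may solve your problem better. Several steps are required:
1. Calculate the sessions (alternatively, take a look at the scan operator).
2. Create a dynamic property bag for the events.
3. Pivot the events.
It looks as follows: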
datatable(CreatedDate:datetime, User:string, EventCode:string)
[
datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
]
// only some codes are of interest
| where EventCode in ("TD", "TC", "SP")
// TD indicates the start of an event
| extend Event = iff(EventCode =="TD", "start", "end")
| sort by User asc, CreatedDate asc
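// 1. calculate the sessions, alternatively take a look at the scan operator
// note: sessions are purely time-based; a row more than 5m after its neighbor (or more
// than 1h after the session start) opens a new session, so a TD/TC pair further apart
// than 5m, such as user3 on 4-25-2024, is split. Widen the windows if that matters.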
| extend EventStarted = row_window_session(CreatedDate, 1h, 5m, User != prev(User))
| extend day=bin(EventStarted, 1d)
| summarize EventEnded=max(CreatedDate) by day, User, EventStarted
| sort by User asc, day asc, EventEnded asc
// number the events per user and day (restart when either one changes)
| extend event=row_number(1, prev(User) != User or prev(day) != day)
| extend Duration= EventEnded-EventStarted, event=strcat("Event", tostring(event))
// 2. create a dynamic property bag for the events
| extend bag=bag_pack("Event", event, "From", tostring(EventStarted), "To", tostring(EventEnded), "Duration", tostring(Duration))
| summarize take_any(bag) by User, day, event
// 3. pivot the events
| evaluate pivot(event, take_any(bag))
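As an aside on the scan operator mentioned in step 1: the following is a minimal sketch, not a tested solution, of pairing each end code (TC or SP) with the most recent preceding TD of the same user, which is how the "TD, TD, TC" rule from the question could be expressed without time windows. The column name `LastStart`, the ISO-style datetime literals, and the choice to forget a TD once an end code has consumed it are assumptions made for illustration.

```kusto
// Sketch: carry the latest TD timestamp forward per user, then keep only the end events.
let Events = datatable(CreatedDate:datetime, User:string, EventCode:string)
[
    datetime(2024-04-22 12:44:02.750), "user1", "TS",
    datetime(2024-04-23 04:09:30.551), "user1", "TD",
    datetime(2024-04-23 04:09:59.067), "user1", "SP",
    datetime(2024-04-23 07:10:02.052), "user1", "TD",
    datetime(2024-04-23 07:12:05.357), "user1", "TC",
    datetime(2024-04-25 05:11:02.649), "user1", "TD",
    datetime(2024-04-25 05:12:56.672), "user1", "TC",
    datetime(2024-04-23 09:53:12.315), "user2", "TS",
    datetime(2024-04-25 04:36:33.656), "user2", "TD",
    datetime(2024-04-25 04:38:46.922), "user2", "TC",
    datetime(2024-04-22 12:40:35.801), "user3", "TS",
    datetime(2024-04-23 04:13:09.379), "user3", "TD",
    datetime(2024-04-23 04:13:23.724), "user3", "TS",
    datetime(2024-04-23 04:14:23.724), "user3", "TC",
    datetime(2024-04-25 04:34:18.966), "user3", "TD",
    datetime(2024-04-25 04:41:07.381), "user3", "TC"
];
Events
| sort by User asc, CreatedDate asc
// LastStart (hypothetical name) holds the timestamp of the latest open TD.
| scan declare (LastStart: datetime) with
(
    step s1: true =>
        LastStart = iff(EventCode == "TD",
                        // a new TD always replaces any earlier one
                        CreatedDate,
                        // otherwise carry the value forward, but only within the same user
                        // and only if the previous row did not already close the event
                        iff(User == s1.User and s1.EventCode !in ("TC", "SP"),
                            s1.LastStart,
                            datetime(null)));
)
// keep the end events that have a matching start
| where EventCode in ("TC", "SP") and isnotnull(LastStart)
| project CreatedDay = startofday(LastStart), User, Event = EventCode,
          From = LastStart, To = CreatedDate, Elapsed = CreatedDate - LastStart
| sort by User asc, From asc
```

The result is one row per completed event, so it could replace the session calculation and feed the numbering, property-bag, and pivot steps. Intermediate codes such as TS are simply carried over, so they do not break a pair.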
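With the pivot step, you get the columns Event1, Event2, ..., where the content of each event sits in a property bag.
You can use the bag_unpack plugin to turn that content into columns, but this only works if you know the number of events: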
| evaluate bag_unpack(Event1, "Event1")
| evaluate bag_unpack(Event2, "Event2")
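To illustrate what the unpacking does to a single column, here is a minimal sketch on a hypothetical one-row table. The sample values and the `Event1_` prefix (with an underscore for readability) are assumptions, not output of the query above.

```kusto
// Hypothetical row shaped like the pivot output: one property bag per event column.
datatable(User:string, day:datetime, Event1:dynamic)
[
    "user1", datetime(2024-04-23),
    dynamic({"Event": "Event1", "From": "2024-04-23T04:09:30.551Z", "To": "2024-04-23T04:09:59.067Z", "Duration": "00:00:28.516"})
]
// Each property of the bag becomes its own column, named prefix + property name.
| evaluate bag_unpack(Event1, "Event1_")
```

The bag column is replaced by Event1_Event, Event1_From, Event1_To and Event1_Duration. Because each call names one column explicitly, you need one such line per event column, which is why the number of events has to be known up front.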