I have a task-history table for users that looks like this. One user has 9 tasks.
user_id  task    IS_COMPLETED  updated_at
123      Task 1  1             2024-01-01
123      Task 2  1             2024-01-01
123      Task 3  0             2024-01-01
123      Task 4  0             2024-01-01
123      Task 5  0             2024-01-01
123      Task 6  1             2024-01-01
123      Task 7  1             2024-01-01
123      Task 8  1             2024-01-01
123      Task 9  1             2024-01-01
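I want to group the tasks into their possible combinations and produce a table like the one below, so that I can see which combinations of tasks users complete quickly.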
combination        total_user_completed
task1_task2        20
task1_task3        15
task1_task4        15
task1_task5        14
:                  :   (and so on for combinations of 2)
task1_task2_task3  10
task4_task5_task6  11
:                  :   (and so on for combinations of 3)
(More detail on what I mean by combinations: from A, B, C I get the combinations A-B, B-C, A-C and A-B-C. Order does not matter.)
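To make that definition concrete, here is a minimal Python sketch that enumerates every unordered combination of two or more items from A, B, C. Python is used only for illustration, and the helper name `unordered_combinations` is hypothetical, not something from the question.

```python
from itertools import combinations

def unordered_combinations(items, min_size=2, max_size=None):
    """Return every unordered combination with min_size..max_size members."""
    items = sorted(items)  # sort so each combination has one canonical spelling
    if max_size is None:
        max_size = len(items)
    result = []
    for size in range(min_size, max_size + 1):
        result.extend(combinations(items, size))
    return result

combos = unordered_combinations(["A", "B", "C"])
print(["-".join(c) for c in combos])
# prints ['A-B', 'A-C', 'B-C', 'A-B-C']
```

Because `itertools.combinations` never emits the same set in a different order, B-A or C-B-A do not appear, which matches the "order does not matter" requirement.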
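I tried a recursive SQL query, but it did not work well:
"Recursive Join ran out of memory. Please re-run this query on a larger warehouse."
Then I also tried doing it with CASE WHEN, but that is not ideal either, because every combination has to be defined one by one, like this: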
with all_task as (
select
distinct user_id,
task_name as task,
is_completed,
updated_at
from task
where
is_completed = 1
group by all
),
agg as (
select
distinct user_id,
activated_at as period,
max(case when task in ('Task 1') and IS_COMPLETED = 1 then date(updated_at) end) as task_1,
max(case when task in ('Task 2') and IS_COMPLETED = 1 then date(updated_at) end) as task_2,
max(case when task in ('Task 3') and IS_COMPLETED = 1 then date(updated_at) end) as task_3,
max(case when task in ('Task 4') and IS_COMPLETED = 1 then date(updated_at) end) as task_4,
max(case when task in ('Task 5') and IS_COMPLETED = 1 then date(updated_at) end) as task_5,
max(case when task in ('Task 6') and IS_COMPLETED = 1 then date(updated_at) end) as task_6,
max(case when task in ('Task 7') and IS_COMPLETED = 1 then date(updated_at) end) as task_7,
max(case when task in ('Task 8') and IS_COMPLETED = 1 then date(updated_at) end) as task_8,
max(case when task in ('Task 9') and IS_COMPLETED = 1 then date(updated_at) end) as task_9
from all_task
group by all
),
task_group as (
select user_id,
case when task_1 is not null and task_2 is not null then user_id end as task_1_2,
case when task_2 is not null and task_3 is not null then user_id end as task_2_3,
.........
from agg
)
select 'task1_task2' as combination,
count(distinct task_1_2) as total
from task_group
union all
select 'task2_task3' as combination,
count(distinct task_2_3) as total
from task_group
.... (and so on)
Any suggestions on how to solve this? Thanks a lot!
*Note: for now I need at least the combinations of 2 and 3 tasks. Thank you.
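For MS SQL Server 2016 (the same approach should carry over to Snowflake):
I think I have a way to get what you want. It requires specifying the number of permutations manually and does not generate every possible permutation. It also produces a fairly large intermediate result that grows quickly as the number of tasks increases.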
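As a rough illustration of that growth, the following Python sketch compares the number of rows a k-way cross join produces with the number that survive a strict ordering filter such as `c1.col < c2.col`. The figure of 9 tasks comes from the sample data; everything else is plain arithmetic, not a measurement.

```python
from math import comb

n_tasks = 9  # number of distinct tasks in the sample data

for k in (2, 3):
    cross_join_rows = n_tasks ** k  # rows produced by k cross-joined copies
    kept_rows = comb(n_tasks, k)    # rows surviving the strict ordering filter
    print(k, cross_join_rows, kept_rows)
# prints:
# 2 81 36
# 3 729 84
```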
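Here is a working solution for permutations of 3 over the task column. You can extend it to more columns by adding the corresponding lines (see the commented-out code).
It works by taking the DISTINCT tasks and CROSS JOINing them, while making sure that no permutation appears more than once. We then get the number of completed tasks and LEFT JOIN it to these permutations, to get the number of completed tasks for each permutation.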
-- Create a temporary table
CREATE TABLE #TempTable
(
user_id INT,
task VARCHAR(50),
IS_COMPLETED BIT,
updated_at DATE
);
-- Insert data into the temporary table
INSERT INTO #TempTable
(
user_id,
task,
IS_COMPLETED,
updated_at
)
VALUES
(123, 'Task 1', 1, '2024-01-01'),
(123, 'Task 2', 1, '2024-01-01'),
(123, 'Task 3', 0, '2024-01-01'),
(123, 'Task 4', 0, '2024-01-01'),
(123, 'Task 5', 0, '2024-01-01'),
(123, 'Task 6', 1, '2024-01-01'),
(123, 'Task 7', 1, '2024-01-01'),
(123, 'Task 8', 1, '2024-01-01'),
(123, 'Task 9', 1, '2024-01-01');
;with cteAllColumns
as (select DISTINCT
Task as col
from #TempTable
)
-- See commented code for adding more permutations
select c1.col as 'c1.task',
c2.col as 'c2.task',
c3.col as 'c3.task',
--c4.col as 'c4.task',
ISNULL(SUM(result.completed), 0) as 'combination completed'
from cteAllColumns c1
cross join cteAllColumns c2
cross join cteAllColumns c3
--cross join cteAllColumns c4
LEFT JOIN
(
SELECT task,
COUNT(task) as 'completed'
FROM #TempTable
WHERE IS_COMPLETED = 1
GROUP BY task
) result
ON result.task = c1.col
OR result.task = c2.col
OR result.task = c3.col
--OR result.task = c4.col
where c1.col < c2.col
AND c2.col < c3.col
--AND c3.col < c4.col
GROUP BY c1.col,
c2.col,
c3.col --,c4.col
ORDER BY c1.col,
c2.col,
c3.col --,c4.col
DROP TABLE #TempTable;
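The query above sums per-task completion counts for each combination. If the goal is instead the number of users who completed every task in a combination (the `total_user_completed` column in the question), one possible approach is a self-join on `user_id`. The sketch below is an illustration under assumptions, not part of the original answer: it runs through Python's built-in `sqlite3` module, the sample rows are made up, and the SQL would need adapting to SQL Server or Snowflake syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (user_id INT, task TEXT, is_completed INT)")

# Hypothetical sample rows: two users, three tasks.
rows = [
    (123, "Task 1", 1), (123, "Task 2", 1), (123, "Task 3", 0),
    (456, "Task 1", 1), (456, "Task 2", 1), (456, "Task 3", 1),
]
conn.executemany("INSERT INTO task VALUES (?, ?, ?)", rows)

# Keep one row per (user, completed task) so the joins cannot double count.
done_cte = """
WITH done AS (
    SELECT DISTINCT user_id, task FROM task WHERE is_completed = 1
)
"""

# Pairs: join a user's completed tasks to themselves; a.task < b.task
# keeps each unordered pair exactly once.
pairs = conn.execute(done_cte + """
SELECT a.task || '_' || b.task AS combination,
       COUNT(DISTINCT a.user_id) AS total_user_completed
FROM done a
JOIN done b ON b.user_id = a.user_id AND a.task < b.task
GROUP BY a.task, b.task
ORDER BY a.task, b.task
""").fetchall()

# Triples: one more join with the same ordering trick.
triples = conn.execute(done_cte + """
SELECT a.task || '_' || b.task || '_' || c.task AS combination,
       COUNT(DISTINCT a.user_id) AS total_user_completed
FROM done a
JOIN done b ON b.user_id = a.user_id AND a.task < b.task
JOIN done c ON c.user_id = a.user_id AND b.task < c.task
GROUP BY a.task, b.task, c.task
ORDER BY a.task, b.task, c.task
""").fetchall()

print(pairs)
# prints [('Task 1_Task 2', 2), ('Task 1_Task 3', 1), ('Task 2_Task 3', 1)]
print(triples)
# prints [('Task 1_Task 2_Task 3', 1)]
```

Joining only the completed rows means a combination is generated per user only when that user finished every task in it, so no recursion or hand-written CASE WHEN per combination is needed. The two result sets can be stacked with UNION ALL to get a single table.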