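I'd like some ideas on how to avoid scanning the whole table more than once. I need to count, for each department of a company, the number of hires and the number of departures per year. The dataset gives me the hire date and the departure date. My current query uses two CTEs, one for the hiring count and one for the departure count (each grouped by department and year), and then joins them to produce the final result. However, I noticed that this forces the query to scan the whole table twice: once to count hires and once to count departures.

Is there another way to compute both the total hires (the number of hires in a department in a given year) and the total departures in a single scan? I think this would improve performance considerably. The DBMS I'm using is Postgres.

Here is my query: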
```sql
WITH hiring_count AS (
    SELECT EXTRACT(YEAR FROM effective_department_from_date) AS year,
           department,
           COUNT(*) AS total_hiring
    FROM employee_department
    JOIN departments
      ON employee_department.department_id = departments.department_id
    GROUP BY year, department
), departure_count AS (
    SELECT EXTRACT(YEAR FROM effective_department_end_date) AS year,
           department,
           COUNT(*) AS total_departure
    FROM employee_department
    JOIN departments
      ON employee_department.department_id = departments.department_id
    GROUP BY year, department
)
SELECT hiring_count.year,
       hiring_count.department,
       total_hiring,
       total_departure
FROM hiring_count
JOIN departure_count
  ON hiring_count.year = departure_count.year
 AND hiring_count.department = departure_count.department
```
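For comparison, here is a minimal sketch of a different single-pass technique (not part of the original post): unpivot each `employee_department` row into a hire event and a departure event with `CROSS JOIN LATERAL (VALUES ...)`, then count each kind with `COUNT(*) FILTER (...)` in one `GROUP BY`. It assumes the table and column names from the query above, and that an assignment that has not ended has a NULL `effective_department_end_date`. The labels `'hire'` and `'departure'` and the aliases `ev`, `event_date` and `kind` are illustrative names introduced here.

```sql
-- Sketch: turn every assignment row into two dated events and aggregate once,
-- so employee_department is read a single time.
SELECT EXTRACT(YEAR FROM ev.event_date) AS year,
       d.department,
       COUNT(*) FILTER (WHERE ev.kind = 'hire')      AS total_hiring,
       COUNT(*) FILTER (WHERE ev.kind = 'departure') AS total_departure
FROM employee_department ed
JOIN departments d USING (department_id)
CROSS JOIN LATERAL (
    VALUES (ed.effective_department_from_date, 'hire'),
           (ed.effective_department_end_date,  'departure')
) AS ev(event_date, kind)
WHERE ev.event_date IS NOT NULL   -- skip events with no date (e.g. an assignment that has not ended)
GROUP BY year, d.department
ORDER BY year, d.department;
```

Unlike the inner join in the query above, this keeps years in which a department only hired or only lost people and reports 0 for the missing side.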
Use `grouping sets` so that both aggregations are computed in a single pass over the table. Demo:
```sql
WITH hiring_count AS (
    SELECT EXTRACT(YEAR FROM effective_department_from_date) AS year_hired,
           EXTRACT(YEAR FROM effective_department_end_date) AS year_departed,
           department,
           d.department_id,
           COUNT(*) AS total
    FROM employee_department ed
    JOIN departments d USING (department_id)
    GROUP BY GROUPING SETS ((year_hired, d.department_id),
                            (year_departed, d.department_id))
)
SELECT COALESCE(a.year_hired, b.year_departed) AS year,
       a.department,
       a.total AS total_hiring,
       b.total AS total_departure
FROM hiring_count a
JOIN hiring_count b USING (year_departed, department_id)
ORDER BY year, a.department_id
LIMIT 10;
```
year | department | total_hiring | total_departure |
---|---|---|---|
2013 | department_0 | 4 | 4 |
2013 | department_1 | 9 | 9 |
2013 | department_2 | 17 | 17 |
2013 | department_3 | 19 | 19 |
2013 | department_4 | 11 | 11 |
2013 | department_5 | 17 | 17 |
2013 | department_6 | 18 | 18 |
2013 | department_7 | 11 | 11 |
2013 | department_8 | 17 | 17 |
2013 | department_9 | 15 | 15 |
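In the sample output, `total_hiring` and `total_departure` are identical on every row. That is what the demo's self-join produces: `USING (year_departed, department_id)` can only match rows whose `year_departed` is non-NULL, which are the rows of the departure grouping set, so `a` and `b` end up being the same departure row. Below is a minimal sketch of one way to pair the hire set with the departure set instead, assuming the question's table and column names. It uses `GROUPING()` to tell which set produced a row, since a NULL year from an open-ended assignment would otherwise look the same as the ungrouped column. PostgreSQL materializes a CTE that is referenced more than once, so `employee_department` should still be read only once. The aliases `x`, `grp`, `h` and `dep` are illustrative.

```sql
-- Sketch: one GROUPING SETS aggregation over employee_department, then a
-- FULL JOIN of the hire grouping set against the departure grouping set.
WITH counts AS (
    SELECT x.year_hired,
           x.year_departed,
           x.department_id,
           GROUPING(x.year_hired) AS grp,  -- 0 = hire set, 1 = departure set
           COUNT(*) AS total
    FROM (SELECT EXTRACT(YEAR FROM ed.effective_department_from_date) AS year_hired,
                 EXTRACT(YEAR FROM ed.effective_department_end_date)  AS year_departed,
                 ed.department_id
          FROM employee_department ed) AS x
    GROUP BY GROUPING SETS ((x.year_hired, x.department_id),
                            (x.year_departed, x.department_id))
)
SELECT COALESCE(h.year_hired, dep.year_departed) AS year,
       dp.department,
       COALESCE(h.total, 0)   AS total_hiring,
       COALESCE(dep.total, 0) AS total_departure
FROM (SELECT * FROM counts WHERE grp = 0 AND year_hired IS NOT NULL) AS h
FULL JOIN (SELECT * FROM counts WHERE grp = 1 AND year_departed IS NOT NULL) AS dep
       ON h.year_hired = dep.year_departed
      AND h.department_id = dep.department_id
-- departments is joined after aggregation, only to look up the name
JOIN departments dp
  ON dp.department_id = COALESCE(h.department_id, dep.department_id)
ORDER BY year, dp.department;
```

The `FULL JOIN` with `COALESCE(..., 0)` is a deliberate change from the inner joins in the question and the demo: it keeps years in which a department only hired or only lost people. Replace it with an inner `JOIN` if those rows should be dropped.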