My dataset looks like this:
Id working_hour
1005 2019-10-23 08:35:00
1006 2019-10-23 00:54:59
1007 2019-10-23 00:24:57
1008 2019-10-23 06:40:00
1009 2019-10-23 03:50:00
1010 2019-10-23 03:25:01
1005 2019-10-24 05:25:00
1006 2019-10-24 01:39:59
1007 2019-10-24 02:30:00
1008 2019-10-24 09:45:01
1010 2019-10-24 07:00:00
This dataset covers two days (23/10/2019 and 24/10/2019). I want to find the average working hour for each Id, in hours or in minutes.
For example:
Id in_hour in_min
1005 7 420
1006 1.29 77.4835
Use window functions. LEAD and LAG are particularly helpful for this use case. I have not run this SQL, but here is the concept.
SELECT id, working_hour, nextWH
FROM (
  -- pair each row with the next working_hour of the same id
  SELECT id, working_hour,
         LEAD(working_hour) OVER (PARTITION BY id ORDER BY working_hour) AS nextWH
  FROM tableA
) t;
This produces data that looks like this:
id | working_hour | nextWH
1005 | 2019-10-23 08:35:00 | 2019-10-24 05:25:00
1005 | 2019-10-24 05:25:00 | NULL
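Then exclude the rows where nextWH is NULL, and use date/time functions to compute the difference between working_hour and nextWH in whatever unit you prefer.
Here is a link to the window function documentation.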
You can try the following approach.
Your data, with working_hour stored as a timestamp:
+------------------+----------------------------+--+
| working_hour.id | working_hour.working_hour |
+------------------+----------------------------+--+
| 1005 | 2019-10-23 08:35:00.0 |
| 1006 | 2019-10-23 00:54:59.0 |
| 1007 | 2019-10-23 00:24:57.0 |
| 1008 | 2019-10-23 06:40:00.0 |
| 1009 | 2019-10-23 03:50:00.0 |
| 1010 | 2019-10-23 03:25:01.0 |
| 1005 | 2019-10-24 05:25:00.0 |
| 1006 | 2019-10-24 01:39:59.0 |
| 1007 | 2019-10-24 02:30:00.0 |
| 1008 | 2019-10-24 09:45:01.0 |
| 1009 | 2019-10-24 02:10:00.0 |
| 1010 | 2019-10-24 07:00:00.0 |
+------------------+----------------------------+--+
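Use the LEAD window function to pair each row with the next timestamp for the same id, convert both timestamps to seconds with unix_timestamp, take the difference in seconds, and then convert the seconds to minutes and hours.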
WITH t AS (
  SELECT id, working_hour,
         LEAD(working_hour) OVER (PARTITION BY id ORDER BY working_hour) AS nextDay
  FROM working_hour
)
SELECT id, working_hour, nextDay,
       ROUND((unix_timestamp(nextDay) - unix_timestamp(working_hour)) / 2, 2) AS in_secs,            -- half the gap between the two timestamps, in seconds
       ROUND((unix_timestamp(nextDay) - unix_timestamp(working_hour)) / 60 / 2, 2) AS in_mins,       -- same value in minutes
       ROUND((unix_timestamp(nextDay) - unix_timestamp(working_hour)) / 60 / 60 / 2, 2) AS in_hours  -- same value in hours
FROM t
WHERE nextDay IS NOT NULL;  -- drop the last row per id, which has no following timestamp
And the output:
+-------+------------------------+------------------------+----------+----------+-----------+--+
| id | working_hour | nextday | in_secs | in_mins | in_hours |
+-------+------------------------+------------------------+----------+----------+-----------+--+
| 1005 | 2019-10-23 08:35:00.0 | 2019-10-24 05:25:00.0 | 37500.0 | 625.0 | 10.42 |
| 1006 | 2019-10-23 00:54:59.0 | 2019-10-24 01:39:59.0 | 44550.0 | 742.5 | 12.38 |
| 1007 | 2019-10-23 00:24:57.0 | 2019-10-24 02:30:00.0 | 46951.5 | 782.53 | 13.04 |
| 1008 | 2019-10-23 06:40:00.0 | 2019-10-24 09:45:01.0 | 48750.5 | 812.51 | 13.54 |
| 1009 | 2019-10-23 03:50:00.0 | 2019-10-24 02:10:00.0 | 40200.0 | 670.0 | 11.17 |
| 1010 | 2019-10-23 03:25:01.0 | 2019-10-24 07:00:00.0 | 49649.5 | 827.49 | 13.79 |
+-------+------------------------+------------------------+----------+----------+-----------+--+
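You can also take this approach, which averages the time of day of the two rows (hours and minutes only) rather than the gap between them, and reproduces the 7 hours / 420 minutes you expect for Id 1005: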
WITH t AS (
  SELECT id, working_hour,
         LEAD(working_hour) OVER (PARTITION BY id ORDER BY working_hour) AS nextDay
  FROM working_hour
)
SELECT id, working_hour, nextDay,
       ROUND((hour(nextDay) * 60 + minute(nextDay) + hour(working_hour) * 60 + minute(working_hour)) / 60 / 2, 2) AS in_hours,
       ROUND((hour(nextDay) * 60 + minute(nextDay) + hour(working_hour) * 60 + minute(working_hour)) / 2, 2) AS in_mins
FROM t
WHERE nextDay IS NOT NULL;
Output:
+-------+------------------------+------------------------+-----------+----------+--+
| id | working_hour | nextday | in_hours | in_mins |
+-------+------------------------+------------------------+-----------+----------+--+
| 1005 | 2019-10-23 08:35:00.0 | 2019-10-24 05:25:00.0 | 7.0 | 420.0 |
| 1006 | 2019-10-23 00:54:59.0 | 2019-10-24 01:39:59.0 | 1.28 | 76.5 |
| 1007 | 2019-10-23 00:24:57.0 | 2019-10-24 02:30:00.0 | 1.45 | 87.0 |
| 1008 | 2019-10-23 06:40:00.0 | 2019-10-24 09:45:01.0 | 8.21 | 492.5 |
| 1009 | 2019-10-23 03:50:00.0 | 2019-10-24 02:10:00.0 | 3.0 | 180.0 |
| 1010 | 2019-10-23 03:25:01.0 | 2019-10-24 07:00:00.0 | 5.21 | 312.5 |
+-------+------------------------+------------------------+-----------+----------+--+
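The second query above ignores the seconds part and assumes exactly two rows per id, which is why Id 1006 comes out as 76.5 minutes instead of the 77.48 the question asks for. As a rough sketch (not run against your cluster, and assuming the same working_hour table and column names used above), the same time-of-day average can be written with a plain GROUP BY, so it includes seconds and keeps working if more days are added:

```sql
-- Sketch only: average time of day per id, expressed as seconds since midnight.
-- Assumes the table working_hour(id, working_hour TIMESTAMP) from the example above.
SELECT id,
       ROUND(AVG(hour(working_hour) * 3600
               + minute(working_hour) * 60
               + second(working_hour)) / 3600, 2) AS in_hours,
       ROUND(AVG(hour(working_hour) * 3600
               + minute(working_hour) * 60
               + second(working_hour)) / 60, 2)   AS in_mins
FROM working_hour
GROUP BY id;
```

Worked by hand for Id 1006: 00:54:59 is 3299 seconds and 01:39:59 is 5999 seconds, so the average is 4649 seconds, i.e. about 77.48 minutes or 1.29 hours. Note that an id with a single row is returned with that row's own time of day rather than being dropped, so add a HAVING COUNT(*) > 1 clause if you want to exclude those.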
I hope this helps.