Find the average hours per ID in Hive

Question (votes: 0, answers: 2)

My dataset looks like this:

 Id      working_hour
1005    2019-10-23 08:35:00
1006    2019-10-23 00:54:59
1007    2019-10-23 00:24:57
1008    2019-10-23 06:40:00
1009    2019-10-23 03:50:00
1010    2019-10-23 03:25:01
1005    2019-10-24 05:25:00
1006    2019-10-24 01:39:59
1007    2019-10-24 02:30:00
1008    2019-10-24 09:45:01
1010    2019-10-24 07:00:00

This dataset covers two days (23/10/2019 and 24/10/2019). I want to find the average working time for each ID, in hours or in minutes.

For example:

 Id    in_hour  in_min
1005     7       420
1006    1.29    77.4835
sql datetime hadoop hive hiveql
2 Answers

0 votes

Use window functions. LEAD and LAG are particularly helpful for this use case. I have not executed this SQL, but here is the concept.

SELECT id, working_hour, nextWH
FROM (
  -- pair each row with the next working_hour recorded for the same id
  SELECT id, working_hour,
         LEAD(working_hour) OVER (PARTITION BY id ORDER BY working_hour) AS nextWH
  FROM tableA
) t;

This produces data that looks like this:

id | working_hour | nextWH
1005 | 2019-10-23 08:35:00 | 2019-10-24 05:25:00
1005 | 2019-10-24 05:25:00 | NULL

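Then filter out the records where nextWH is NULL, and use the date/time functions to calculate the difference between working_hour and nextWH in whatever unit you prefer.

Here is a link to the window function documentation.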

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-LEADusingdefault1rowleadandnotspecifyingdefaultvalue


0 votes

You can try the following approach.

Your data, with working_hour stored as a timestamp:

+------------------+----------------------------+--+
| working_hour.id  | working_hour.working_hour  |
+------------------+----------------------------+--+
| 1005             | 2019-10-23 08:35:00.0      |
| 1006             | 2019-10-23 00:54:59.0      |
| 1007             | 2019-10-23 00:24:57.0      |
| 1008             | 2019-10-23 06:40:00.0      |
| 1009             | 2019-10-23 03:50:00.0      |
| 1010             | 2019-10-23 03:25:01.0      |
| 1005             | 2019-10-24 05:25:00.0      |
| 1006             | 2019-10-24 01:39:59.0      |
| 1007             | 2019-10-24 02:30:00.0      |
| 1008             | 2019-10-24 09:45:01.0      |
| 1009             | 2019-10-24 02:10:00.0      |
| 1010             | 2019-10-24 07:00:00.0      |
+------------------+----------------------------+--+

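Use the LEAD window function to pair each row with the next timestamp for the same id, convert both timestamps to seconds with unix_timestamp, take the difference and divide it by 2, then convert the seconds to minutes and hours.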

WITH t AS (
  SELECT id, working_hour,
         -- next timestamp recorded for the same id (NULL on the last row)
         LEAD(working_hour) OVER (PARTITION BY id ORDER BY working_hour) AS nextDay
  FROM working_hour
)
SELECT id, working_hour, nextDay,
       ROUND((unix_timestamp(nextDay) - unix_timestamp(working_hour)) / 2, 2) AS in_secs,            -- half the gap, in seconds
       ROUND((unix_timestamp(nextDay) - unix_timestamp(working_hour)) / 60 / 2, 2) AS in_mins,       -- half the gap, in minutes
       ROUND((unix_timestamp(nextDay) - unix_timestamp(working_hour)) / 60 / 60 / 2, 2) AS in_hours  -- half the gap, in hours
FROM t
WHERE nextDay IS NOT NULL;  -- drop each id's last row, which has no successor

And the output:

+-------+------------------------+------------------------+----------+----------+-----------+--+
|  id   |      working_hour      |        nextday         | in_secs  | in_mins  | in_hours  |
+-------+------------------------+------------------------+----------+----------+-----------+--+
| 1005  | 2019-10-23 08:35:00.0  | 2019-10-24 05:25:00.0  | 37500.0  | 625.0    | 10.42     |
| 1006  | 2019-10-23 00:54:59.0  | 2019-10-24 01:39:59.0  | 44550.0  | 742.5    | 12.38     |
| 1007  | 2019-10-23 00:24:57.0  | 2019-10-24 02:30:00.0  | 46951.5  | 782.53   | 13.04     |
| 1008  | 2019-10-23 06:40:00.0  | 2019-10-24 09:45:01.0  | 48750.5  | 812.51   | 13.54     |
| 1009  | 2019-10-23 03:50:00.0  | 2019-10-24 02:10:00.0  | 40200.0  | 670.0    | 11.17     |
| 1010  | 2019-10-23 03:25:01.0  | 2019-10-24 07:00:00.0  | 49649.5  | 827.49   | 13.79     |
+-------+------------------------+------------------------+----------+----------+-----------+--+

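You can also take this approach, which adds the hour and minute parts of both timestamps and halves the sum. For id 1005 it reproduces the 7 hours / 420 minutes given as the expected result in the question.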

WITH t AS (
  SELECT id, working_hour,
         LEAD(working_hour) OVER (PARTITION BY id ORDER BY working_hour) AS nextDay
  FROM working_hour
)
-- treat each time of day as a duration in minutes (seconds are ignored),
-- then average the two values for each id
SELECT id, working_hour, nextDay,
       ROUND(((hour(nextDay) * 60 + minute(nextDay) + hour(working_hour) * 60 + minute(working_hour)) / 60 / 2), 2) AS in_hours,
       ROUND(((hour(nextDay) * 60 + minute(nextDay) + hour(working_hour) * 60 + minute(working_hour)) / 2), 2) AS in_mins
FROM t
WHERE nextDay IS NOT NULL;

Output:

+-------+------------------------+------------------------+-----------+----------+--+
|  id   |      working_hour      |        nextday         | in_hours  | in_mins  |
+-------+------------------------+------------------------+-----------+----------+--+
| 1005  | 2019-10-23 08:35:00.0  | 2019-10-24 05:25:00.0  | 7.0       | 420.0    |
| 1006  | 2019-10-23 00:54:59.0  | 2019-10-24 01:39:59.0  | 1.28      | 76.5     |
| 1007  | 2019-10-23 00:24:57.0  | 2019-10-24 02:30:00.0  | 1.45      | 87.0     |
| 1008  | 2019-10-23 06:40:00.0  | 2019-10-24 09:45:01.0  | 8.21      | 492.5    |
| 1009  | 2019-10-23 03:50:00.0  | 2019-10-24 02:10:00.0  | 3.0       | 180.0    |
| 1010  | 2019-10-23 03:25:01.0  | 2019-10-24 07:00:00.0  | 5.21      | 312.5    |
+-------+------------------------+------------------------+-----------+----------+--+
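The second query above drops the seconds and relies on there being exactly two rows per id. As a minimal sketch, and not something I have run against a real cluster, the same idea can be written with AVG and GROUP BY so that it keeps the seconds and works for any number of days. It assumes, as the expected output in the question suggests, that the time-of-day part of working_hour is the duration worked on that day, and that the table is named working_hour as above. The names secs and s are made up for illustration.

```sql
-- Sketch only: assumes the time-of-day part of working_hour is the duration worked.
SELECT id,
       ROUND(AVG(secs) / 3600, 2) AS in_hours,  -- average duration per id, in hours
       ROUND(AVG(secs) / 60, 2)   AS in_mins    -- average duration per id, in minutes
FROM (
  -- convert each time of day to a number of seconds since midnight
  SELECT id,
         hour(working_hour) * 3600
           + minute(working_hour) * 60
           + second(working_hour) AS secs
  FROM working_hour
) s
GROUP BY id;
```

For id 1005 the two rows are 30900 and 19500 seconds, so the average is 25200 seconds, that is 420 minutes or 7 hours, which agrees with the example in the question. An id with a single row simply gets that row's value back as its average.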

I hope this helps.
