Optimizing DataFrame operations

Question description  Votes: 0  Answers: 1

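I have a dataset containing details of the tasks handled by a team; different attributes are stored for each task. From this dataset, I want to find the time at which an employee picks up a new task after closing the current one. The dataframe has the following structure (the actual dataframe has millions of rows):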

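The following points are relevant:

a) Each employee can work on tasks from certain countries and functions. These are stored in the `countries` and `functions` columns, so the next task an employee picks has to be found among the `countries` and `functions` values of their last task.

b) Users add tasks to a queue, and employees pick a new task from it after finishing their last one.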

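For each row (called the reference row for convenience), I want to find the time_assigned value that satisfies the following conditions:

a) It belongs to the same employee (employee_id is the same as in the reference row)

b) It is the lowest time_assigned value that is greater than the reference row's time_closed

c) Its country_code appears in the reference row's `countries` column

d) Its function appears in the reference row's `functions` column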
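To make conditions a) to d) concrete, here is a minimal plain-Python sketch of the lookup for a single reference row. The helper names (`split_list`, `is_candidate`, `next_pickup_time`) and the toy rows are hypothetical and only for illustration. It uses a strict `>` for condition b) as stated above, whereas the code in the question uses `>=` and additionally caps the search at 10 days after time_closed.

```python
from datetime import datetime

def split_list(text):
    """Turn 'a, b, c' into the set {'a', 'b', 'c'}."""
    return {item.strip() for item in text.split(',')}

def is_candidate(reference, other):
    """Check conditions a) to d) for one candidate row against the reference row."""
    return (
        other['employee_id'] == reference['employee_id']                          # a)
        and other['time_assigned'] > reference['time_closed']                     # b)
        and other['country_code'].strip() in split_list(reference['countries'])   # c)
        and other['function'].strip() in split_list(reference['functions'])       # d)
    )

def next_pickup_time(reference, rows):
    """Return the earliest qualifying time_assigned, or None if nothing qualifies."""
    times = [row['time_assigned'] for row in rows if is_candidate(reference, row)]
    return min(times) if times else None

# Hypothetical toy rows, not taken from the dataset in the question.
rows = [
    {'employee_id': 'emp1', 'country_code': 'country1', 'function': 'function_1',
     'time_assigned': datetime(2023, 6, 17, 10, 0), 'time_closed': datetime(2023, 6, 17, 10, 30),
     'countries': 'country1, country2', 'functions': 'function_1'},
    # Same employee, but country3 is not in the reference row's countries.
    {'employee_id': 'emp1', 'country_code': 'country3', 'function': 'function_1',
     'time_assigned': datetime(2023, 6, 17, 11, 0), 'time_closed': datetime(2023, 6, 17, 11, 20),
     'countries': 'country1, country2', 'functions': 'function_1'},
    # Same employee, allowed country and function: this is the next task.
    {'employee_id': 'emp1', 'country_code': 'country2', 'function': 'function_1',
     'time_assigned': datetime(2023, 6, 17, 12, 0), 'time_closed': datetime(2023, 6, 17, 12, 10),
     'countries': 'country1, country2', 'functions': 'function_1'},
    # Different employee, so never a candidate for emp1.
    {'employee_id': 'emp2', 'country_code': 'country1', 'function': 'function_1',
     'time_assigned': datetime(2023, 6, 17, 10, 45), 'time_closed': datetime(2023, 6, 17, 10, 50),
     'countries': 'country1', 'functions': 'function_1'},
]

print(next_pickup_time(rows[0], rows))  # 2023-06-17 12:00:00
```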

I am currently using the following code:

import pandas as pd
from datetime import timedelta

data={'task_id' : ['request_1001', 'request_1002', 'request_1003', 'request_1004', 'request_1005', 'request_1006', 'request_1007', 'request_1008', 'request_1009', 'request_1010', 'request_1011', 'request_1012', 'request_1013', 'request_1014', 'request_1015', 'request_1016', 'request_1017', 'request_1018', 'request_1019', 'request_1020', 'request_1021', 'request_1022', 'request_1023', 'request_1024', 'request_1025', 'request_1026', 'request_1027', 'request_1028', 'request_1029', 'request_1030', 'request_1031', 'request_1032', 'request_1033', 'request_1034', 'request_1035', 'request_1036', 'request_1037', 'request_1038', 'request_1039', 'request_1040', 'request_1041', 'request_1042', 'request_1043', 'request_1044', 'request_1045', 'request_1046', 'request_1047', 'request_1048', 'request_1049', 'request_1050', 'request_1051', 'request_1052', 'request_1053', 'request_1054', 'request_1055', 'request_1056', 'request_1057', 'request_1058', 'request_1059', 'request_1060', 'request_1061', 'request_1062', 'request_1063', 'request_1064', 'request_1065', 'request_1066', 'request_1067', 'request_1068', 'request_1069', 'request_1070', 'request_1071', 'request_1072', 'request_1073', 'request_1074', 'request_1075', 'request_1076', 'request_1077', 'request_1078', 'request_1079', 'request_1080', 'request_1081', 'request_1082', 'request_1083', 'request_1084', 'request_1085', 'request_1086', 'request_1087', 'request_1088', 'request_1089', 'request_1090', 'request_1091', 'request_1092', 'request_1093', 'request_1094', 'request_1095', 'request_1096', 'request_1097', 'request_1098', 'request_1099', 'request_1100'],
'function' : ['function_2', 'function_2', 'function_2', 'function_4', 'function_2', 'function_7', 'function_2', 'function_2', 'function_2', 'function_7', 'function_7', 'function_2', 'function_2', 'function_7', 'function_2', 'function_2', 'function_2', 'function_2', 'function_2', 'function_2', 'function_6', 'function_2', 'function_2', 'function_6', 'function_2', 'function_2', 'function_2', 'function_2', 'function_2', 'function_2', 'function_7', 'function_2', 'function_2', 'function_7', 'function_2', 'function_5', 'function_7', 'function_5', 'function_5', 'function_2', 'function_5', 'function_2', 'function_2', 'function_7', 'function_2', 'function_2', 'function_2', 'function_7', 'function_2', 'function_7', 'function_2', 'function_7', 'function_2', 'function_2', 'function_2', 'function_7', 'function_7', 'function_2', 'function_7', 'function_2', 'function_7', 'function_7', 'function_7', 'function_2', 'function_2', 'function_2', 'function_2', 'function_2', 'function_2', 'function_7', 'function_2', 'function_7', 'function_2', 'function_1', 'function_5', 'function_2', 'function_2', 'function_7', 'function_2', 'function_2', 'function_2', 'function_4', 'function_7', 'function_7', 'function_6', 'function_2', 'function_4', 'function_7', 'function_2', 'function_5', 'function_7', 'function_1', 'function_2', 'function_5', 'function_5', 'function_5', 'function_3', 'function_3', 'function_3', 'function_3'],
'country_code' : ['country3', 'country1', 'country3', 'country1', 'country1', 'country1', 'country1', 'country1', 'country1', 'country1', 'country1', 'country3', 'country1', 'country1', 'country9', 'country1', 'country1', 'country1', 'country2', 'country1', 'country1', 'country5', 'country1', 'country5', 'country5', 'country12', 'country9', 'country3', 'country2', 'country3', 'country1', 'country1', 'country3', 'country1', 'country9', 'country3', 'country1', 'country2', 'country1', 'country9', 'country1', 'country3', 'country9', 'country1', 'country9', 'country1', 'country1', 'country1', 'country3', 'country1', 'country3', 'country1', 'country3', 'country4', 'country1', 'country1', 'country1', 'country1', 'country1', 'country3', 'country1', 'country1', 'country1', 'country1', 'country1', 'country3', 'country1', 'country1', 'country3', 'country1', 'country1', 'country1', 'country1', 'country6', 'country1', 'country1', 'country9', 'country1', 'country1', 'country1', 'country3', 'country1', 'country1', 'country1', 'country1', 'country1', 'country1', 'country1', 'country9', 'country1', 'country1', 'country6', 'country14', 'country4', 'country10', 'country10', 'country14', 'country1', 'country1', 'country1'],
'employee_id' : ['emp1', 'emp2', 'emp3', 'emp4', 'emp5', 'emp6', 'emp7', 'emp2', 'emp1', 'emp6', 'emp8', 'emp3', 'emp1', 'emp6', 'emp8', 'emp2', 'emp7', 'emp4', 'emp8', 'emp5', 'emp8', 'emp4', 'emp1', 'emp9', 'emp7', 'emp4', 'emp5', 'emp1', 'emp8', 'emp2', 'emp8', 'emp4', 'emp2', 'emp6', 'emp4', 'emp10', 'emp6', 'emp10', 'emp10', 'emp9', 'emp10', 'emp4', 'emp9', 'emp8', 'emp5', 'emp8', 'emp3', 'emp6', 'emp3', 'emp6', 'emp2', 'emp6', 'emp1', 'emp2', 'emp3', 'emp6', 'emp6', 'emp2', 'emp6', 'emp1', 'emp6', 'emp6', 'emp6', 'emp3', 'emp8', 'emp1', 'emp8', 'emp8', 'emp2', 'emp6', 'emp3', 'emp6', 'emp8', 'emp1', 'emp10', 'emp1', 'emp8', 'emp6', 'emp8', 'emp2', 'emp1', 'emp2', 'emp6', 'emp8', 'emp8', 'emp8', 'emp2', 'emp6', 'emp8', 'emp10', 'emp6', 'emp1', 'emp11', 'emp12', 'emp13', 'emp13', 'emp14', 'emp14', 'emp14', 'emp14'],
'time_created' : ['2023-06-17 20:14:17.311', '2023-06-17 20:18:12.232', '2023-06-17 20:21:49.917', '2023-06-17 20:24:02.617', '2023-06-17 20:26:50.739', '2023-06-17 20:26:43.510', '2023-06-17 20:27:38.979', '2023-06-17 20:28:40.599', '2023-06-17 20:30:57.405', '2023-06-17 20:30:38.544', '2023-06-17 20:40:23.902', '2023-06-17 20:42:00.869', '2023-06-17 20:39:44.346', '2023-06-17 20:43:40.614', '2023-06-17 20:38:46.621', '2023-06-17 20:45:46.476', '2023-06-17 20:46:07.558', '2023-06-17 20:55:13.492', '2023-06-17 20:56:01.564', '2023-06-17 21:00:49.596', '2023-06-17 20:57:31.688', '2023-06-17 21:02:03.657', '2023-06-17 21:02:19.180', '2023-06-17 21:00:50.673', '2023-06-17 21:08:32.449', '2023-06-17 21:08:22.179', '2023-06-17 21:15:18.588', '2023-06-17 21:21:47.750', '2023-06-17 21:25:11.048', '2023-06-17 21:25:48.403', '2023-06-17 21:27:56.747', '2023-06-17 21:32:05.291', '2023-06-17 21:34:34.751', '2023-06-17 21:37:03.961', '2023-06-17 21:41:01.666', '2023-06-17 17:57:29.438', '2023-06-17 21:38:16.714', '2023-06-17 18:05:29.199', '2023-06-17 18:19:04.931', '2023-06-17 21:41:01.773', '2023-06-17 20:28:33.795', '2023-06-17 21:45:43.000', '2023-06-17 21:45:18.882', '2023-06-17 21:50:28.857', '2023-06-17 21:54:48.772', '2023-06-17 21:54:54.044', '2023-06-17 21:55:10.374', '2023-06-17 22:03:00.272', '2023-06-17 22:06:05.007', '2023-06-17 22:04:44.732', '2023-06-17 22:06:07.945', '2023-06-17 22:09:44.790', '2023-06-17 22:19:09.983', '2023-06-17 22:15:19.572', '2023-06-17 22:21:14.828', '2023-06-17 22:21:41.157', '2023-06-17 22:23:06.036', '2023-06-17 22:27:19.070', '2023-06-17 22:27:03.854', '2023-06-17 22:28:53.296', '2023-06-17 22:33:52.264', '2023-06-17 22:34:57.518', '2023-06-17 22:47:40.202', '2023-06-17 22:47:58.102', '2023-06-17 22:47:58.550', '2023-06-17 22:49:41.546', '2023-06-17 22:49:46.063', '2023-06-17 22:50:08.812', '2023-06-17 22:52:23.236', '2023-06-17 22:56:37.736', '2023-06-17 22:59:06.817', '2023-06-17 23:04:01.269', '2023-06-17 23:07:13.322', 
'2023-06-17 22:56:48.799', '2023-06-17 22:39:13.389', '2023-06-17 23:12:28.034', '2023-06-17 23:14:32.697', '2023-06-17 23:20:10.653', '2023-06-17 23:23:01.957', '2023-06-17 23:23:32.825', '2023-06-17 23:30:58.160', '2023-06-17 23:22:14.980', '2023-06-17 23:31:14.529', '2023-06-17 23:31:40.861', '2023-06-17 23:36:09.446', '2023-06-17 23:47:19.173', '2023-06-17 23:28:27.466', '2023-06-17 23:49:43.044', '2023-06-17 23:28:35.754', '2023-06-17 23:55:38.878', '2023-06-17 23:56:38.590', '2023-06-17 23:44:06.839', '2023-06-17 15:38:59.886', '2023-06-17 11:42:29.640', '2023-06-17 11:32:01.165', '2023-06-17 16:06:55.354', '2023-06-17 07:44:01.391', '2023-06-17 15:01:03.617', '2023-06-17 17:06:05.529', '2023-06-17 19:06:21.295'],
'time_assigned' : ['2023-06-17 20:15:04.319', '2023-06-17 20:18:36.504', '2023-06-17 20:21:51.155', '2023-06-17 20:27:12.338', '2023-06-17 20:28:01.803', '2023-06-17 20:29:08.918', '2023-06-17 20:32:30.227', '2023-06-17 20:32:39.857', '2023-06-17 20:33:53.921', '2023-06-17 20:36:02.392', '2023-06-17 20:41:18.113', '2023-06-17 20:42:59.638', '2023-06-17 20:43:09.292', '2023-06-17 20:45:15.437', '2023-06-17 20:45:22.236', '2023-06-17 20:46:06.525', '2023-06-17 20:46:14.485', '2023-06-17 20:55:30.275', '2023-06-17 20:56:09.985', '2023-06-17 21:01:15.087', '2023-06-17 21:02:48.466', '2023-06-17 21:02:48.822', '2023-06-17 21:02:55.819', '2023-06-17 21:04:25.885', '2023-06-17 21:09:05.480', '2023-06-17 21:13:00.095', '2023-06-17 21:21:06.182', '2023-06-17 21:22:41.290', '2023-06-17 21:26:24.277', '2023-06-17 21:26:44.479', '2023-06-17 21:29:08.258', '2023-06-17 21:34:00.114', '2023-06-17 21:35:57.838', '2023-06-17 21:38:26.939', '2023-06-17 21:41:22.903', '2023-06-17 21:42:51.774', '2023-06-17 21:43:30.574', '2023-06-17 21:44:22.396', '2023-06-17 21:45:07.438', '2023-06-17 21:45:37.287', '2023-06-17 21:45:48.680', '2023-06-17 21:46:52.541', '2023-06-17 21:50:24.423', '2023-06-17 21:52:47.105', '2023-06-17 21:54:52.303', '2023-06-17 21:56:52.651', '2023-06-17 21:56:56.528', '2023-06-17 22:03:25.218', '2023-06-17 22:06:36.079', '2023-06-17 22:07:40.839', '2023-06-17 22:11:55.120', '2023-06-17 22:16:09.169', '2023-06-17 22:19:43.147', '2023-06-17 22:20:24.109', '2023-06-17 22:21:31.828', '2023-06-17 22:23:27.157', '2023-06-17 22:26:16.273', '2023-06-17 22:27:51.741', '2023-06-17 22:29:27.678', '2023-06-17 22:29:39.063', '2023-06-17 22:35:30.515', '2023-06-17 22:38:40.255', '2023-06-17 22:47:46.807', '2023-06-17 22:48:28.767', '2023-06-17 22:48:29.899', '2023-06-17 22:50:00.869', '2023-06-17 22:50:07.991', '2023-06-17 22:53:23.839', '2023-06-17 22:53:28.796', '2023-06-17 22:58:05.570', '2023-06-17 23:01:11.459', '2023-06-17 23:06:35.475', '2023-06-17 23:10:00.280', 
'2023-06-17 23:12:26.824', '2023-06-17 23:14:00.761', '2023-06-17 23:14:02.275', '2023-06-17 23:18:30.538', '2023-06-17 23:20:41.619', '2023-06-17 23:28:43.162', '2023-06-17 23:29:37.107', '2023-06-17 23:31:24.097', '2023-06-17 23:31:56.054', '2023-06-17 23:31:57.824', '2023-06-17 23:34:37.704', '2023-06-17 23:42:19.000', '2023-06-17 23:48:35.095', '2023-06-17 23:49:57.602', '2023-06-17 23:50:05.353', '2023-06-17 23:51:07.026', '2023-06-17 23:57:56.296', '2023-06-17 23:59:01.553', '2023-06-18 01:10:13.441', '2023-06-18 01:20:43.506', '2023-06-18 04:25:35.884', '2023-06-18 09:43:13.344', '2023-06-18 09:44:45.123', '2023-06-18 11:53:07.047', '2023-06-18 11:54:02.087', '2023-06-18 11:55:45.177', '2023-06-18 11:57:43.830'],
'time_closed' : ['2023-06-17 20:20:31.137', '2023-06-17 20:23:31.117', '2023-06-17 20:28:08.333', '2023-06-17 20:36:30.223', '2023-06-17 20:31:05.252', '2023-06-17 20:35:27.100', '2023-06-17 20:35:51.635', '2023-06-17 20:37:58.641', '2023-06-17 20:37:18.241', '2023-06-17 20:39:35.518', '2023-06-17 20:45:17.265', '2023-06-17 20:47:56.438', '2023-06-17 20:48:08.085', '2023-06-17 20:52:07.462', '2023-06-17 20:49:31.387', '2023-06-17 20:50:19.867', '2023-06-17 20:54:43.217', '2023-06-17 20:59:05.857', '2023-06-17 21:00:14.904', '2023-06-17 21:05:58.286', '2023-06-17 21:09:19.817', '2023-06-17 21:06:14.715', '2023-06-17 21:07:30.307', '2023-06-17 21:06:11.474', '2023-06-17 21:12:11.986', '2023-06-17 21:16:29.860', '2023-06-17 21:24:08.914', '2023-06-17 21:32:29.255', '2023-06-17 21:28:56.732', '2023-06-17 21:31:43.205', '2023-06-17 21:30:36.539', '2023-06-17 21:37:06.879', '2023-06-17 21:39:10.565', '2023-06-17 21:43:17.005', '2023-06-17 21:44:30.360', '2023-06-17 21:43:51.222', '2023-06-17 21:47:52.677', '2023-06-17 21:44:57.522', '2023-06-17 21:45:43.180', '2023-06-17 21:48:09.917', '2023-06-17 21:46:14.039', '2023-06-17 21:49:50.680', '2023-06-17 21:52:09.533', '2023-06-17 21:56:49.093', '2023-06-17 21:58:12.870', '2023-06-17 21:58:39.697', '2023-06-17 22:03:28.664', '2023-06-17 22:07:30.328', '2023-06-17 22:11:56.199', '2023-06-17 22:16:02.822', '2023-06-17 22:14:33.618', '2023-06-17 22:19:09.961', '2023-06-17 22:25:23.768', '2023-06-17 22:25:21.741', '2023-06-17 22:30:12.860', '2023-06-17 22:26:06.134', '2023-06-17 22:29:17.436', '2023-06-17 22:31:15.628', '2023-06-17 22:31:38.816', '2023-06-17 22:33:41.970', '2023-06-17 22:38:31.387', '2023-06-17 22:43:06.899', '2023-06-17 22:51:55.411', '2023-06-17 22:52:57.830', '2023-06-17 22:50:03.305', '2023-06-17 22:55:41.605', '2023-06-17 22:53:19.549', '2023-06-17 22:55:27.440', '2023-06-17 22:59:39.774', '2023-06-17 23:03:54.050', '2023-06-17 23:06:07.888', '2023-06-17 23:08:52.646', '2023-06-17 23:13:22.386', '2023-06-17 
23:12:54.268', '2023-06-17 23:14:33.900', '2023-06-17 23:17:59.087', '2023-06-17 23:28:39.801', '2023-06-17 23:24:25.467', '2023-06-17 23:34:34.307', '2023-06-17 23:31:36.533', '2023-06-17 23:42:27.253', '2023-06-17 23:38:36.862', '2023-06-17 23:37:38.303', '2023-06-17 23:39:50.893', '2023-06-17 23:48:30.849', '2023-06-17 23:51:00.421', '2023-06-17 23:51:00.132', '2023-06-17 23:54:26.615', '2023-06-17 23:52:19.068', '2023-06-18 00:05:11.267', '2023-06-18 00:03:32.152', '2023-06-18 01:10:40.385', '2023-06-18 01:21:10.944', '2023-06-18 04:26:30.131', '2023-06-18 09:44:37.892', '2023-06-18 09:45:18.282', '2023-06-18 11:53:55.331', '2023-06-18 11:55:37.832', '2023-06-18 11:57:39.497', '2023-06-18 11:59:01.326'],
'countries' : ['country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country3, country4, country2, country5', 'country1, country3, country2, country4, country5', 'country3, country9, country1, country12, country5, country2, country4, country10', 'country9, country1, country3, country2, country5, country13, country12', 'country1, country5', 'country5, country1, country3, country6, country2, country11, country10, country8', 'country1, country3, country4, country2, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country3, country2, country4, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country3, country4, country2, country5', 'country5, country1, country3, country6, country2, country11, country10, country8', 'country3, country9, country1, country12, country5, country2, country4, country10', 'country1, country9, country2, country10, country3, country5', 'country9, country1, country3, country2, country5, country13, country12', 'country1, country9, country2, country10, country3, country5', 'country3, country9, country1, country12, country5, country2, country4, country10', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country9, country5, country2, country3, country1, country12, country4, country10, country6', 'country5, country1, country3, country6, country2, country11, country10, country8', 'country3, country9, country1, country12, country5, country2, country4, country10', 'country9, country1, country3, country2, country5, country13, country12', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country9, country2, 
country10, country3, country5', 'country1, country3, country4, country2, country5', 'country1, country9, country2, country10, country3, country5', 'country3, country9, country1, country12, country5, country2, country4, country10', 'country1, country3, country4, country2, country5', 'country1, country5', 'country3, country9, country1, country12, country5, country2, country4, country10', 'country1, country2, country3, country8', 'country1, country5', 'country1, country2, country3, country8', 'country1, country2, country3, country8', 'country9, country5, country2, country3, country1, country12, country4, country10, country6', 'country1, country2, country3, country8', 'country3, country9, country1, country12, country5, country2, country4, country10', 'country9, country5, country2, country3, country1, country12, country4, country10, country6', 'country1, country9, country2, country10, country3, country5', 'country9, country1, country3, country2, country5, country13, country12', 'country1, country9, country2, country10, country3, country5', 'country1, country3, country2, country4, country5', 'country1, country5', 'country1, country3, country2, country4, country5', 'country1, country5', 'country1, country3, country4, country2, country5', 'country1, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country3, country4, country2, country5', 'country1, country3, country2, country4, country5', 'country1, country5', 'country1, country5', 'country1, country3, country4, country2, country5', 'country1, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country5', 'country1, country5', 'country1, country5', 'country1, country3, country2, country4, country5', 'country1, country9, country2, country10, country3, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country9, country2, country10, 
country3, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country3, country4, country2, country5', 'country1, country5', 'country1, country3, country2, country4, country5', 'country1, country5', 'country1, country9, country2, country10, country3, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country2, country3, country8', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country9, country2, country10, country3, country5', 'country1, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country3, country4, country2, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country1, country3, country4, country2, country5', 'country1, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country3, country4, country2, country5', 'country1, country5', 'country1, country9, country2, country10, country3, country5', 'country1, country2, country3, country8', 'country1, country5', 'country6, country3, country1, country2, country5, country10, country8, country4, country7', 'country14', 'country4', 'country10, country2, country1, country12, country9, country13', 'country10, country2, country1, country12, country9, country13', 'country1, country14, country10, country3, country15, country6, country4, country2, country5, country11', 'country1, country14, country10, country3, country15, country6, country4, country2, country5, country11', 'country1, country14, country10, country3, country15, country6, country4, country2, country5, country11', 'country1, country14, country10, country3, country15, country6, country4, country2, country5, country11'],
'functions' : ['function_1, function_2', 'function_4, function_2, function_1', 'function_2, function_1, function_6, function_4', 'function_2, function_4, function_1, function_5', 'function_2, function_1, function_4, function_5', 'function_7', 'function_2, function_4, function_1', 'function_4, function_2, function_1', 'function_1, function_2', 'function_7', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_1, function_6, function_4', 'function_1, function_2', 'function_7', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_4, function_2, function_1', 'function_2, function_4, function_1', 'function_2, function_4, function_1, function_5', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_1, function_4, function_5', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_4, function_1, function_5', 'function_1, function_2', 'function_2, function_6, function_1, function_4', 'function_2, function_4, function_1', 'function_2, function_4, function_1, function_5', 'function_2, function_1, function_4, function_5', 'function_1, function_2', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_4, function_2, function_1', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_4, function_1, function_5', 'function_4, function_2, function_1', 'function_7', 'function_2, function_4, function_1, function_5', 'function_5', 'function_7', 'function_5', 'function_5', 'function_2, function_6, function_1, function_4', 'function_5', 'function_2, function_4, function_1, function_5', 'function_2, function_6, function_1, function_4', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_1, function_4, function_5', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_1, 
function_6, function_4', 'function_7', 'function_2, function_1, function_6, function_4', 'function_7', 'function_4, function_2, function_1', 'function_7', 'function_1, function_2', 'function_4, function_2, function_1', 'function_2, function_1, function_6, function_4', 'function_7', 'function_7', 'function_4, function_2, function_1', 'function_7', 'function_1, function_2', 'function_7', 'function_7', 'function_7', 'function_2, function_1, function_6, function_4', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_1, function_2', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_4, function_2, function_1', 'function_7', 'function_2, function_1, function_6, function_4', 'function_7', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_1, function_2', 'function_5', 'function_1, function_2', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_7', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_4, function_2, function_1', 'function_1, function_2', 'function_4, function_2, function_1', 'function_7', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_4, function_2, function_1', 'function_7', 'function_2, function_6, function_7, function_4, function_1, function_5', 'function_5', 'function_7', 'function_1, function_2', 'function_2, function_4', 'function_5', 'function_5, function_2, function_1, function_4, function_3', 'function_5, function_2, function_1, function_4, function_3', 'function_3', 'function_3', 'function_3', 'function_3']}

df1=pd.DataFrame(data)

df1['time_created'] = pd.to_datetime(df1['time_created'])
df1['time_assigned'] = pd.to_datetime(df1['time_assigned'])
df1['time_closed'] = pd.to_datetime(df1['time_closed'])

# Defining a function to fetch the relevant data
def get_next_task_assign_creation_time(row):

    # Limit time for the data to max 10 days from closing the last task
    ten_days=timedelta(hours=240)
    
    # Get the necessary parameters from the row
    task_id=row['task_id']
    login=row['employee_id']
    time_closed=row['time_closed']

    # set the max time limit for time_assigned
    upper_time_limit=time_closed+ten_days

    # create lists of the allowed functions and countries
    func= [s.strip() for s in row['functions'].split(',')]
    countries= [s.strip() for s in row['countries'].split(',')]

    # progressively filter the df1
    
    t_df=df1[(df1['employee_id']==login)]  
    t_df=t_df[t_df['time_assigned']>=time_closed] 
    t_df=t_df[t_df['time_assigned']<=upper_time_limit]
    t_df=t_df[(t_df['function'].str.strip().isin(func)) ]  
    t_df=t_df[(t_df['country_code'].str.strip().isin(countries))]

    # finding the time_assigned value 
    next_task_time=t_df['time_assigned'].min()

    return next_task_time

# Update the next_task_pickup_time value
df1['next_task_pickup_time']=df1.apply(get_next_task_assign_creation_time,axis=1)

The expected output is a column containing dictionaries, as shown below:

'next_task_details=["{'request_id': 'request_1001', 'next_task_pickup_time': Timestamp('2023-06-17 20:33:53.921000'), 'next_task_creation_time': Timestamp('2023-06-17 20:30:57.405000')}", "{'request_id': 'request_1002', 'next_task_pickup_time': Timestamp('2023-06-17 20:32:39.857000'), 'next_task_creation_time': Timestamp('2023-06-17 20:28:40.599000')}", "{'request_id': 'request_1003', 'next_task_pickup_time': Timestamp('2023-06-17 20:42:59.638000'), 'next_task_creation_time': Timestamp('2023-06-17 20:42:00.869000')}", "{'request_id': 'request_1004', 'next_task_pickup_time': NaT, 'next_task_creation_time': NaT}", "{'request_id': 'request_1005', 'next_task_pickup_time': Timestamp('2023-06-17 21:01:15.087000'), 'next_task_creation_time': Timestamp('2023-06-17 21:00:49.596000')}", "{'request_id': 'request_1006', 'next_task_pickup_time': Timestamp('2023-06-17 20:36:02.392000'), 'next_task_creation_time': Timestamp('2023-06-17 20:30:38.544000')}", "{'request_id': 'request_1007', 'next_task_pickup_time': Timestamp('2023-06-17 20:46:14.485000'), 'next_task_creation_time': Timestamp('2023-06-17 20:46:07.558000')}", "{'request_id': 'request_1008', 'next_task_pickup_time': Timestamp('2023-06-17 20:46:06.525000'), 'next_task_creation_time': Timestamp('2023-06-17 20:45:46.476000')}", "{'request_id': 'request_1009', 'next_task_pickup_time': Timestamp('2023-06-17 20:43:09.292000'), 'next_task_creation_time': Timestamp('2023-06-17 20:39:44.346000')}", "{'request_id': 'request_1010', 'next_task_pickup_time': Timestamp('2023-06-17 20:45:15.437000'), 'next_task_creation_time': Timestamp('2023-06-17 20:43:40.614000')}", 

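This approach takes a very long time (for example, 4 hours for 100,000 rows). I believe the row-by-row iteration is what makes it slow, and that vectorization could make it faster. I am looking for help in understanding whether vectorization can be applied here instead of applying a function row by row.

Can anyone share their thoughts or suggestions on vectorizing the process above, or on any other possible approach?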

python pandas vectorization large-data
1 Answer
0 votes

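One thing I would suggest is grouping by employee, which is fairly low-hanging fruit.

As far as I can tell from your code, each employee's results are independent of every other employee's results. You can split them up and process them separately.

Why does this matter? You run the following line once for every row: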

t_df=df1[(df1['employee_id']==login)]  

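This has to scan the entire dataframe for rows belonging to the same employee. Each scan is O(N) and it runs once per row, so the whole thing is an O(N^2) operation, which is why it is slow. But it also means that subdividing the problem makes it faster.

Suppose you have 1,000 rows spread evenly across 10 employees. The O(N^2) algorithm takes 1,000,000 steps to solve this. But if you split the data into one part per employee, it takes only (100^2 * 10) = 100,000 steps. That is ten times faster, though the exact speedup depends on how many employees there are.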
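The arithmetic can be checked with a few lines of Python. This is only an illustration of the step counts; `comparison_counts` is a hypothetical helper, and the even split of rows across employees is an assumption.

```python
from collections import Counter

def comparison_counts(employee_ids):
    """Return (steps when every row scans all rows,
               steps when every row scans only its own employee's rows)."""
    n = len(employee_ids)
    full_scan = n * n
    grouped = sum(size * size for size in Counter(employee_ids).values())
    return full_scan, grouped

# 1,000 rows spread evenly across 10 employees (assumed distribution).
ids = [f"emp{i % 10}" for i in range(1000)]
full_scan, grouped = comparison_counts(ids)
print(full_scan, grouped, full_scan / grouped)  # 1000000 100000 10.0
```

With a skewed distribution, where a few employees own most of the rows, the grouped count moves closer to the full scan and the speedup shrinks.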

def process_employee_df(employee_df):
    def get_next_task_assign_creation_time(row):

        # Limit time for the data to max 10 days from closing the last task
        ten_days=timedelta(hours=240)

        # Get the necessary parameters from the row
        task_id=row['task_id']
        login=row['employee_id']
        time_closed=row['time_closed']

        # set the max time limit for time_assigned
        upper_time_limit=time_closed+ten_days

        # create lists of the allowed functions and countries
        func= [s.strip() for s in row['functions'].split(',')]
        countries= [s.strip() for s in row['countries'].split(',')]

        # progressively filter this employee's rows only

        t_df=employee_df
        t_df=t_df[t_df['time_assigned']>=time_closed] 
        t_df=t_df[t_df['time_assigned']<=upper_time_limit]
        t_df=t_df[(t_df['function'].str.strip().isin(func)) ]  
        t_df=t_df[(t_df['country_code'].str.strip().isin(countries))]

        # finding the time_assigned value 
        next_task_time=t_df['time_assigned'].min()
        return next_task_time
    return employee_df.apply(get_next_task_assign_creation_time, axis=1)

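# Each group's result keeps the original row index, so after dropping the
# employee_id level the values align with df1 again.
df1['next_task_pickup_time'] = df1.groupby(['employee_id']).apply(process_employee_df).droplevel(0).sort_index()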
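If the per-row apply is still too slow after grouping, one possible way to vectorize the lookup is to explode the `countries` and `functions` lists into one row per allowed combination and let `pd.merge_asof` find the next `time_assigned` per employee, country and function. This is a sketch under assumptions rather than a tested drop-in replacement: it assumes `task_id` is unique, that `time_closed` and `time_assigned` have no missing values, and it keeps the `>=` comparison and 10-day cap from the question's code. The names `explode_list_column`, `next_pickup_times` and the toy frame are hypothetical.

```python
import pandas as pd

def explode_list_column(df, source, target):
    """Split a comma-separated string column into one row per stripped value."""
    out = df.assign(**{target: df[source].str.split(',')})
    out = out.explode(target).reset_index(drop=True)
    out[target] = out[target].str.strip()
    return out.drop(columns=[source])

def next_pickup_times(df, window=pd.Timedelta(days=10)):
    # One row per (reference task, allowed country, allowed function).
    left = df[['task_id', 'employee_id', 'time_closed', 'countries', 'functions']]
    left = explode_list_column(left, 'countries', 'country_code')
    left = explode_list_column(left, 'functions', 'function')
    left = left.sort_values('time_closed')  # merge_asof needs sorted keys

    # Candidate tasks, keyed by their actual country and function.
    right = df[['employee_id', 'country_code', 'function', 'time_assigned']].copy()
    right['country_code'] = right['country_code'].str.strip()
    right['function'] = right['function'].str.strip()
    right = right.sort_values('time_assigned')

    # For each exploded row, find the first time_assigned >= time_closed
    # within the window, for the same employee, country and function.
    matched = pd.merge_asof(
        left, right,
        left_on='time_closed', right_on='time_assigned',
        by=['employee_id', 'country_code', 'function'],
        direction='forward', tolerance=window,
    )

    # The earliest match over all allowed combinations is the next pickup.
    earliest = matched.groupby('task_id')['time_assigned'].min()
    return df['task_id'].map(earliest)

# Hypothetical toy frame, not the dataset from the question.
toy = pd.DataFrame({
    'task_id': ['t1', 't2', 't3', 't4'],
    'employee_id': ['e1', 'e1', 'e1', 'e2'],
    'function': ['f1', 'f2', 'f1', 'f1'],
    'country_code': ['c1', 'c1', 'c2', 'c1'],
    'time_assigned': pd.to_datetime(['2023-01-01 10:00', '2023-01-01 11:00',
                                     '2023-01-01 12:00', '2023-01-01 10:30']),
    'time_closed': pd.to_datetime(['2023-01-01 10:30', '2023-01-01 11:30',
                                   '2023-01-01 12:30', '2023-01-01 10:45']),
    'countries': ['c1, c2', 'c1, c2', 'c1, c2', 'c1'],
    'functions': ['f1', 'f1, f2', 'f1', 'f1'],
})
toy['next_task_pickup_time'] = next_pickup_times(toy)
print(toy[['task_id', 'next_task_pickup_time']])
```

The exploded frame has one row per task for every allowed country and function pair, so memory use grows with the length of those lists. The sorted as-of join avoids comparing every row with every other row, which is where the row-wise version spends its time.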