I am using apache-spark, and these variables have an odd format called dttm, which is displayed as follows:
tpep_pickup_datetime tpep_dropoff_datetime
<dttm> <dttm>
2015-01-15 18:05:39 2015-01-15 18:23:42
2015-01-10 19:33:38 2015-01-10 19:53:28
2015-01-10 19:33:38 2015-01-10 19:43:41
2015-01-10 19:33:39 2015-01-10 19:35:31
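I want to compute the time difference, in seconds, between tpep_pickup_datetime and tpep_dropoff_datetime.
However, the lubridate package does not work here. How can I convert these variables to POSIXct format using dplyr?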
When I use the following code:
my_df %>% mutate(diff_time = difftime(tpep_dropoff_datetime,tpep_pickup_datetime,units = "secs"))
I get this error:
org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'AS' expecting {')', ','}(line 1, pos 121)
all_data <- all_data %>%
  # Convert the pickup timestamp, then split it into date and time-of-day parts
  mutate(new_pickup = as.POSIXct(tpep_pickup_datetime)) %>%
  mutate(day_pickup = as.Date(new_pickup)) %>%
  mutate(time_pickup = paste(hour(new_pickup), minute(new_pickup), second(new_pickup), sep = "-")) %>%
  # Same steps for the dropoff timestamp
  mutate(new_dropoff = as.POSIXct(tpep_dropoff_datetime)) %>%
  mutate(day_dropoff = as.Date(new_dropoff)) %>%
  mutate(time_dropoff = paste(hour(new_dropoff), minute(new_dropoff), second(new_dropoff), sep = "-")) %>%
  # Duration in seconds from the time-of-day components only; this is correct
  # only when pickup and dropoff fall on the same calendar day
  mutate(trip_duration = (hour(new_dropoff) - hour(new_pickup)) * 3600 + (minute(new_dropoff) - minute(new_pickup)) * 60 + (second(new_dropoff) - second(new_pickup)))
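A possible alternative, offered as a sketch rather than a definitive fix: the difftime() call probably fails because dplyr has no SQL translation for it and passes it to Spark unchanged, where the named argument units = "secs" appears to be rendered with an AS clause that the parser rejects. Spark SQL does have a unix_timestamp() function, and subtracting two such values gives the difference in seconds directly, including for trips that cross midnight. The example below assumes sparklyr is in use with a local Spark installation; the sample data frame trips and the table name "trips" are made up for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available for this sketch.
sc <- spark_connect(master = "local")

# Hypothetical two-row sample mirroring the columns in the question;
# the second trip crosses midnight.
trips <- data.frame(
  tpep_pickup_datetime  = c("2015-01-15 18:05:39", "2015-01-10 23:50:00"),
  tpep_dropoff_datetime = c("2015-01-15 18:23:42", "2015-01-11 00:10:00"),
  stringsAsFactors = FALSE
)
my_df <- copy_to(sc, trips, "trips", overwrite = TRUE)

# unix_timestamp() is not an R function: dplyr passes it through
# untranslated and Spark SQL evaluates it on the cluster.
result <- my_df %>%
  mutate(
    diff_time = unix_timestamp(tpep_dropoff_datetime) -
      unix_timestamp(tpep_pickup_datetime)
  ) %>%
  collect()

spark_disconnect(sc)
```

Because the subtraction happens inside Spark, no conversion to POSIXct is needed before the mutate() step; collect() is only used here to bring the small sample back into R for inspection.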