I have two timestamp columns ('tpep_pickup_datetime' and 'tpep_dropoff_datetime'), and when I compute the difference between them I get an interval-typed column.
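from pyspark.sql.functions import col, to_timestamp

# Parse both string columns into timestamps.
yellowcab = yellowcab \
    .withColumn('tpep_pickup_datetime', to_timestamp('tpep_pickup_datetime', 'yyyy-MM-dd HH:mm:ss')) \
    .withColumn('tpep_dropoff_datetime', to_timestamp('tpep_dropoff_datetime', 'yyyy-MM-dd HH:mm:ss'))
# Subtracting two timestamps yields an interval, not a number.
yellowcab = yellowcab \
    .withColumn('total_time', col('tpep_dropoff_datetime') - col('tpep_pickup_datetime'))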
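The result looks like this:
I want to convert the 'total_time' column to an 'int' that holds the duration in seconds.
I tried extracting the hours and minutes from the interval and multiplying them to get seconds, but I couldn't get it to work.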
Cast the interval to int:
from pyspark.sql import functions as f

# Assumes an active SparkSession named `spark`.
data = [['2020-08-01 00:02:53', '2020-08-01 00:28:54']]
df = spark.createDataFrame(data, ['t1', 't2']) \
    .withColumn('t1', f.to_timestamp('t1', 'yyyy-MM-dd HH:mm:ss')) \
    .withColumn('t2', f.to_timestamp('t2', 'yyyy-MM-dd HH:mm:ss')) \
    .withColumn('interval', (f.col('t2') - f.col('t1')).cast('int'))
# show() returns None, so call it separately to keep df usable.
df.show()
+-------------------+-------------------+--------+
|                 t1|                 t2|interval|
+-------------------+-------------------+--------+
|2020-08-01 00:02:53|2020-08-01 00:28:54|    1561|
+-------------------+-------------------+--------+
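As a sanity check on the 1561 above, the same arithmetic can be reproduced outside Spark with the Python standard library. This is only a sketch for verifying the expected value, not part of the Spark solution; the helper name `seconds_between` is made up for illustration. It also shows why multiplying only the hours and minutes falls short: the seconds component (and any whole days) has to be counted too, which `total_seconds()` handles.

```python
from datetime import datetime

# Python strptime equivalent of Spark's 'yyyy-MM-dd HH:mm:ss' pattern.
FMT = '%Y-%m-%d %H:%M:%S'


def seconds_between(start: str, end: str) -> int:
    """Return the whole number of seconds from `start` to `end`.

    Hypothetical helper for illustration; both arguments are strings
    in the 'YYYY-MM-DD HH:MM:SS' layout used in the example data.
    """
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # total_seconds() includes days, hours, minutes and seconds.
    return int(delta.total_seconds())


if __name__ == '__main__':
    # 26 minutes and 1 second: 26 * 60 + 1 = 1561
    print(seconds_between('2020-08-01 00:02:53', '2020-08-01 00:28:54'))
```

If the Spark result for a row differs from this kind of check, the usual suspects are a format pattern that did not match (giving null timestamps) or a time zone or daylight-saving shift between the two timestamps.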