How do I work with interval-type variables?


I have two timestamp columns ('tpep_pickup_datetime' and 'tpep_dropoff_datetime'), and when I compute the difference between them I get an interval-type column.

from pyspark.sql.functions import col, to_timestamp

# parse both string columns as timestamps
yellowcab = yellowcab \
    .withColumn('tpep_pickup_datetime', to_timestamp('tpep_pickup_datetime', 'yyyy-MM-dd HH:mm:ss')) \
    .withColumn('tpep_dropoff_datetime', to_timestamp('tpep_dropoff_datetime', 'yyyy-MM-dd HH:mm:ss'))
# subtracting two timestamps yields an interval
yellowcab = yellowcab \
    .withColumn('total_time', col('tpep_dropoff_datetime') - col('tpep_pickup_datetime'))

The result looks like this:

[image: screenshot of the result]

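I want to convert the 'total_time' column to an int, with the duration expressed in seconds.

I tried extracting the hours and minutes from the interval and multiplying them out to get seconds, but I could not get it to work.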

pyspark timestamp intervals
1 Answer

Cast the interval to int, which gives the duration in seconds.


import pyspark.sql.functions as f

# assumes an existing SparkSession named `spark`
data = [['2020-08-01 00:02:53', '2020-08-01 00:28:54']]

df = spark.createDataFrame(data, ['t1', 't2']) \
  .withColumn('t1', f.to_timestamp('t1', 'yyyy-MM-dd HH:mm:ss')) \
  .withColumn('t2', f.to_timestamp('t2', 'yyyy-MM-dd HH:mm:ss')) \
  .withColumn('interval', (f.col('t2') - f.col('t1')).cast('int'))

# show() returns None, so call it separately to keep df usable
df.show()

+-------------------+-------------------+--------+
|                 t1|                 t2|interval|
+-------------------+-------------------+--------+
|2020-08-01 00:02:53|2020-08-01 00:28:54|    1561|
+-------------------+-------------------+--------+
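
If the cast on the interval is rejected in your Spark version, one alternative is to skip the interval entirely and subtract epoch seconds instead. The sketch below is only an illustration, not part of the original answer: `add_total_time` and `seconds_between` are hypothetical helper names, and the PySpark part assumes both columns have already been parsed with `to_timestamp`. The plain-Python helper is there just to cross-check the arithmetic for the sample row.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'


def seconds_between(start, end):
    """Plain-Python cross-check: whole seconds between two timestamp strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def add_total_time(df, start_col, end_col, out_col='total_time'):
    """Hypothetical helper: add `out_col` holding end - start in seconds.

    Assumes `df` is a PySpark DataFrame whose `start_col` and `end_col`
    are already timestamp columns. pyspark is imported lazily so the
    cross-check above can be used without a Spark installation.
    """
    from pyspark.sql import functions as f

    # unix_timestamp() converts each timestamp to seconds since the epoch,
    # so the difference is already a number of seconds
    seconds = f.unix_timestamp(f.col(end_col)) - f.unix_timestamp(f.col(start_col))
    return df.withColumn(out_col, seconds.cast('int'))


# Sample row from the answer above: 26 minutes 1 second
print(seconds_between('2020-08-01 00:02:53', '2020-08-01 00:28:54'))

# Possible usage on the question's DataFrame (not executed here):
# yellowcab = add_total_time(yellowcab, 'tpep_pickup_datetime', 'tpep_dropoff_datetime')
```

The epoch-seconds route never creates an interval column, so it does not depend on how a given Spark release handles casting intervals.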