Spark SQL: Subtracting days from a date in Spark SQL

Problem description (votes: -1, answers: 2)

How can I programmatically subtract a number of days from a date in Spark SQL?

val date = "2019-10-01"

val tmp = spark.sql(""" 
select id, 
count(distinct purchase_id) as count_purchases
from
my_table
where partition_date between ('$date'-6) and '$date'
group by 1
""")

I tried:

val tmp = spark.sql(""" 
select id, 
count(distinct purchase_id) as count_purchases
from
my_table
where partition_date between CAST(CAST('$date' as DATE) - interval '6' day) as varchar) and '$date'
group by 1
""")

which gives the error: parser exception at varchar

I also tried:

val tmp = spark.sql(""" 
select id, 
count(distinct purchase_id) as count_purchases
from
my_table
where partition_date between sub_date('$date',6) and 
'$date'
group by 1
""")
apache-spark pyspark apache-spark-sql pyspark-sql
2 Answers
0 votes

Take a look at this answer.

Use the expr function (if the number of days to subtract comes from a column):

from pyspark.sql.functions import *

df.withColumn('substracted_dates', expr("date_sub(date_col, days_col)"))

Use the withColumn function (if you want to subtract a literal value):

df.withColumn('substracted_dates', date_sub('date_col', 6))  # e.g. 6 days, as in the question
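Since the question itself is in Scala, roughly the same two variants with the Scala DataFrame API would look like this (a minimal sketch; the DataFrame df and the columns date_col and days_col are assumptions):

import org.apache.spark.sql.functions.{col, date_sub, expr}

// Days to subtract come from another column: go through expr,
// since the older Scala signature of date_sub takes a literal Int.
val byColumn = df.withColumn("substracted_dates", expr("date_sub(date_col, days_col)"))

// Days to subtract are a literal: date_sub(Column, Int)
val byLiteral = df.withColumn("substracted_dates", date_sub(col("date_col"), 6))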


0 votes

You can do it in pure Scala:

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val someDateStr = "2020-01-20 00:00:00"
val endDate = LocalDateTime.parse(someDateStr, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
val startDate = endDate.minusDays(6)

startDate

This prints startDate: java.time.LocalDateTime = 2020-01-14T00:00, and you can then interpolate startDate and endDate into your query:

val tmp = s"""select id, count(distinct purchase_id) as count_purchases 
          from my_table 
          where partition_date between CAST('{$startDate.toString}' AS DATE) and CAST('{$endDate.toString}' AS DATE) 
          group by 1"""

Alternatively, you can use the built-in function date_sub together with CAST:

val endDate = "2020-01-20"

val query = s"""select id, count(distinct purchase_id) as count_purchases 
          from my_table 
          where partition_date between date_sub(CAST('$endDate' AS DATE), 6) and CAST('$endDate' AS DATE)
          group by 1"""

spark.sql(query)
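A quick way to sanity-check the computed window before running the full query, assuming a live SparkSession named spark:

spark.sql(s"select date_sub(CAST('$endDate' AS DATE), 6) as window_start").show()
// prints 2020-01-14 for endDate = "2020-01-20"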