How do I programmatically subtract days from a date in Spark SQL?
val date = "2019-10-01"
val tmp = spark.sql("""
select id,
count(distinct purchase_id) as count_purchases
from
my_table
where partition_date between ('$date'-6) and '$date'
group by 1
""")
Tried:
val tmp = spark.sql("""
select id,
count(distinct purchase_id) as count_purchases
from
my_table
where partition_date between CAST(CAST('$date' as DATE) - interval '6' day) as varchar) and '$date'
group by 1
""")
Got the error: parser exception at varchar
Also tried:
val tmp = spark.sql("""
select id,
count(distinct purchase_id) as count_purchases
from
my_table
where partition_date between sub_date('$date',6) and
'$date'
group by 1
""")
Take a look at this answer.
Use the expr function if you need to subtract one column from another:
from pyspark.sql.functions import *
df.withColumn('substracted_dates', expr("date_sub(date_col, days_col)"))
Use date_sub directly inside withColumn if you want to subtract a literal value (6 here, matching the question's 6-day window):
df.withColumn('substracted_dates', date_sub('date_col', 6))
You can implement it in pure Scala:
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
val someDateStr = "2020-01-20 00:00:00"
val endDate = LocalDateTime.parse(someDateStr, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
val startDate = endDate.minusDays(6)
startDate
This prints startDate: java.time.LocalDateTime = 2020-01-14T00:00, and you can then use startDate and endDate in your query via string interpolation:
val tmp = s"""select id, count(distinct purchase_id) as count_purchases
from my_table
where partition_date between CAST('${startDate.toString}' AS DATE) and CAST('${endDate.toString}' AS DATE)
group by 1"""
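One caveat: on a LocalDateTime, toString yields "2020-01-14T00:00", which may not match a partition_date column stored as plain "yyyy-MM-dd" strings. A minimal pure-Scala sketch (no Spark required; my_table and the column names are taken from the question) that trims the bounds to the date part before interpolating:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val endDate = LocalDateTime.parse("2020-01-20 00:00:00",
  DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
val startDate = endDate.minusDays(6)

// Keep only the date part; LocalDate.toString is ISO "yyyy-MM-dd"
val startStr = startDate.toLocalDate.toString  // "2020-01-14"
val endStr   = endDate.toLocalDate.toString    // "2020-01-20"

val query = s"""select id, count(distinct purchase_id) as count_purchases
from my_table
where partition_date between '$startStr' and '$endStr'
group by 1"""
```

With the bounds rendered as plain dates, the between comparison works by string ordering alone, so no CAST is needed in the SQL.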
Alternatively, you can use the built-in function date_sub together with CAST:
val endDate = "2020-01-20"
val query = s"""select id, count(distinct purchase_id) as count_purchases
from my_table
where partition_date between date_sub(CAST('$endDate' AS DATE), 6) and CAST('$endDate' AS DATE)
group by 1"""
spark.sql(query)