Spark: start the week on Monday

Problem description · votes: 0 · answers: 2

Here is my dataset:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('2021-02-07',),('2021-02-08',)], ['date']) \
    .select(
        F.col('date').cast('date'),
        F.date_format('date', 'EEEE').alias('weekday'),
        F.dayofweek('date').alias('weekday_number')
    )
df.show()
#+----------+-------+--------------+
#|      date|weekday|weekday_number|
#+----------+-------+--------------+
#|2021-02-07| Sunday|             1|
#|2021-02-08| Monday|             2|
#+----------+-------+--------------+

dayofweek returns day-of-week numbers with the week starting on Sunday (Sunday = 1, ..., Saturday = 7).

Desired result:

+----------+-------+--------------+
|      date|weekday|weekday_number|
+----------+-------+--------------+
|2021-02-07| Sunday|             7|
|2021-02-08| Monday|             1|
+----------+-------+--------------+
apache-spark pyspark apache-spark-sql dayofweek spark3
2 Answers

2 votes

You can try this:

F.date_format(F.col("date"), "u").alias('weekday_number')

For some reason it is not in Spark's datetime pattern documentation: 'u' is one of the legacy SimpleDateFormat patterns (1 = Monday, ..., 7 = Sunday), which is why the legacy parser policy below is needed.

You may also need to add this configuration line:

spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')
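Putting the two snippets together, here is a minimal runnable sketch based on the question's DataFrame (the .cast('int') is my addition, since date_format produces a string column), which should give the desired Monday = 1 ... Sunday = 7 numbering:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

# 'u' is a legacy SimpleDateFormat pattern (1 = Monday ... 7 = Sunday),
# so Spark 3's stricter parser has to be switched back first:
spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')

df = spark.createDataFrame([('2021-02-07',), ('2021-02-08',)], ['date']) \
    .select(
        F.col('date').cast('date'),
        F.date_format('date', 'EEEE').alias('weekday'),
        # date_format returns a string, so cast to int for a numeric column
        F.date_format('date', 'u').cast('int').alias('weekday_number'),
    )
df.show()
#+----------+-------+--------------+
#|      date|weekday|weekday_number|
#+----------+-------+--------------+
#|2021-02-07| Sunday|             7|
#|2021-02-08| Monday|             1|
#+----------+-------+--------------+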

Thanks for the feedback, glad I could help =)


0 votes
F.expr('weekday(date) + 1')

The weekday function returns 0 for Monday through 6 for Sunday, so adding 1 yields exactly the desired Monday = 1 ... Sunday = 7 numbering:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('2021-02-07',),('2021-02-08',)], ['date']) \
    .select(
        F.col('date').cast('date'),
        F.date_format('date', 'EEEE').alias('weekday'),
        F.expr('weekday(date) + 1').alias('weekday_number'),
    )
df.show()
#+----------+-------+--------------+
#|      date|weekday|weekday_number|
#+----------+-------+--------------+
#|2021-02-07| Sunday|             7|
#|2021-02-08| Monday|             1|
#+----------+-------+--------------+
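As a side note, on PySpark 3.5 or later the weekday function is (to my knowledge) also exposed directly in pyspark.sql.functions, so assuming that version the expr call can be replaced with a native column expression:

F.weekday('date') + 1  # weekday: 0 = Monday ... 6 = Sunday, so +1 gives 1..7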