PySpark: create a line chart from a DataFrame that has no "category" column

Question (votes: 0, answers: 1)

I am running the following code on Databricks:

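from pyspark.sql.functions import col, monotonically_increasing_id

# Add a row id, keep a single container, and select the two temperature columns
dataToShow = (
    jDataJoined
    .withColumn('id', monotonically_increasing_id())
    .filter(jDataJoined.containerNumber == 'SUDU8108536')
    .select(col('id'), col('returnTemperature'), col('supplyTemperature'))
)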

This gives me tabular data with the columns id, returnTemperature and supplyTemperature.

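Now I want to display a line chart with returnTemperature and supplyTemperature as the categories.

As far as I understand, the display method in Databricks wants the category as its second argument, so basically what I should have is something like this: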

id - temperatureCategory - value
1 - returnTemperature - 25.0
1 - supplyTemperature - 27.0
2 - returnTemperature - 24.0
2 - supplyTemperature - 28.0

How can I transform the DataFrame into this shape?

pyspark databricks
1 Answer

1 vote

I don't know whether your format is what the display method expects, but you can do this transformation with the SQL functions create_map and explode:

# Create an example DataFrame (`spark` is the SparkSession that Databricks provides)
from pyspark.sql import functions as F
l1 = [(1, 25.0, 27.0), (2, 24.0, 28.0)]
df = spark.createDataFrame(l1, ['id', 'returnTemperature', 'supplyTemperature'])

# Build a map column keyed by column name that holds the returnTemperature and supplyTemperature values
df = df.withColumn('mapCol', F.create_map(
    F.lit('returnTemperature'), df.returnTemperature,
    F.lit('supplyTemperature'), df.supplyTemperature,
))
# explode creates a new row for each entry of the map; the two aliases name the key and value columns
df = df.select('id', F.explode(df.mapCol).alias('temperatureCategory', 'value'))
df.show()

Output:

+---+-------------------+-----+
| id|temperatureCategory|value|
+---+-------------------+-----+
|  1|  returnTemperature| 25.0|
|  1|  supplyTemperature| 27.0|
|  2|  returnTemperature| 24.0|
|  2|  supplyTemperature| 28.0|
+---+-------------------+-----+
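
To make it clearer what the map-then-explode step does to each row, here is a minimal plain-Python sketch of the same wide-to-long reshape. It does not use Spark at all. The helper name `melt_rows` and the list-of-dicts input are assumptions made for illustration only; they are not part of PySpark or Databricks.

```python
# Plain-Python illustration (no Spark) of the wide-to-long reshape.
# `melt_rows` is a hypothetical helper written for this example.
rows = [
    {"id": 1, "returnTemperature": 25.0, "supplyTemperature": 27.0},
    {"id": 2, "returnTemperature": 24.0, "supplyTemperature": 28.0},
]
value_columns = ["returnTemperature", "supplyTemperature"]


def melt_rows(rows, id_column, value_columns):
    """Return one (id, category, value) tuple per value column of every input row."""
    long_rows = []
    for row in rows:
        # Counterpart of create_map: pair each column name with its value.
        mapping = {name: row[name] for name in value_columns}
        # Counterpart of explode: emit one output row per map entry.
        for category, value in mapping.items():
            long_rows.append((row[id_column], category, value))
    return long_rows


long_rows = melt_rows(rows, "id", value_columns)
for long_row in long_rows:
    print(long_row)
```

Each input row yields as many output rows as there are value columns, so the two-row example becomes four rows, one per (id, temperatureCategory) pair.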