I am using PySpark 2.4 and want to create df_2 from df_1:
[df_1]
root
|-- request: array (nullable = false)
| |-- address: struct (nullable = false)
| | |-- street: string (nullable = false)
| | |-- postcode: string (nullable = false)
[df_2]
root
|-- request: array (nullable = false)
| |-- address: struct (nullable = false)
| | |-- street: string (nullable = false)
I know a UDF is one way to do this, but is there another way, such as using map(), to achieve the same result?
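Use the transform function:
from pyspark.sql.functions import expr
df_2 = df_1.withColumn("request", expr("transform(request, x -> struct(x.street) as address)"))
For each element of the request array, we select only the street field and create a new struct.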