PySpark DataFrame: conditionally replace column values

Question · votes: 0 · answers: 2

I have a PySpark DataFrame with two columns:

id   address_type
100  1
101  1
102  2
103  2

I want to change all the values in the address_type column. When address_type = 1 it should become Mailing address, and when address_type = 2 it should become Physical address.

Desired result:

id   address_type
100  Mailing address
101  Mailing address
102  Physical address
103  Physical address

What is the best way to achieve this?

apache-spark pyspark apache-spark-sql
2 Answers
0
votes
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [
        ('100', '1'),
        ('101', '1'),
        ('102', '2'),
        ('103', '2'),
    ], ['id', 'address_type']
)

df.withColumn(
    'address_type',
    F.when(df.address_type == 1, F.lit('Mailing address'))
     .when(df.address_type == 2, F.lit('Physical address'))
     .otherwise(df.address_type)
).show()
+---+----------------+
| id|    address_type|
+---+----------------+
|100| Mailing address|
|101| Mailing address|
|102|Physical address|
|103|Physical address|
+---+----------------+

0
votes

df.replace(['1', '2'], ['Mailing address', 'Physical address'], 'address_type').show()
