PySpark: assigning names to column values with 'withColumn'


I am new to PySpark, but I have some experience with R.

Problem: I want to assign a name to each height (a number) listed in a column. I started with the following code:

w = Window.partitionBy("student_id")
df_enc_hw = df_enc_hw.withColumn("stuname", \
                       when(lower(col("height")) <= 4, "under_ht") 
                      .when(lower(col("height")) > 4 < 5, "ok_ht")  
                      .when(lower(col("height")) >=5 < 6, "normal_ht")  
                      .when(lower(col("height")) >=6, "abnor_ht")) 

But I get the following error:

    633 
    634     def __nonzero__(self):
--> 635         raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
    636                          "'~' for 'not' when building DataFrame boolean expressions.")
    637     __bool__ = __nonzero__

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

Thank you for your help. K

python pyspark pyspark-sql pyspark-dataframes
1 Answer

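You should split each range check into separate expressions joined with '&'. Python reads a chained comparison such as x > 4 < 5 as (x > 4) and (4 < 5), and 'and' tries to convert the Column into a bool, which raises the ValueError. Each comparison also needs its own parentheses, because '&' binds more tightly than the comparison operators: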

from pyspark.sql.functions import col, lower, when

# Each comparison is parenthesized and combined with &, never chained.
df_enc_hw = df_enc_hw.withColumn("stuname",
                       when(lower(col("height")) <= 4, "under_ht")
                      .when((lower(col("height")) > 4) & (lower(col("height")) < 5), "ok_ht")
                      .when((lower(col("height")) >= 5) & (lower(col("height")) < 6), "normal_ht")
                      .when(lower(col("height")) >= 6, "abnor_ht"))
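
To see why the chained form fails, it may help to reproduce the mechanism without Spark. The sketch below is pure Python and uses a hypothetical FakeColumn class as a stand-in for a PySpark Column. It is an assumption made for illustration, not PySpark's actual implementation. It only mimics the relevant behavior: comparisons return a new column-like object, and converting that object to a bool raises ValueError.

```python
class FakeColumn:
    """Hypothetical stand-in for a PySpark Column (illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    def __gt__(self, other):
        # A comparison builds a new expression instead of returning True/False.
        return FakeColumn(f"({self.expr} > {other})")

    def __lt__(self, other):
        return FakeColumn(f"({self.expr} < {other})")

    def __and__(self, other):
        # The & operator combines two expressions into one.
        return FakeColumn(f"({self.expr} AND {other.expr})")

    def __bool__(self):
        # Mirrors the behavior seen in the traceback: no implicit truth value.
        raise ValueError("Cannot convert column into bool")


height = FakeColumn("height")


def chained():
    # Python expands this to: (height > 4) and (4 < 5).
    # The `and` calls bool() on the left operand, which raises.
    return height > 4 < 5


def explicit():
    # Two separate comparisons joined with &, so bool() is never called.
    return (height > 4) & (height < 5)


try:
    chained()
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

combined = explicit().expr
print(chained_error)
print(combined)
```

The chained version fails before any range check is built, while the explicit version produces a single combined expression. The same reasoning applies to the >= 5 < 6 condition in the question.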