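I'm new to PySpark, but I have some experience with R.
Problem: I want to assign a name to each height (a number) listed in a column. I started with the following code: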
w = Window.partitionBy("student_id")
df_enc_hw = df_enc_hw.withColumn("stuname", \
when(lower(col("height")) <= 4, "under_ht")
.when(lower(col("height")) > 4 < 5, "ok_ht")
.when(lower(col("height")) >=5 < 6, "normal_ht")
.when(lower(col("height")) >=6, "abnor_ht"))
But I get the following error:
633
634 def __nonzero__(self):
--> 635 raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
636 "'~' for 'not' when building DataFrame boolean expressions.")
637 __bool__ = __nonzero__
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Thank you for your help. K
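Python reads a chained comparison such as x > 4 < 5 as (x > 4) and (4 < 5), and the 'and' tries to convert the Column to a bool, which raises this error. You should split each condition into separate comparisons and combine them with '&', like this: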
from pyspark.sql.functions import col, when

# height is numeric, so the string function lower() is not needed here.
df_enc_hw = df_enc_hw.withColumn("stuname",
    when(col("height") <= 4, "under_ht")
    .when((col("height") > 4) & (col("height") < 5), "ok_ht")
    .when((col("height") >= 5) & (col("height") < 6), "normal_ht")
    .when(col("height") >= 6, "abnor_ht"))
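To see why the chained comparison fails while the '&' form works, here is a minimal sketch in plain Python that needs no Spark session. FakeColumn is a hypothetical stand-in written for this illustration, not the real pyspark.sql.Column; it only mimics the one behavior that matters here, namely that comparisons return a new column-like object and that converting it to a bool raises an error.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column, for illustration only."""

    def __init__(self, expr):
        self.expr = expr

    def __gt__(self, other):
        # A comparison builds a new expression instead of returning True/False.
        return FakeColumn(f"({self.expr} > {other})")

    def __lt__(self, other):
        return FakeColumn(f"({self.expr} < {other})")

    def __and__(self, other):
        # `&` combines two expressions without ever calling bool().
        return FakeColumn(f"({self.expr} AND {other.expr})")

    def __bool__(self):
        # Like the real Column, refuse to collapse into a single True/False.
        raise ValueError("Cannot convert column into bool")


height = FakeColumn("height")

# Python expands `height > 4 < 5` to `(height > 4) and (4 < 5)`.
# The `and` calls bool() on the left operand, which raises.
try:
    height > 4 < 5
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# Two explicit comparisons joined with `&` build one combined expression.
combined = (height > 4) & (height < 5)

print(chained_error)
print(combined.expr)
```

The parentheses around each comparison matter in real PySpark code as well, because '&' binds more tightly than '>' and '<' in Python.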