pyspark SQL contains: TypeError: 'Column' object is not callable

Question

I am using Spark 2.0.1:

 df.show()
+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
|     0.0|   3.0|1.0|  1.0|  0.0| 7.3|
|     1.0|   1.0|0.0|  1.0|  0.0|71.3|
|     1.0|   3.0|0.0|  0.0|  0.0| 7.9|
|     1.0|   1.0|0.0|  1.0|  0.0|53.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.5|
|     0.0|   1.0|1.0|  0.0|  0.0|51.9|

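I have a DataFrame and want to use withColumn to add a new column to df, where the new column's value depends on the values of other columns. I tried something like this: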

>>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))

This raises an error:

TypeError: 'Column' object is not callable

Can anyone help me fix this error?

Answer 1

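This happens because you are trying to apply the function contains to the column, and the pyspark version you are using has no contains function on Column. Use like instead. Try this: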

import pyspark.sql.functions as F

# like matches the whole string unless the pattern contains % wildcards
df = df.withColumn("AddCol", F.when(F.col("Pclass").like("%3%"), "three").otherwise("notthree"))

Or, if you want the value to be exactly that number, compare with equality instead:

import pyspark.sql.functions as F

# If the column Pclass is numeric
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit(3),"three").otherwise("notthree"))

# If the column Pclass is string
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit("3"),"three").otherwise("notthree"))
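The wording of the error can be puzzling for a missing method. A likely explanation is that older pyspark treats an unknown attribute on a Column as a nested-field lookup and returns another Column, which then fails when it is called. The sketch below reproduces that mechanism in plain Python. FakeColumn is a hypothetical stand-in written for illustration, not the real pyspark class.

```python
class FakeColumn:
    """Hypothetical stand-in for a pyspark Column; not the real class."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Called only when normal attribute lookup fails.
        # Unknown names are treated as nested fields, so another
        # column object is returned instead of raising AttributeError.
        if item.startswith("__"):
            raise AttributeError(item)
        return FakeColumn(f"{self.name}.{item}")


pclass = FakeColumn("Pclass")
field = pclass.contains      # no error yet: this is just another FakeColumn

try:
    field("3.0")             # calling an object that defines no __call__
except TypeError as exc:
    message = str(exc)
    print(message)           # 'FakeColumn' object is not callable
```

The attribute access itself succeeds, which is why the failure surfaces as a TypeError on the call rather than as an AttributeError on contains.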

Answer 2

To get the equivalent of contains, you need the like function, with the % wildcard before and after the string you are searching for:

from pyspark.sql.functions import when

dfnew = df.withColumn('AddCol', when(df.Pclass.like('%3.0%'), 'three').otherwise('notthree'))
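To see why the wildcards matter, here is a rough plain-Python approximation of how a LIKE pattern is matched. The helper sql_like is a hypothetical name introduced for this sketch. It ignores escape characters and is not how Spark implements LIKE internally.

```python
import re


def sql_like(value, pattern):
    """Approximate SQL LIKE: '%' is any run of characters, '_' is one character.

    Simplified sketch: escape sequences are not handled.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # fullmatch: the pattern has to cover the entire value, as LIKE does
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(sql_like("3.0", "3"))        # False: no wildcards, so it is an exact match
print(sql_like("3.0", "%3.0%"))    # True
print(sql_like("13.05", "%3.0%"))  # True: a substring match can over-match
print(sql_like("1.0", "%3.0%"))    # False
```

The third call shows the trade-off: a substring pattern also matches values such as 13.05, so an equality comparison is the safer choice when the column should equal one specific value.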

As noted in the comments, the contains function is available in newer versions of pyspark.
