TypeError: type Column doesn't define __round__ method [duplicate]

Question

This question already has an answer here:

How to set display precision in PySpark Dataframe show (1 answer)

My data looks like this:

+-------+-------+------+----------+
|book_id|user_id|rating|prediction|
+-------+-------+------+----------+
|    148|    588|     4|  3.953999|
|    148|  28767|     3| 2.5816362|
|    148|  41282|     3|  4.185532|
|    148|  18313|     4| 3.6297297|
|    148|  11272|     3| 3.0962112|
+-------+-------+------+----------+

I want to create a new column named 'pred_class' by rounding the values in the prediction column. I ran this code:

results.withColumn('pred_class',round(results['prediction']))

It gives me this error:

TypeError: type Column doesn't define __round__ method

Can someone help me with this? Thanks!

Answer 1

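You are calling Python's built-in round function on a Spark Column object, which does not define the method that the built-in relies on. Use the round function from pyspark.sql.functions instead: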

results = spark.createDataFrame([{'book_id': 148, 'user_id': 588, 'rating': 4, 'prediction': 3.953999}])

from pyspark.sql.functions import round   # Spark's round; note that this shadows Python's built-in round
results.withColumn('pred_class',round(results['prediction'])).show()

+-------+----------+------+-------+----------+
|book_id|prediction|rating|user_id|pred_class|
+-------+----------+------+-------+----------+
|    148|  3.953999|     4|    588|       4.0|
+-------+----------+------+-------+----------+
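
To see where the original error comes from, here is a minimal pure-Python sketch. It assumes only that the built-in round delegates to the argument type's __round__ method. The Column class below is a hypothetical stand-in written for this illustration, not the real pyspark class, and try_builtin_round is a helper name invented here.

```python
class Column:
    """Hypothetical stand-in for a Spark Column: it defines no __round__ method."""

    def __init__(self, name):
        self.name = name


def try_builtin_round(value):
    """Call Python's built-in round; return the error message, or None on success."""
    try:
        round(value)
    except TypeError as exc:
        return str(exc)
    return None


# The built-in looks up __round__ on the type, finds none, and raises TypeError.
message = try_builtin_round(Column("prediction"))
print(message)

# A plain float does define __round__, so the built-in works on it.
rounded = round(3.953999)
print(rounded)
```

The same lookup fails for a Spark Column, which is why the fix is to call a function that builds a Spark expression rather than one that expects a Python number.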
