PySpark filter: AttributeError: 'numpy.float32' object has no attribute '_get_object_id'

Question

In PySpark, I want to filter a Spark DataFrame like this:

 temp_df = df1.filter(df1.latitude_float.between(lat_min, lat_max) & df1.longitude_float.between(lng_min, lng_max))

df1 is a DataFrame constructed via spark.sql, and this is the output of printSchema:

 |-- vin_nbr: string (nullable = true)
 |-- timstm_hm: string (nullable = true)
 |-- latitude: string (nullable = true)
 |-- longitude: string (nullable = true)
 |-- make: string (nullable = true)
 |-- model: string (nullable = true)
 |-- timstm_hm_timestamp: timestamp (nullable = true)
 |-- latitude_float: float (nullable = true)
 |-- longitude_float: float (nullable = true)
As you can see, latitude_float and longitude_float are indeed floats. lat_min, lat_max, lng_min, and lng_max are floats as well. Why do I get this error?

AttributeError: 'numpy.float32' object has no attribute '_get_object_id'


Answer 1

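It looks like you are passing NumPy types to Spark. Print type(lat_min): what does it output? It must be a plain Python type, <class 'float'>.

To convert a NumPy scalar to a plain Python value, call its .item() method.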
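As a minimal sketch of that conversion, the helper below calls .item() on any value that has one and leaves everything else unchanged. The name to_python_scalar and the bounds values are hypothetical and not part of the original question. The bounds are plain floats here so the snippet runs without NumPy or Spark installed; in the question they would be numpy.float32 values.

```python
def to_python_scalar(value):
    """Return a built-in Python value for NumPy-style scalars.

    NumPy scalars such as numpy.float32 expose .item(), which returns the
    equivalent plain Python object. Values without a callable .item()
    (for example a built-in float) are returned unchanged.
    """
    item = getattr(value, "item", None)
    return item() if callable(item) else value


# Hypothetical bounds, for illustration only.
bounds = {"lat_min": 40.0, "lat_max": 41.0, "lng_min": -75.0, "lng_max": -73.0}

# Normalize every bound before handing it to Spark.
clean = {name: to_python_scalar(v) for name, v in bounds.items()}

# Each entry should report <class 'float'>, as the answer requires.
for name, v in clean.items():
    print(name, type(v))

# With a live SparkSession and df1 from the question, the filter would become:
# temp_df = df1.filter(
#     df1.latitude_float.between(clean["lat_min"], clean["lat_max"])
#     & df1.longitude_float.between(clean["lng_min"], clean["lng_max"])
# )
```

For a single value, lat_min = lat_min.item() or lat_min = float(lat_min) does the same job. The helper is only a convenience when several bounds may or may not be NumPy scalars.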
