How can pyspark.sql.functions.col reference a column that has already been dropped?


In Spark, I tried the code below and it works:

[Image: screenshot of the working code, not included]

But when I try the code below, it does not work:

df= df.select('ID','SALE')
df= df.filter(df.COUNTRY=='US')

Traceback (most recent call last):
  File "/tmp/conda-63a80358-05c2-4a42-938b-914dfe53fcdc/real/envs/conda-env/lib/python2.7/site-packages/six.py", line 719, in exec_
    exec("""exec _code_ in _globs_, _locs_""")
  File "<string>", line 1, in <module>
  File "<console>", line 2, in <module>
  File "/tmp/conda-63a80358-05c2-4a42-938b-914dfe53fcdc/real/envs/conda-env/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1295, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'COUNTRY'
pyspark pyspark-sql pyspark-dataframes
1 Answer

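After the select, your df variable holds only the ID and SALE columns, so the filter is trying to read a COUNTRY column that no longer exists in df.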

df = df.select('ID', 'SALE')  # from here on, df has only the 'ID' and 'SALE' columns
df = df.filter(df.COUNTRY == 'US')  # fails: df no longer has a COUNTRY column, so df.COUNTRY raises AttributeError

Example:

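from pyspark.sql.functions import col

# `spark` is the SparkSession provided by the PySpark shell or notebook
df = spark.createDataFrame([(1, 'india', 250), (2, 'US', 100)], ['id', 'country', 'sale'])
df.filter(col("country") == "US").show()
# or
df.filter(df.country == "US").show()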
#+---+-------+----+
#| id|country|sale|
#+---+-------+----+
#|  2|     US| 100|
#+---+-------+----+