In Spark, I tried some code that worked, but when I try the code below, it does not work:
df= df.select('ID','SALE')
df= df.filter(df.COUNTRY=='US')
Traceback (most recent call last):
File "/tmp/conda-63a80358-05c2-4a42-938b-914dfe53fcdc/real/envs/conda-env/lib/python2.7/site-packages/six.py", line 719, in exec_
exec("""exec _code_ in _globs_, _locs_""")
File "<string>", line 1, in <module>
File "<console>", line 2, in <module>
File "/tmp/conda-63a80358-05c2-4a42-938b-914dfe53fcdc/real/envs/conda-env/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1295, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'COUNTRY'
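This happens because, after the select, your df variable contains only the ID and SALE columns, and you are then trying to filter on the COUNTRY column of df, where it no longer exists.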
df = df.select('ID', 'SALE')  # from here on, df has only the 'ID' and 'SALE' columns
df = df.filter(df.COUNTRY == 'US')  # fails: COUNTRY no longer exists in df
Example:
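from pyspark.sql.functions import col

df = spark.createDataFrame([(1, 'india', 250), (2, 'US', 100)], ['id', 'country', 'sale'])
# filter with col()
df.filter(col("country") == "US").show()
# or with attribute access on the DataFrame
df.filter(df.country == "US").show()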
#+---+-------+----+
#| id|country|sale|
#+---+-------+----+
#|  2|     US| 100|
#+---+-------+----+