When does Spyder's "?" suffix yield a docstring?


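I used Anaconda to install Python 3.9 and PySpark on Windows 10. I am following this tutorial on manipulating DataFrame Column objects. I have pared it down considerably to the following minimal working example. It consists of 4 parts: (1) the data for the DataFrame; (2) the schema for the DataFrame; (3) starting Spark and creating the DataFrame; (4) creating a new schema to rename the "name" subfields.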

# Data for DataFrame
dataDF = [(('James','Smith'),'1991-04-01'),
  (('Michael',''),'2000-05-19'),
]

# Schema definition needed to create DataFrame. The first "name"
# field consists of 2 subfields for first and last name
from pyspark.sql.types import StructType,StructField, StringType, IntegerType
schema = StructType([
        StructField('name', StructType([
             StructField('firstname', StringType(), True),
             StructField('lastname', StringType(), True)
             ])),
         StructField('dob', StringType(), True),
         ])

# Start a Spark session and create the DataFrame "df"
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
df = spark.createDataFrame(data = dataDF, schema = schema)

# Define a new schema for renaming the "name" subfields
schema2 = StructType([
    StructField("fname",StringType()),
    StructField("lname",StringType())])

from pyspark.sql.functions import col
df.select( col("name").cast(schema2), col("dob") ).printSchema()

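I tried to look up what cast does. It does not look like a Python cast, because it is a method of col("name"), which belongs to the Column class. I tried, without success, to pull up the docstring by appending Spyder's ? suffix to cast, accessed through the Column object:

df.name.cast?

This produces the long error listed in the appendix below.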

Since, say, df.select? displays the desired docstring, why does the same not work for df.name.cast? More specifically, when can the ? suffix be expected to work?

I note that df.name and df.name.cast are both recognized objects:

In [98]: df.name
Out[98]: Column<'name'>

In [99]: df.name.cast
Out[99]: <bound method Column.cast of Column<'name'>>

Appendix: error from df.name.cast?

Traceback (most recent call last):

  Cell In[97], line 1
    get_ipython().run_line_magic('pinfo', 'df.name.cast')

  File ~\anaconda3\envs\py39\lib\site-packages\IPython\core\interactiveshell.py:2414 in run_line_magic
    result = fn(*args, **kwargs)

  File ~\anaconda3\envs\py39\lib\site-packages\IPython\core\magics\namespace.py:58 in pinfo
    self.shell._inspect('pinfo', oname, detail_level=detail_level,

  File ~\anaconda3\envs\py39\lib\site-packages\IPython\core\interactiveshell.py:1795 in _inspect
    pmethod(

  File ~\anaconda3\envs\py39\lib\site-packages\IPython\core\oinspect.py:782 in pinfo
    info_b: Bundle = self._get_info(

  File ~\anaconda3\envs\py39\lib\site-packages\IPython\core\oinspect.py:738 in _get_info
    info_dict = self.info(obj, oname=oname, info=info, detail_level=detail_level)

  File ~\anaconda3\envs\py39\lib\site-packages\IPython\core\oinspect.py:838 in info
    if info and info.parent and hasattr(info.parent, HOOK_NAME):

  File ~\anaconda3\envs\py39\lib\site-packages\pyspark\sql\column.py:1369 in __nonzero__
    raise ValueError(

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
pyspark spyder
1 Answer
The ? suffix on the cast method of a column object works for me.
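One possible reading of the traceback in the question: the last IPython frame evaluates `if info and info.parent and hasattr(info.parent, HOOK_NAME)`, which truth-tests the parent object. For df.name.cast the parent is the Column df.name, and the final frame shows Column refusing conversion to bool. For df.select the parent is the DataFrame df, which presumably has no such restriction. The sketch below reproduces that mechanism without Spark. `StubColumn`, the `HOOK_NAME` value, and both helper functions are hypothetical stand-ins invented for illustration, not pyspark or IPython code.

```python
import inspect


class StubColumn:
    """Hypothetical stand-in for pyspark's Column (not the real class)."""

    def __bool__(self):
        # Like the Column in the traceback, refuse any truth test.
        raise ValueError("Cannot convert column into bool")

    def cast(self, data_type):
        """Stub docstring: casts the column into type ``data_type``."""
        return self


# Placeholder attribute name for this sketch only.
HOOK_NAME = "_hypothetical_hook_"


def truthy_check(parent):
    # Same shape as the failing line: `parent and ...` calls parent.__bool__().
    return bool(parent and hasattr(parent, HOOK_NAME))


def identity_check(parent):
    # An identity comparison never calls parent.__bool__().
    return parent is not None and hasattr(parent, HOOK_NAME)


col = StubColumn()

try:
    truthy_check(col)
    truth_test_failed = False
except ValueError:
    truth_test_failed = True

hook_found = identity_check(col)

# Reading the docstring directly involves no truth test on the parent.
doc = inspect.getdoc(col.cast)

print(truth_test_failed)
print(hook_found)
print(doc)
```

If this reading is right, the failure depends on how the installed IPython tests the parent object, which could explain why the same command works in one environment and not in another. With the real objects, `help(df.name.cast)` or `print(inspect.getdoc(df.name.cast))` should bypass the `pinfo` code path altogether, though that has not been verified against the setup in the question.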
