Inserting selected columns into a PySpark DataFrame

Question

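I have a DataFrame with the following schema:

col1 as col1, col2, col3, ..., coln

I have another DataFrame with the following schema:

col1 as diffName, col n+1, col n+2, ...

How can I append the values from the second DataFrame's col1 to the first DataFrame, with the remaining columns filled with nulls?

I have tried union and merge, but without success.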

Answer 1

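You may be looking for the unionByName function, called with the allowMissingColumns=True keyword argument (available in Spark 3.1 and later). Columns are matched by name, and any column present in only one DataFrame is filled with null in the rows that come from the other.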


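from pyspark.sql import Row, SparkSession
from pyspark.sql.types import StructField, StructType, StringType, IntegerType

# Reuse the active session (e.g. in a notebook) or create one
spark = SparkSession.builder.getOrCreate()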

df1 = spark.createDataFrame(
    data = [Row(name="John", age=27), Row(name="Tony", age=27)],
    schema=StructType([
        StructField(name="name", dataType=StringType(), nullable=True),
        StructField(name="age", dataType=IntegerType(), nullable=True),
]))

df2 = spark.createDataFrame(
    data = [Row(name="John", zip=1234), Row(name="Tony", zip=1235)],
    schema=StructType([
        StructField(name="name", dataType=StringType(), nullable=True),
        StructField(name="zip", dataType=IntegerType(), nullable=True)
]))

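# Union by column name; columns missing on either side become null
df3 = df1.unionByName(df2, allowMissingColumns=True)
df3.show()  # in a Databricks notebook, df3.display() also works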
