Getting null when concatenating string columns in PySpark. Why?


I have this super simple DataFrame:

rc1.show(5)
rc1.printSchema()
+--------+-----------+
|      ID|Case number|
+--------+-----------+
|11034701|   JA366925|
|11227287|   JB147188|
|11227583|   JB147595|
|11227293|   JB147230|
|11227634|   JB147599|
+--------+-----------+
only showing top 5 rows

root
 |-- ID: string (nullable = true)
 |-- Case number: string (nullable = true)

I want to add a new column that is just the "Case number" column concatenated with "aaa", so I'm using this:

rc2 = rc1.withColumn("Case numberxx", col("Case number") + "aaa")
rc2.show(5)

But for the life of me, I can't understand why my new column is full of nulls:

+--------+-----------+-------------+
|      ID|Case number|Case numberxx|
+--------+-----------+-------------+
|11034701|   JA366925|         null|
|11227287|   JB147188|         null|
|11227583|   JB147595|         null|
|11227293|   JB147230|         null|
|11227634|   JB147599|         null|
+--------+-----------+-------------+
only showing top 5 rows

Why is this happening? Thanks!

python string dataframe apache-spark pyspark
1 Answer

OK, this works fine:

from pyspark.sql.functions import col, concat, lit

rc2 = rc1.withColumn("Case numberxx", concat(col("Case number"), lit("aaa")))
rc2.show(5)

+--------+-----------+-------------+
|      ID|Case number|Case numberxx|
+--------+-----------+-------------+
|11034701|   JA366925|  JA366925aaa|
|11227287|   JB147188|  JB147188aaa|
|11227583|   JB147595|  JB147595aaa|
|11227293|   JB147230|  JB147230aaa|
|11227634|   JB147599|  JB147599aaa|
+--------+-----------+-------------+

However, I'm not quite sure why this evaluates to null:

col("Case number") + lit("aaa")

whereas this works fine:

concat(col("Case number"), lit("aaa"))
