Getting null when concatenating string columns in PySpark. Why?

Question

I have this super simple DataFrame:

rc1.show(5)
rc1.printSchema()
+--------+-----------+
|      ID|Case number|
+--------+-----------+
|11034701|   JA366925|
|11227287|   JB147188|
|11227583|   JB147595|
|11227293|   JB147230|
|11227634|   JB147599|
+--------+-----------+
only showing top 5 rows

root
 |-- ID: string (nullable = true)
 |-- Case number: string (nullable = true)

I want to add a new column that is just the "Case number" column concatenated with "aaa", so I am doing this:

rc2 = rc1.withColumn("Case numberxx", col("Case number") + "aaa")
rc2.show(5)

However, for the life of me, I can't understand why my new column is full of nulls:

+--------+-----------+-------------+
|      ID|Case number|Case numberxx|
+--------+-----------+-------------+
|11034701|   JA366925|         null|
|11227287|   JB147188|         null|
|11227583|   JB147595|         null|
|11227293|   JB147230|         null|
|11227634|   JB147599|         null|
+--------+-----------+-------------+
only showing top 5 rows

Why is this happening? Thanks!

Answer 1

OK, this works:

from pyspark.sql.functions import col, concat, lit

rc2 = rc1.withColumn("Case numberxx", concat(col("Case number"), lit("aaa")))
rc2.show(5)

+--------+-----------+-------------+
|      ID|Case number|Case numberxx|
+--------+-----------+-------------+
|11034701|   JA366925|  JA366925aaa|
|11227287|   JB147188|  JB147188aaa|
|11227583|   JB147595|  JB147595aaa|
|11227293|   JB147230|  JB147230aaa|
|11227634|   JB147599|  JB147599aaa|
+--------+-----------+-------------+

However, I'm still not sure why this gives null:

col("Case number") + lit("aaa")

while this works fine:

concat(col("Case number"), lit("aaa"))
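
A likely explanation, as far as I understand Spark's behavior: `+` on a Column is arithmetic addition, not string concatenation. With ANSI mode off, Spark casts both string operands to double, and a value such as "JA366925" that cannot be parsed as a number becomes null, so the sum is null too. (With ANSI mode on, the same expression should raise an error instead.) The sketch below is a pure-Python model of that rule, not Spark code. The helpers `cast_to_double`, `plus` and `concat` are hypothetical stand-ins written for illustration, and Python's `float()` only approximates Spark's parsing.

```python
from typing import Optional


def cast_to_double(value: Optional[str]) -> Optional[float]:
    """Rough stand-in for Spark's non-ANSI string-to-double cast.

    Text that cannot be parsed as a number becomes None (null).
    """
    if value is None:
        return None
    try:
        return float(value.strip())
    except ValueError:
        return None


def plus(left: Optional[str], right: Optional[str]) -> Optional[float]:
    """Model of `col_a + col_b` on strings: cast both sides, then add."""
    a, b = cast_to_double(left), cast_to_double(right)
    if a is None or b is None:
        return None  # null in, null out
    return a + b


def concat(*parts: Optional[str]) -> Optional[str]:
    """Model of concat(): joins strings, null if any input is null."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


rows = ["JA366925", "JB147188"]
plus_results = [plus(r, "aaa") for r in rows]
concat_results = [concat(r, "aaa") for r in rows]

print(plus_results)    # neither side is numeric, so every sum is null
print(concat_results)  # plain string concatenation
```

In this model `plus("1", "2")` gives `3.0`, which matches the idea that `+` only produces a value when both strings look like numbers. For appending text, `concat` with `lit` is the right tool.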
