Unable to write a table with a header to an S3 path in PySpark?

Question (0 votes, 1 answer)

I have the following dataset:

+-------------------+-------+------------+                                      
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST               |920799 |loyalty     |
|TEST               |922428 |loyalty     |
|TEST               |2063890|loyalty     |
|TEST               |2344814|loyalty     |
|TEST               |2355426|loyalty     |
|TEST               |2618707|loyalty     |
+-------------------+-------+------------+

I write the table above to an S3 path using the following script:

df.write.option("header","true").mode("overwrite").csv("<s3: path>")

However, when I read the table back for further operations, it looks like this:

+-------------------+-------+------------+                                      
|                _c0|    _c1|         _c2|
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
|TEST               |920799 |loyalty     |
|TEST               |922428 |loyalty     |
|TEST               |2063890|loyalty     |
|TEST               |2344814|loyalty     |
|TEST               |2355426|loyalty     |
|TEST               |2618707|loyalty     |
+-------------------+-------+------------+

How I want the table to look:

+-------------------+-------+------------+                                      
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST               |920799 |loyalty     |
|TEST               |922428 |loyalty     |
|TEST               |2063890|loyalty     |
|TEST               |2344814|loyalty     |
|TEST               |2355426|loyalty     |
|TEST               |2618707|loyalty     |
+-------------------+-------+------------+

I tried writing the file in Parquet format and that works, but I want to write the file only in .csv format. Any help or hint would be highly appreciated.

pyspark pyspark-sql
1 Answer

0 votes

This should do it:

sqlContext.read.csv("s3:///file_path", header=True)
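The reason this works: a CSV file carries no schema, so the header row written by `option("header", "true")` is just an ordinary first line in each part file. Unless the reader is told `header=True`, Spark treats that line as data and invents default column names (`_c0`, `_c1`, ...). A minimal sketch using only Python's standard `csv` module (the file contents below are hypothetical, mirroring the question's data) illustrates the same distinction:

```python
import csv
import io

# Hypothetical CSV content resembling what Spark writes with header=True:
# the first line is the column names, the remaining lines are rows.
data = "test_control_status,user_id,loyalty_type\nTEST,920799,loyalty\nTEST,922428,loyalty\n"

# Reading WITHOUT header awareness: the header line comes back as an
# ordinary data row, analogous to Spark reading a CSV without header=True.
rows_plain = list(csv.reader(io.StringIO(data)))
print(rows_plain[0])   # the header appears as the first data row

# Reading WITH header awareness: the first line becomes the field names,
# analogous to sqlContext.read.csv(path, header=True).
rows_named = list(csv.DictReader(io.StringIO(data)))
print(rows_named[0]["user_id"])
```

In short, `header` must be set on both sides: on write so the names are emitted, and on read so they are promoted to column names instead of being parsed as a row.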