I have the following dataset:
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST |920799 |loyalty |
|TEST |922428 |loyalty |
|TEST |2063890|loyalty |
|TEST |2344814|loyalty |
|TEST |2355426|loyalty |
|TEST |2618707|loyalty |
+-------------------+-------+------------+
I used the following script to write the table above to an S3 path:
df.write.option("header","true").mode("overwrite").csv("<s3: path>")
However, when I read the table back for further processing, it looks like this:
+-------------------+-------+------------+
| _c0| _c1| _c2|
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
|TEST |920799 |loyalty |
|TEST |922428 |loyalty |
|TEST |2063890|loyalty |
|TEST |2344814|loyalty |
|TEST |2355426|loyalty |
|TEST |2618707|loyalty |
+-------------------+-------+------------+
What I want the table to look like:
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST |920799 |loyalty |
|TEST |922428 |loyalty |
|TEST |2063890|loyalty |
|TEST |2344814|loyalty |
|TEST |2355426|loyalty |
|TEST |2618707|loyalty |
+-------------------+-------+------------+
I tried writing the file in Parquet format and that works, but I want to write it in .csv format only. Any help or hints would be greatly appreciated.
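This should do it. Set header when reading as well: the CSV reader defaults to header = false, so the header line is read back as an ordinary data row and the columns get default names (_c0, _c1, _c2):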
sqlContext.read.csv("s3:///file_path", header = True)
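To make the effect of the header option concrete, here is a minimal pure-Python sketch that mimics what a CSV reader returns with and without it. The helper read_csv_like_spark is hypothetical and only illustrates the behavior seen in the question (positional _c0, _c1, ... names and the header line showing up as data); it is not part of Spark, and the sample text is just the first lines of the question's table as they would appear in a file written with header = true.

```python
import csv
import io


def read_csv_like_spark(text, header=False):
    """Hypothetical illustration, not a Spark API: return (column_names, rows)
    the way a CSV reader would, depending on the header option."""
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        # The first line supplies the column names and is not returned as data.
        return rows[0], rows[1:]
    # Without a header, every line is data and columns get positional names
    # _c0, _c1, ... (the names shown in the question's output).
    width = max((len(r) for r in rows), default=0)
    return [f"_c{i}" for i in range(width)], rows


# First lines of the file as written with header = true.
sample = (
    "test_control_status,user_id,loyalty_type\n"
    "TEST,920799,loyalty\n"
    "TEST,922428,loyalty\n"
)

cols, data = read_csv_like_spark(sample)
print(cols)     # ['_c0', '_c1', '_c2']
print(data[0])  # ['test_control_status', 'user_id', 'loyalty_type']

cols, data = read_csv_like_spark(sample, header=True)
print(cols)     # ['test_control_status', 'user_id', 'loyalty_type']
print(data[0])  # ['TEST', '920799', 'loyalty']
```

The file on S3 is the same in both cases; only the read-side option decides whether its first line becomes column names or data, which is why setting header on the write alone is not enough.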