Is it possible to change the location of the _spark_metadata folder in Spark Structured Streaming?

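import org.apache.spark.sql.functions.{col, from_json, unix_timestamp}
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import org.apache.spark.sql.types.StringType

// df is the streaming source DataFrame (presumably read from Kafka);
// processor and config are application-specific objects.
val query = df.withColumn("value", col("value").cast(StringType))
      // Parse the JSON payload using the schema supplied by the processor.
      .withColumn("value", from_json(col("value"), processor.Schema))
      .select(unix_timestamp(col("timestamp")).alias("kafka_time"), col("value.*"))
      .filter(processor.filter)
      .transform(processor.transform)
      .writeStream
      .format("parquet")
      .partitionBy("grass_date")
      // _spark_metadata is created under this output path.
      .option("path", config.savePath)
      .option("checkpointLocation", config.checkpointLocation)
      .trigger(Trigger.ProcessingTime("15 minutes"))
      .outputMode(OutputMode.Append)
      .start()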

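When running a Structured Streaming job with the Parquet file sink, Spark creates a _spark_metadata folder under the job's output path. Because of this folder, partition discovery does not seem to work correctly. Is there any way to get rid of the _spark_metadata folder or change its location?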

Edit 1: I am using Spark 2.4.4.

Edit 2: I can create a Hive table on top of config.savePath, but I do not see any data in that table. This is what I have under savePath:

[xxx]$ hadoop fs -ls /tmp/ravi.mondal/product_click/remind_me_button
Found 2 items
drwxrwxr-x   - ravi.mondal supergroup          0 2020-05-20 12:36 /tmp/ravi.mondal/product_click/remind_me_button/_spark_metadata
drwxrwxr-x   - ravi.mondal supergroup          0 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20
[xxx]$
[xxx]$ hadoop fs -ls /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20
Found 27 items
-rw-rw-r--   3 ravi.mondal supergroup       1575 2020-05-20 12:46 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00009-34ec06fb-4506-4e73-963b-4441bd00410d.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1798 2020-05-20 12:31 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00017-e0d550b4-225c-44d5-a539-1e4e38a1069e.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1681 2020-05-20 11:46 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00023-9caf4a09-6c99-482b-9212-f03513c80070.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1561 2020-05-20 12:32 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00028-493b6d84-9638-4428-a0c7-99252d2efcd5.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1737 2020-05-20 12:32 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00032-4a72a3f3-a221-4071-b4f5-a49d16aadbba.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1773 2020-05-20 12:47 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00036-dca34760-861f-45f8-8ce0-51feb5ac2768.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1539 2020-05-20 11:47 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00042-cc062316-2afd-49c2-9ad8-8709693b2986.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1584 2020-05-20 12:47 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00048-9432d414-2aaa-424a-84b8-cd4364fa4e87.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1665 2020-05-20 12:17 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00049-a8c3f0f0-80f5-4690-a928-1f2108aa39df.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1656 2020-05-20 11:30 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00051-0c016684-cf71-4681-b1cd-fcb325452e89.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1825 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00063-2dc3d00d-46ed-41cc-b189-2ed475ed5c5c.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1584 2020-05-20 12:20 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00065-70b9e314-8292-4e48-81c4-e3b983977563.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1629 2020-05-20 12:50 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00065-bfed91f6-1398-4038-aee7-56cb0cf87414.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1584 2020-05-20 12:18 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00074-4beb1880-2bc0-4001-9684-546e240b6888.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1665 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00075-adbc8782-7b6f-4dbd-a8f8-e878648b1ff2.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1584 2020-05-20 11:31 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00081-9e56a444-161f-43d8-9e50-bf24c6484d83.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1688 2020-05-20 11:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00081-f246df73-8db5-49f4-9682-9a12bdeb0b5a.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1656 2020-05-20 11:30 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00083-8e0fdecb-8d0b-49d5-8e93-6edeee1539fc.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1656 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00092-b292d9ed-ce41-4426-833d-38f994af87d4.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1665 2020-05-20 12:05 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00105-59bf04c1-b79f-42f1-995d-f3673486886d.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1823 2020-05-20 12:05 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00108-00f0fc98-4e10-43c5-b5b3-9e0a10a7db03.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1737 2020-05-20 12:51 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00109-1389f070-e430-4246-95da-d2d4606b46ec.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1672 2020-05-20 12:20 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00109-b8e42728-ef8c-49d9-8451-aab55e3045cc.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1584 2020-05-20 11:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00110-de37d04f-f26e-4a9c-872b-3b04ac8a188c.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1672 2020-05-20 11:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00112-b06ee506-04c1-4969-bf12-069a9a88f222.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1825 2020-05-20 12:51 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00119-f36fc943-3f97-47ff-9502-a4dbcd69b591.c000.snappy.parquet
-rw-rw-r--   3 ravi.mondal supergroup       1584 2020-05-20 12:05 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00124-bcfcbd1c-15aa-4410-b016-719715c8e775.c000.snappy.parquet
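
One possible explanation for the empty Hive table in Edit 2, offered as an assumption that this thread does not confirm: a partitioned external Hive table only returns rows for partitions registered in the metastore, so the grass_date=... directories written by the streaming job stay invisible until they are added. The sketch below asks the metastore to pick them up. The table name my_db.remind_me_button and the object name RepairPartitionsSketch are hypothetical, and the table is assumed to be partitioned by grass_date.

```scala
import org.apache.spark.sql.SparkSession

object RepairPartitionsSketch {
  // Builds the statement that asks the metastore to scan the table location
  // and register partition directories it does not know about yet.
  def repairStatement(table: String): String = s"MSCK REPAIR TABLE $table"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repair-partitions-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table name; replace it with the table created on config.savePath.
    val table = "my_db.remind_me_button"

    spark.sql(repairStatement(table))
    // List the partitions the metastore now knows about.
    spark.sql(s"SHOW PARTITIONS $table").show(truncate = false)

    spark.stop()
  }
}
```

Because the streaming job keeps creating new grass_date directories, the repair would have to be repeated (for example once a day) for new partitions to show up.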
apache-spark parquet spark-structured-streaming
1 Answer

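Judging from the Spark source code, the path of the _spark_metadata directory cannot be changed. For reference, I have linked the code in the Git repository where this directory is created; it is always created inside the specified output path.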

FileStreamSink Source Code
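
To make the point above concrete, here is a simplified sketch of how the metadata path follows from the output path. It is an illustration only, not the actual Spark source: the object MetadataPathSketch and the function metadataLogPath are hypothetical names, and the only detail taken from Spark is the fixed directory name _spark_metadata.

```scala
// Simplified illustration only; this is not the actual Spark source.
object MetadataPathSketch {
  // Fixed directory name the file sink uses for its metadata log.
  val MetadataDir: String = "_spark_metadata"

  // The log path is derived purely from the sink's output path, which is
  // why the folder always ends up directly under that path.
  def metadataLogPath(outputPath: String): String =
    s"${outputPath.stripSuffix("/")}/$MetadataDir"

  def main(args: Array[String]): Unit =
    println(metadataLogPath("/tmp/ravi.mondal/product_click/remind_me_button"))
}
```

For the save path in the question, this gives the same _spark_metadata location that appears in the hadoop fs -ls output.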
