How do I change the location of the warehouse's default database? (Spark)

    ...
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
        <description>location of default database for the warehouse</description>
    </property>
    ...

This snippet is part of /user/spark3/conf/hive-site.xml.

The original value was

hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7

and I changed it to

hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7

Below are the code and the result:

println(spark.conf.get("spark.sql.warehouse.dir"))  //--Default : spark-warehouse

spark
    .sql("""
        SELECT 
            website, 
            avg(age) avg_age, 
            max(id) max_id
        FROM 
            people a 
            JOIN 
            projects b 
            ON a.name = b.manager 
        WHERE 
            a.age > 11 
        GROUP BY 
            b.website
        """)
    .write
    .mode("overwrite")  //--Overwrite mode....
    .saveAsTable("JoinedPeople")  //--saveAsTable(<warehouse_table_name>)....
    
sql("SELECT * FROM JoinedPeople").show(1000)

hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
+--------------------+-------+------+
|             website|avg_age|max_id|
+--------------------+-------+------+
|http://hive.apach...|   30.0|     2|
|http://kafka.apac...|   19.0|     3|
|http://storm.apac...|   30.0|     9|
+--------------------+-------+------+

The value of 'spark.sql.warehouse.dir' did change from kikang to skybluelee, as intended.

But the location of the table "JoinedPeople" did not change. Its location is still 'hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7', the original value from hive-site.xml.

I want to change the location of the default database.

How can I change the default location?

I also changed 'spark-defaults.conf' and, of course, rebooted Ubuntu afterwards, but it had no effect.

apache-spark apache-spark-sql hive
1 Answer

Could you check which Spark version you are using here? According to Hive Tables in the official Spark documentation:

Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of databases in the warehouse. You may need to grant write privilege to the user who starts the Spark application.
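For instance, in conf/spark-defaults.conf (the file the question mentions editing) the setting would take a form like the following; note the key must be spark.sql.warehouse.dir, not the deprecated Hive key:

```properties
spark.sql.warehouse.dir    hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
```

Restarting the Spark application (not the OS) is what makes a new SparkSession pick up the value.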

  1. Does changing the property name in hive-site.xml work for you (assuming your Spark version is above 2.0.0)?

     ...
     <property>
         <name>spark.sql.warehouse.dir</name>
         <value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
         <description>location of default database for the warehouse</description>
     </property>
     ...
    
  2. Does setting the property before the Spark session is initialized work for you?

    import org.apache.spark.sql.SparkSession
    
    val warehouseLocation = "hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7"
    
    // Create a SparkSession with the desired warehouse location
    val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    
    // Import the necessary Spark functions and implicit
    import spark.implicits._
    import spark.sql
    sql("""
     SELECT 
         website, 
         avg(age) avg_age, 
         max(id) max_id
     FROM 
         people a 
         JOIN 
         projects b 
         ON a.name = b.manager 
     WHERE 
         a.age > 11 
     GROUP BY 
         b.website
     """)
       .write
       .mode("overwrite")
       .saveAsTable("JoinedPeople")
    
    // Retrieve the location of the "JoinedPeople" table from the Hive metastore
    val tableLocation = spark.sql("DESCRIBE EXTENDED JoinedPeople").filter($"col_name" === "Location").select("data_type").collect()(0)(0)
    println(s"Table location: $tableLocation")
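One more thing worth checking: spark.sql.warehouse.dir only determines where newly created databases go; a database that already exists (including default) keeps the location recorded in the Hive metastore, which would explain why JoinedPeople still lands under /kikang. On Spark 3.0.0 or later you can repoint the database explicitly (a sketch to run against your own metastore; existing tables keep their old paths):

```sql
-- Repoint the default database at the new warehouse path.
-- Only tables created AFTER this statement use the new location.
ALTER DATABASE default SET LOCATION
  'hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7';
```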
    
    