...
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
<description>location of default database for the warehouse</description>
</property>
...
The code above is part of /user/spark3/conf/hive-site.xml.
The initial value was
hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7
and I changed it to
hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
Below is the code and its result:
println(spark.conf.get("spark.sql.warehouse.dir"))    //--Default : spark-warehouse

spark
  .sql("""
    SELECT
      website,
      avg(age) avg_age,
      max(id)  max_id
    FROM
      people a
    JOIN
      projects b
      ON a.name = b.manager
    WHERE
      a.age > 11
    GROUP BY
      b.website
    """)
  .write
  .mode("overwrite")              //--Overwrite mode....
  .saveAsTable("JoinedPeople")    //--saveAsTable(<warehouse_table_name>)....

sql("SELECT * FROM JoinedPeople").show(1000)
hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
+--------------------+-------+------+
| website|avg_age|max_id|
+--------------------+-------+------+
|http://hive.apach...| 30.0| 2|
|http://kafka.apac...| 19.0| 3|
|http://storm.apac...| 30.0| 9|
+--------------------+-------+------+
The value of 'spark.sql.warehouse.dir' was changed from kikang to skybluelee as intended.
However, the location of the table 'JoinedPeople' did not change. It is still 'hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7', the first value from hive-site.xml.
I want to change the location of the default database.
How can I change the default location?
I also changed 'spark-defaults.conf' (and of course restarted Ubuntu afterwards), but it had no effect.
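For reference, one way to confirm the location the metastore reports for the table is a DESCRIBE EXTENDED query like the sketch below; in my case the 'Location' row still shows the kikang path.

// Look up the table's storage location recorded in the Hive metastore
spark.sql("DESCRIBE EXTENDED JoinedPeople")
  .filter("col_name = 'Location'")   // keep only the Location row
  .show(false)                       // print the full path without truncation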
Could you check which Spark version you are using in this case? According to Hive Tables in the official Spark documentation:
"Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application."
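If in doubt, the running version can be printed directly from the shell, for example:

// Print the version of the Spark session currently in use
println(spark.version)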
Does changing the property in hive-site.xml as shown below work for you (assuming your Spark version is above 2.0.0)?
...
<property>
<name>spark.sql.warehouse.dir</name>
<value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
<description>location of default database for the warehouse</description>
</property>
...
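Once the session is restarted, a quick sanity check could look like the sketch below; DESCRIBE DATABASE EXTENDED prints, among other things, the location the metastore has recorded for the default database.

// Effective warehouse setting of the running session
println(spark.conf.get("spark.sql.warehouse.dir"))

// Location recorded in the metastore for the default database
spark.sql("DESCRIBE DATABASE EXTENDED default").show(false)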
Does setting the property before the Spark session is initialized, as below, work for you?
import org.apache.spark.sql.SparkSession

val warehouseLocation = "hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7"

// Create a SparkSession with the desired warehouse location
val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

// Import the necessary Spark functions and implicits
import spark.implicits._
import spark.sql

sql("""
  SELECT
    website,
    avg(age) avg_age,
    max(id)  max_id
  FROM
    people a
  JOIN
    projects b
    ON a.name = b.manager
  WHERE
    a.age > 11
  GROUP BY
    b.website
  """)
  .write
  .mode("overwrite")
  .saveAsTable("JoinedPeople")

// Retrieve the location of the "JoinedPeople" table from the Hive metastore
val tableLocation = spark.sql("DESCRIBE EXTENDED JoinedPeople")
  .filter($"col_name" === "Location")
  .select("data_type")
  .collect()(0)(0)

println(s"Table location: $tableLocation")