In our PySpark job there is a case where we join a larger DataFrame with a relatively small one. I believe Spark is using a broadcast join, and we are running into the following error:
org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8GB: 8 GB
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:103)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withExecutionId$1.apply(SQLExecution.scala:101)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:98)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
I tried to disable the broadcast join by setting 'spark.sql.autoBroadcastJoinThreshold': '-1' as part of spark-submit:
/usr/bin/spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1 /home/hadoop/scripts/job.py
I tried printing the value of spark.sql.autoBroadcastJoinThreshold with:
spark.conf.get("spark.sql.autoBroadcastJoinThreshold")
and it returns -1. However, even with this change I still get the error:
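For reference, the same setting can also be applied programmatically when building the SparkSession (a minimal config sketch; the app name is a placeholder). One thing worth checking is whether the query contains an explicit broadcast() hint, since a hint forces a broadcast join regardless of the threshold:

```python
from pyspark.sql import SparkSession

# -1 disables automatic broadcast joins; this is a runtime SQL conf,
# so it can also be changed later via spark.conf.set(...).
spark = (
    SparkSession.builder
    .appName("disable-broadcast-join")  # placeholder app name
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")
    .getOrCreate()
)

# Caveat: an explicit broadcast hint still forces a broadcast join
# even when the threshold is -1, e.g.:
#   from pyspark.sql.functions import broadcast
#   big_df.join(broadcast(small_df), "id")
```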
org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8GB: 8 GB
The Spark version is Spark 2.3.0.
Any help is appreciated.