Futures timed out after [5 seconds]

Problem description · Votes: 0 · Answers: 3

On a job cluster with the following configuration:

Driver: Standard_E8ds_v5 
Workers: Standard_E8ds_v5
30 workers 
11.3 LTS Photon (includes Apache Spark 3.3.0, Scala 2.12)

Roughly 5% of the time we hit a

Futures timed out after [5 seconds]

error; the full stack trace is shown at the bottom. I am hoping the stack trace is enough for someone to tell me which Spark configuration to adjust in order to extend this 5-second timeout.

The job's notebook does the following:

from concurrent.futures import ThreadPoolExecutor

def RunChild(s):
    # timeout of 0 means the child notebook run itself never times out
    dbutils.notebook.run("./ProcessChild", 0, {"param": s})

scenarios = [...]  # some array with 107 items
with ThreadPoolExecutor(max_workers=20) as executor:
    final = executor.map(RunChild, scenarios)

The ProcessChild notebook fails frequently, each time at a different spot in the Spark code, with the following error and stack trace:

java.util.concurrent.TimeoutException: Futures timed out after [5 seconds] 
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
      at scala.concurrent.Await$.$anonfun$result$1(package.scala:223)
      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
      at scala.concurrent.Await$.result(package.scala:146)
      at com.databricks.backend.daemon.driver.JupyterDriverLocal$RequestStatus.waitForReply(JupyterDriverLocal.scala:209)
      at com.databricks.backend.daemon.driver.JupyterDriverLocal.repl(JupyterDriverLocal.scala:971)
      at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$23(DriverLocal.scala:725)
      at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:103)
      at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$20(DriverLocal.scala:708)
      at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:398)
      at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
      at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147)
      at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:396)
      at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:393)
      at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:62)
      at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:441)
      at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:426)
      at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:62)
      at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:685)
      at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:622)
      at scala.util.Try$.apply(Try.scala:213)
      at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:614)
      at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:533)
      at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:568)
      at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:438)
      at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:381)
      at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:232)
      at java.lang.Thread.run(Thread.java:750)

The cluster is certainly busy with all the parallel threads and operations, and we would like to know which Spark configuration could extend the 5-second timeout.
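In the meantime, a mitigation we are considering is retrying the transient driver-side timeout and lowering the thread count. This is only a sketch; the retry count, backoff, and reduced worker count below are illustrative, not Databricks defaults:

import time
from concurrent.futures import ThreadPoolExecutor

MAX_RETRIES = 3  # illustrative; tune for your workload

def RunChildWithRetry(s):
    # Retry only the transient driver-side timeout; re-raise anything else.
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return dbutils.notebook.run("./ProcessChild", 0, {"param": s})
        except Exception as e:
            if "Futures timed out" not in str(e) or attempt == MAX_RETRIES:
                raise
            time.sleep(10 * attempt)  # linear backoff before retrying

scenarios = [...]  # some array with 107 items
with ThreadPoolExecutor(max_workers=8) as executor:  # fewer workers to ease driver load
    final = list(executor.map(RunChildWithRetry, scenarios))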

databricks azure-databricks
3 Answers

0 votes

We ran into the same problem, and what made it more annoying in our case is that everything ran perfectly fine on 10.4 LTS. Simply upgrading the runtime and re-triggering the job produced the Futures timed out after [5 seconds] errors.

In our case we were able to raise the broadcast join timeout from -1000 to 300000 ms (5 minutes) with the following commands:

spark.conf.get("spark.sql.broadcastTimeout")              # inspect the current value
spark.conf.set("spark.sql.broadcastTimeout", "300000ms")  # raise it to 5 minutes

This should only be a temporary fix (at least for us, haha), but hopefully it gets you past the immediate trouble. We also ended up increasing the cluster size to help with memory.

Since this was caused by broadcast joins, as part of our testing we also disabled automatic broadcast joins to see whether that helped (it did, but results were flaky):

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)  # disable automatic broadcast joins

Reference: https://spark.apache.org/docs/latest/sql-performance-tuning.html
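For context, here is a minimal sketch of where these two settings bite (the table names are hypothetical): spark.sql.broadcastTimeout bounds how long a broadcast exchange may take, and setting spark.sql.autoBroadcastJoinThreshold to -1 stops the optimizer from choosing broadcast joins on its own.

from pyspark.sql import functions as F

spark.conf.set("spark.sql.broadcastTimeout", "300000ms")  # allow up to 5 minutes

small = spark.table("dim_scenarios")  # hypothetical small dimension table
large = spark.table("fact_events")    # hypothetical large fact table

# Explicit broadcast hint: building and shipping the broadcast table
# must finish within spark.sql.broadcastTimeout.
joined = large.join(F.broadcast(small), "scenario_id")
joined.count()  # trigger the join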


0 votes

Clearing the notebook's state worked for me.


0 votes

If there are any unneeded display calls or other unnecessary actions, remove them and clear the notebook state. That worked for me.
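As a hypothetical before/after sketch (the table names are made up): every display() or count() left over from debugging is a separate Spark action that occupies the driver, so dropping the ones the job does not need reduces contention.

df = spark.table("fact_events")  # illustrative input table

# Before: debugging leftovers, each triggering its own Spark job
# display(df)
# print(df.count())

# After: keep only the action the job actually needs
df.write.mode("overwrite").saveAsTable("processed_events")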
