How does Spark's Catalyst Optimizer choose the physical plan?

Question

I am trying to understand how Spark's Catalyst optimizer chooses the best physical plan, and what cost function it uses in that process.

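I do understand what it does and how it is used, but I want to know how the optimizer actually goes about optimizing: how it generates different physical plans and picks the best one. I could not find any resources on how plans are costed.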

Answer 1

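If you are asking about the optimizer in general, there are many useful search results that explain it (for example phases and experimental extensions). You can also find some very specific custom rules, along with how they are registered using SparkExtension for general use.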

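The Quality custom rule lets Spark know that a search for a uuid string created by joining two longs (similar to a row id) can be sped up by converting the uuid first. This avoids a table scan and allows predicate pushdown.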
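To make the idea concrete, here is a minimal sketch of that kind of rewrite as a toy model in plain Python. It is not Spark or Catalyst code. The node classes, the column names `id_hi` and `id_lo`, and the assumption that the uuid string is the standard textual form of two 64-bit halves are all made up for illustration.

```python
import uuid
from dataclasses import dataclass

# Toy expression nodes. These are NOT Spark classes, only a stand-in for a plan tree.
@dataclass(frozen=True)
class EqualTo:
    column: str
    value: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

MASK64 = (1 << 64) - 1

def to_signed64(n):
    """Interpret an unsigned 64-bit value as a signed long, as the JVM would."""
    return n - (1 << 64) if n >= (1 << 63) else n

def rewrite_uuid_filter(expr, uuid_col="uuid", hi_col="id_hi", lo_col="id_lo"):
    """Rewrite uuid = '<string>' into id_hi = <long> AND id_lo = <long>.

    Only the expression tree is inspected; no data is read.
    Column names are hypothetical.
    """
    if isinstance(expr, And):
        return And(rewrite_uuid_filter(expr.left, uuid_col, hi_col, lo_col),
                   rewrite_uuid_filter(expr.right, uuid_col, hi_col, lo_col))
    if isinstance(expr, EqualTo) and expr.column == uuid_col and isinstance(expr.value, str):
        try:
            as_int = uuid.UUID(expr.value).int
        except ValueError:
            return expr  # not a parseable uuid literal: leave the predicate alone
        hi = to_signed64(as_int >> 64)
        lo = to_signed64(as_int & MASK64)
        return And(EqualTo(hi_col, hi), EqualTo(lo_col, lo))
    return expr

original = And(EqualTo("uuid", "00000000-0000-0001-0000-000000000002"),
               EqualTo("country", "NL"))
rewritten = rewrite_uuid_filter(original)
print(rewritten)
```

After the rewrite, the predicate compares plain long columns, which is the kind of filter a data source can usually evaluate itself instead of Spark building a string for every row.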

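Rules like that do not need to run the query. They only need the structure of the query itself, the same as predicate pushdown and similar optimizations. The exact phase you choose depends on how much optimization the built-in planning has already done at that point. The custom phase above is placed before the predicate pushdown logic runs, so that its result can be pushed down.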
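Why the placement matters can be shown with a small toy pipeline in plain Python. This is only a sketch under stated assumptions: `Plan`, `rewrite_uuid`, `push_down`, and the `PUSHABLE` column set are hypothetical names, and Spark's real optimizer runs batches of rules rather than a single pass like this.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    """Toy plan: one scan plus its predicates (not a Spark class)."""
    filters: list = field(default_factory=list)  # evaluated row by row after the scan
    pushed: list = field(default_factory=list)   # handed to the data source

def rewrite_uuid(plan):
    """Hypothetical custom rule: replace a uuid predicate with two long predicates."""
    out = []
    for col, value in plan.filters:
        if col == "uuid":
            hi, lo = value  # assume the literal was already split into its two longs
            out += [("id_hi", hi), ("id_lo", lo)]
        else:
            out.append((col, value))
    return Plan(out, list(plan.pushed))

PUSHABLE = {"id_hi", "id_lo"}  # assumed: the source can only filter on these columns

def push_down(plan):
    """Stand-in for predicate pushdown: move supported predicates into the scan."""
    keep = [f for f in plan.filters if f[0] not in PUSHABLE]
    push = [f for f in plan.filters if f[0] in PUSHABLE]
    return Plan(keep, plan.pushed + push)

def run(rules, plan):
    """Apply the rules once, in the given order."""
    for rule in rules:
        plan = rule(plan)
    return plan

start = Plan(filters=[("uuid", (1, 2))])
early = run([rewrite_uuid, push_down], start)  # custom rule before pushdown
late = run([push_down, rewrite_uuid], start)   # custom rule after pushdown
print("rewrite first:", early.pushed)
print("pushdown first:", late.pushed)
```

When the rewrite runs first, both long predicates reach the scan. When it runs after pushdown, the rewritten predicates are left behind as ordinary filters, because nothing pushes them down afterwards in this single-pass model.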

/**
 * A root node to execute the query plan adaptively. It splits the query plan into independent
 * stages and executes them in order according to their dependencies. The query stage
 * materializes its output at the end. When one stage completes, the data statistics of the
 * materialized output will be used to optimize the remainder of the query.
 *
 * To create query stages, we traverse the query tree bottom up. When we hit an exchange node,
 * and if all the child query stages of this exchange node are materialized, we create a new
 * query stage for this exchange node. The new stage is then materialized asynchronously once it
 * is created.
 *
 * When one query stage finishes materialization, the rest query is re-optimized and planned based
 * on the latest statistics provided by all materialized stages. Then we traverse the query plan
 * again and create more stages if possible. After all stages have been materialized, we execute
 * the rest of the plan.
 */
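The stage-creation loop described in the comment above can be sketched as a toy model in plain Python. This is not Spark's implementation: the `Node` class and the function names are hypothetical, the stages are "materialized" synchronously in rounds rather than asynchronously, and the re-optimization step is only marked by a comment.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy plan node (not a Spark class)."""
    name: str
    children: list = field(default_factory=list)
    is_exchange: bool = False
    materialized: bool = False  # True once this exchange's stage has produced output

def ready_exchanges(node):
    """Bottom-up search for exchanges with no unfinished exchange beneath them."""
    if node.is_exchange and node.materialized:
        return []          # already a finished stage; nothing below needs work
    found = []
    for child in node.children:
        found += ready_exchanges(child)
    if node.is_exchange and not found:
        return [node]      # every child stage is done, so this one can start
    return found

def run_adaptively(root):
    """Materialize stages round by round and return the stage names per round."""
    rounds = []
    while True:
        batch = ready_exchanges(root)
        if not batch:
            break          # no stages left: the rest of the plan would run now
        for stage in batch:
            stage.materialized = True  # a real engine would run the stage here
        # A real engine would re-optimize the remaining plan here,
        # using statistics from the stages that just finished.
        rounds.append([s.name for s in batch])
    return rounds

plan = Node("join", [
    Node("exchange_a", [Node("scan_a")], is_exchange=True),
    Node("exchange_b", [
        Node("aggregate", [
            Node("exchange_c", [Node("scan_c")], is_exchange=True),
        ]),
    ], is_exchange=True),
])
rounds = run_adaptively(plan)
print(rounds)
```

In this example `exchange_b` has to wait for `exchange_c`, so it only becomes a stage in the second round, which is the dependency ordering the comment describes.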

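Which optimizations you "plan" to implement will depend on what you need to do. I would, however, suggest thinking carefully about whether the effort is worth it.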
