Apache Drill query execution plan does not use the MongoDB index


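The query plan shows a collection scan over every row in the mongo collection. I therefore created an index on the columns used in the WHERE clause, expecting Drill to choose an index-based access plan. Drill still does a full table scan. Is there anything else I need to do to make Drill use the index?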

The actual query, the generated query plan, and the mongo index are given below.

SQL:

    Select j.user as User, TO_DATE(j.created_at) as submitted_on
    from mongo.example.jobs j
    where j.user = '[email protected]' and j.created_at BETWEEN timestamp '2020-03-25 13:12:55' AND timestamp '2020-04-24 13:12:55'

Physical plan (from the Drill UI):

    00-00 Screen : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6720.59875 rows, 23532.19875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10468
    00-01 Project(User=[$0], submitted_on=[TO_DATE($1)]) : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6708.475 rows, 23520.075 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10467
    00-02 SelectionVectorRemover : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6587.2375 rows, 22913.8875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10466
    00-03 Filter(condition=[AND(=($0, '[email protected]'), >=($1, 2020-03-25 13:12:55), <=($1, 2020-04-24 13:12:55))]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6466.0 rows, 22792.65 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10465
    00-04 Scan(table=[[mongo, example, jobs]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=example, collectionName=jobs, filters=null], columns=[`user`, `created_at`]]]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 3233.0, cumulative cost = {3233.0 rows, 6466.0 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10464

Index created in MongoDB:

    {
        "v" : 2,
        "key" : { "user" : 1, "created_at" : 1, "method_map_id" : 1 },
        "name" : "user_1_created_at_1_method_map_id_1",
        "ns" : "example.jobs"
    }

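Also, the Drill documentation says that Drill supports indexes only for MapR DB. Does that mean indexes on other data sources, such as mongo, will not be used?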

https://drill.apache.org/docs/querying-indexes-introduction/

mongodb apache-drill query-planner
1 answer

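The problem is in how the mongo-storage plugin handles timestamp filter predicates. A filter predicate is evaluated by the following modules, in this order: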

MongoPushDownFilterForScan -> MongoFilterBuilder -> MongoCompareFunctionProcessor.process() -> MongoCompareFunctionProcessor.visitSchemaPath()

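The visitSchemaPath method acts like a getter for the value-expression classes. I saw that it had no handler for TimeStampExpression, so I added the snippet below, rebuilt the plugin, and tested it.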

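    // Handle timestamp literals so the predicate can be pushed down to MongoDB.
    if (valueArg instanceof TimeStampExpression) {
      // getTimeStamp() is assumed to return epoch milliseconds, which is what java.util.Date expects.
      long epochMillis = ((TimeStampExpression) valueArg).getTimeStamp();
      this.value = new Date(epochMillis);
      this.path = path;
      return true;
    }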
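To make the dispatch easier to follow outside the Drill source tree, here is a minimal, self-contained sketch of the same idea. The types `ValueExpr`, `TimeStampExpr`, `QuotedStringExpr`, `UnsupportedExpr`, and `CompareValueExtractor` are hypothetical stand-ins invented for this illustration; they are not Drill classes, and the real `visitSchemaPath` has a different signature. The sketch only shows the pattern: a value type with no matching branch makes the method return false, and as I understand it that is why the predicate stayed in Drill's Filter operator instead of reaching the scan.

```java
import java.util.Date;

// Hypothetical stand-ins for Drill's value-expression classes (not the real API).
interface ValueExpr {}

final class TimeStampExpr implements ValueExpr {
    private final long epochMillis;
    TimeStampExpr(long epochMillis) { this.epochMillis = epochMillis; }
    long getTimeStamp() { return epochMillis; }
}

final class QuotedStringExpr implements ValueExpr {
    final String value;
    QuotedStringExpr(String value) { this.value = value; }
}

// A value type for which no handler exists.
final class UnsupportedExpr implements ValueExpr {}

// Simplified analogue of the compare-function processor's getter-style method.
final class CompareValueExtractor {
    Object value;
    String path;

    boolean visitSchemaPath(String path, ValueExpr valueArg) {
        if (valueArg instanceof QuotedStringExpr) {
            this.value = ((QuotedStringExpr) valueArg).value;
            this.path = path;
            return true;
        }
        // The added branch: convert the timestamp literal to java.util.Date.
        if (valueArg instanceof TimeStampExpr) {
            this.value = new Date(((TimeStampExpr) valueArg).getTimeStamp());
            this.path = path;
            return true;
        }
        // No handler matched: the caller cannot push this predicate down.
        return false;
    }

    public static void main(String[] args) {
        CompareValueExtractor extractor = new CompareValueExtractor();
        // Arbitrary epoch-millisecond value, chosen only for the demo.
        boolean handled = extractor.visitSchemaPath("created_at", new TimeStampExpr(1585141975000L));
        System.out.println(handled + " " + extractor.path);
    }
}
```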

With the TimeStampExpression handler in place, the timestamp filter is passed down into the filter part of the mongo query.
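For reference, the filter that would need to reach MongoDB for the query in the question is conceptually an equality on `user` plus a range on `created_at`. That shape matches the leading keys of the `user_1_created_at_1_method_map_id_1` index. The sketch below builds such a filter with plain `java.util` maps as a stand-in for a BSON document. The class name `PushedDownFilterSketch`, the helper names, the placeholder user value, and the choice of UTC for the timestamp literals are all assumptions of mine; the exact document Drill emits may be structured differently.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; plain maps stand in for a BSON filter document.
final class PushedDownFilterSketch {

    // Converts an ISO local date-time literal to java.util.Date, assuming UTC.
    static Date utcDate(String isoLocal) {
        return Date.from(LocalDateTime.parse(isoLocal).toInstant(ZoneOffset.UTC));
    }

    // Builds { user: <user>, created_at: { $gte: <from>, $lte: <to> } }.
    static Map<String, Object> buildFilter(String user, Date from, Date to) {
        Map<String, Object> range = new LinkedHashMap<>();
        range.put("$gte", from);
        range.put("$lte", to);

        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("user", user);
        filter.put("created_at", range);
        return filter;
    }

    public static void main(String[] args) {
        // "someone@example.com" is a placeholder, not the value from the question.
        Map<String, Object> filter = buildFilter(
                "someone@example.com",
                utcDate("2020-03-25T13:12:55"),
                utcDate("2020-04-24T13:12:55"));
        System.out.println(filter);
    }
}
```

One way to check whether the pushdown took effect is to look at the physical plan again: the `filters=null` entry in the MongoScanSpec should no longer be null.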
