Index not used when using Exposed with a data source on Postgres

Question  Votes: 3  Answers: 2

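I am seeing unexpected behavior when using Exposed with a data source (I tried both Apache DBCP and HikariCP).

Setup: a single table (test) with id and flag fields and an index on flag.

Query: SELECT * from test where flag=1 limit 1;

When run manually, the query uses the index and is fast. When run repeatedly through Exposed, performance degrades after 9 calls. The index is no longer used; see the query plans below.

Here is the sample code: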


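import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import org.jetbrains.exposed.dao.IntIdTable
import org.jetbrains.exposed.sql.*
import org.jetbrains.exposed.sql.transactions.transaction

// Single table "test" with an auto-increment id and an indexed integer flag.
object TestTable : IntIdTable() {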
    val flag = integer("flag").index()
}

fun createNRows(n: Int) = repeat(n) {
    TestTable.insert { it[flag] = 0 }
}

fun main(args: Array<String>) {
    val ds = HikariDataSource(HikariConfig().apply {
        jdbcUrl = "jdbc:postgresql://localhost:5432/testdb"
        username = ...
        password = ...
        setDriverClassName("org.postgresql.Driver")
    })

    Database.connect(ds)


    transaction {
        // only run the first time:
        // SchemaUtils.create(TestTable)
        // createNRows(1000000) 
        println("total ${TestTable.selectAll().count()} elements")
    }

    repeat(10000) {
        transaction {
            val startedAt = System.currentTimeMillis()
            TestTable.select { TestTable.flag.eq(1) }.limit(1).toList()
            println("Query took ${System.currentTimeMillis() - startedAt}")
        }
    }
}

Output:

total 1000000 elements
Query took 6
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 0
Query took 79
Query took 64
Query took 63
Query took 62
Query took 63
....

Here are the Postgres logs with EXPLAIN (ANALYZE, BUFFERS) enabled.

This is the last fast query:

2020-03-10 23:03:00.596 CET [71012] LOG:  duration: 0.021 ms  bind S_2: 
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.083 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.013 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.025 ms
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.011 ms  bind S_3: BEGIN
2020-03-10 23:03:00.597 CET [71012] LOG:  execute S_3: BEGIN
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.015 ms
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.159 ms  bind S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:00.598 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:00.598 CET [71012] LOG:  execute S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:00.598 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.028 ms
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.015 ms  plan:
    Query Text: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
    Limit  (cost=0.42..4.44 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
      Buffers: shared hit=3
      ->  Index Scan using test_flag on test  (cost=0.42..4.44 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=1)
            Index Cond: (flag = 1)
            Buffers: shared hit=3
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.072 ms  bind S_1: COMMIT
2020-03-10 23:03:00.598 CET [71012] LOG:  execute S_1: COMMIT
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.017 ms
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.022 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.007 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.013 ms

This is the first "slow" one:

2020-03-10 23:03:01.601 CET [71012] LOG:  duration: 0.022 ms  bind S_2: 
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.052 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.011 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.023 ms
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.012 ms  bind S_3: BEGIN
2020-03-10 23:03:01.602 CET [71012] LOG:  execute S_3: BEGIN
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.015 ms
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.192 ms  bind S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:01.602 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:01.602 CET [71012] LOG:  execute S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:01.602 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:01.678 CET [71012] LOG:  duration: 75.889 ms
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 75.868 ms  plan:
    Query Text: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
    Limit  (cost=0.00..0.02 rows=1 width=8) (actual time=75.864..75.864 rows=0 loops=1)
      Buffers: shared hit=96 read=4329
      ->  Seq Scan on test  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=75.862..75.862 rows=0 loops=1)
            Filter: (flag = $1)
            Rows Removed by Filter: 1000000
            Buffers: shared hit=96 read=4329
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.054 ms  bind S_1: COMMIT
2020-03-10 23:03:01.679 CET [71012] LOG:  execute S_1: COMMIT
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.014 ms
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.025 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.004 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.009 ms

Postgres version (Homebrew):

postgres (PostgreSQL) 11.5

Client versions:


dependencies {
    implementation 'org.jetbrains.exposed:exposed:0.17.7'
    implementation "org.postgresql:postgresql:42.2.8"
    implementation 'org.jetbrains.kotlin:kotlin-stdlib'
    implementation 'com.zaxxer:HikariCP:2.3.2'
}

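The Postgres configuration is the default (the logs were produced with auto_explain, but the problem reproduces without it).

The source of the example is here: https://github.com/RomanBrodetski/kotlin-exposed-issue

Observations:

  • If .limit(1) is removed, the problem does not reproduce.
  • If no data source is used (Database.connect("jdbc:postgresql://localhost:5432/testdb", driver = "org.postgresql.Driver") instead of Database.connect(ds)), the problem does not reproduce.
  • If there are other statements in the transaction, the problem does not reproduce.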
postgresql jdbc datasource kotlin-exposed
2 Answers

Votes: 1

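Completely removing .limit(1) makes it always use the index. The problem is the generic plan that is created for the prepared statement after a few (5) executions: that plan is wrong, and the literal limit 1 is what makes it so. Making the 1 a bind variable solves the problem. Unfortunately, I did not find a way to do this in the Exposed library; it inlines the number into the prepared statement.

For some reason the planner thinks it can find a matching row immediately during a sequential scan, and no matter what vacuum / analyze / create statistics I run, I cannot change its mind. (I tried changing the distribution of flag values; it did not help.)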
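One possible reading of the logged plans, offered as a sketch rather than a confirmed diagnosis: in a generic plan the planner does not know the value of $1, so it estimates the selectivity of flag = $1 as roughly 1 / n_distinct. Every row in the question's table has flag = 0, so n_distinct is 1 and all 1,000,000 rows are expected to match (the rows=1000000 on the Seq Scan node). A LIMIT node is then charged only the fraction of its child's cost needed to emit the requested rows. The function names below are hypothetical and the formulas simplify the planner's real costing; the input numbers are taken from the two plans in the question.

```kotlin
import java.util.Locale

// Simplified model of LIMIT costing: the child's startup cost plus the
// fraction of its run cost needed to emit `limit` of the estimated rows.
// Assumption: this only approximates what the PostgreSQL planner does.
fun limitCost(startup: Double, total: Double, estimatedRows: Double, limit: Double): Double {
    if (estimatedRows <= 0.0) return total
    val fraction = minOf(1.0, limit / estimatedRows)
    return startup + (total - startup) * fraction
}

// Simplified generic-plan row estimate for `column = $1`: rows / n_distinct.
fun genericRowEstimate(tableRows: Double, nDistinct: Double): Double =
    tableRows / maxOf(1.0, nDistinct)

fun main() {
    val rows = genericRowEstimate(tableRows = 1_000_000.0, nDistinct = 1.0)
    // Seq Scan from the slow plan in the question: cost=0.00..16925.00
    val seqScanWithLimit = limitCost(0.0, 16925.0, rows, 1.0)
    // Index Scan from the fast plan in the question: cost=0.42..4.44, rows=1
    val indexScanWithLimit = limitCost(0.42, 4.44, 1.0, 1.0)
    println("estimated matching rows: $rows")
    println("seq scan + LIMIT 1:   " + "%.2f".format(Locale.ROOT, seqScanWithLimit))
    println("index scan + LIMIT 1: " + "%.2f".format(Locale.ROOT, indexScanWithLimit))
}
```

Under these assumptions the sequential scan with LIMIT 1 comes out at about 0.02 and the index scan at 4.44, which are the same figures as the Limit nodes in the two logged plans. That would explain why the generic plan looks far cheaper to the planner even though no row actually has flag = 1.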

Reproducing the problem from SQL:

create index test_flag_partial_idx on test (flag) include (id) where flag is not null and flag = 1;

vacuum  full  analyse  test;

PREPARE select_with_limit_as_value AS SELECT test.id, test.flag FROM test WHERE test.flag IS NOT NULL AND test.flag = $1 LIMIT 1;
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);


PREPARE select_with_limit_as_bind AS SELECT test.id, test.flag FROM test WHERE test.flag IS NOT NULL AND test.flag = $1 LIMIT $2;
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);

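The first prepared statement uses the limit as a hardcoded value and switches to a generic plan with a sequential scan after a few executions. The second prepared statement uses the limit as a bind variable, and its generic plan uses the index.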
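The switch "after a few executions" can be modelled with a small sketch. It assumes that the server plans the first five executions of a cached statement with the actual parameter values (custom plans) and afterwards prefers the generic plan whenever its estimated cost is below the average custom-plan cost. It also assumes that the JDBC driver only starts using a named server-side statement at its prepareThreshold (default 5), which is my guess at why the question sees nine fast calls rather than five. PlanCacheModel and simulate are hypothetical names, and the real server also adds a planning-cost term that is left out here.

```kotlin
// Hypothetical, simplified model of the plan choice for one cached statement.
// Assumption: `customTrials` custom plans are tried first; after that the
// generic plan wins if it looks cheaper than the average custom plan.
class PlanCacheModel(private val genericCost: Double, private val customTrials: Int = 5) {
    private var customPlans = 0
    private var totalCustomCost = 0.0

    fun choose(customCost: Double): String {
        if (customPlans >= customTrials && genericCost < totalCustomCost / customPlans) {
            return "generic"
        }
        customPlans++
        totalCustomCost += customCost
        return "custom"
    }
}

// Executions before `prepareThreshold` are modelled as one-shot statements
// that never reach the plan cache (an assumption about the JDBC driver).
fun simulate(executions: Int, prepareThreshold: Int, genericCost: Double, customCost: Double): List<String> {
    val cache = PlanCacheModel(genericCost)
    return (1..executions).map { n ->
        if (n < prepareThreshold) "one-shot" else cache.choose(customCost)
    }
}

fun main() {
    // Costs taken from the two plans logged in the question.
    simulate(executions = 12, prepareThreshold = 5, genericCost = 0.02, customCost = 4.44)
        .forEachIndexed { i, plan -> println("execution ${i + 1}: $plan") }
}
```

In this model execution 10 is the first to use the generic plan, which is consistent with the nine fast calls in the question. With prepareThreshold = 1, as for a statement created directly with PREPARE, the switch happens on the sixth execution, matching the six EXECUTE calls above.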

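You need to either hardcode the flag parameter into the query or make the limit a bind variable.

In PostgreSQL 12 you can disable generic plans; the setting can be changed before and after the query: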

set plan_cache_mode = force_custom_plan;

All of this was tried on PostgreSQL 12.2.
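On servers older than PostgreSQL 12, where plan_cache_mode is not available, a driver-level workaround may be worth testing. The PostgreSQL JDBC driver documents a prepareThreshold connection parameter, and setting it to 0 should keep the driver from switching to named server-side statements, so each execution would be planned with the actual parameter values. This is a sketch under that assumption and was not verified against the setup in the question; withParam is a hypothetical helper, and the URL is the one from the question.

```kotlin
// Appends a query parameter to a JDBC URL, using '?' for the first
// parameter and '&' for any later one. Hypothetical helper for illustration.
fun withParam(url: String, name: String, value: String): String {
    val separator = if ('?' in url) '&' else '?'
    return "$url$separator$name=$value"
}

fun main() {
    val base = "jdbc:postgresql://localhost:5432/testdb"
    // prepareThreshold=0: assumption that this disables named server-side
    // prepared statements in the driver, so no generic plan gets cached.
    val url = withParam(base, "prepareThreshold", "0")
    println(url)
}
```

The resulting string would be assigned to jdbcUrl in the HikariConfig block from the question. The trade-off is that every execution is parsed and planned again, which costs a little on each call.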


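Votes: 0

Please try gathering table statistics after inserting the data. It looks like the cost-based optimizer (CBO) has too few statistics to understand the table structure. Actually, it is not a bad idea for Postgres not to use the index you created, because all of the indexed values are the same. So the next thing to try is removing the index from the code, or creating a better index.

In the end, this does not seem to be related to Exposed, but to PostgreSQL itself.

(I wanted to post this as a comment, but my reputation does not allow it, so I wrote an answer.)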
