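I set up PostgreSQL on my Win11 laptop, ran a SELECT in pgAdmin 4, and ran the same query through SQLAlchemy from VSCode. The SQL statement is identical in both cases, yet SQLAlchemy is faster than the raw SQL. Why is that?
I also tried running both the ORM code and the SQL from cmd, and the result was the same. The SQL statement I used is the one the ORM generates from stmt.
This is the ORM code I used: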
from sqlalchemy import create_engine, Table, MetaData, Column, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import select
import time
db_url = "postgresql://postgres:Bmpsst521@localhost:5432/Teacher_salary"
engine = create_engine(db_url)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()
meta_obj = MetaData()
log_problem = Table(
    "log_problem",
    meta_obj,
    Column("timestamp_tw"),
    Column("uuid"),
    Column("ucid"),
    Column("upid"),
    Column("problem_number"),
    Column("exercise_problem_repeat_session"),
    Column("is_correct"),
    Column("total_sec_taken"),
    Column("total_attempt_cnt"),
    Column("used_hint_cnt"),
    Column("is_hint_used"),
    Column("is_downgrade"),
    Column("is_upgrade"),
    Column("level"),
    Column("id")
)
info_userdata = Table(
    "info_userdata",
    meta_obj,
    Column("uuid"),
    Column("gender"),
    Column("points"),
    Column("badges_cnt"),
    Column("first_login_date_tw"),
    Column("user_grade"),
    Column("user_city"),
    Column("has_teacher_cnt"),
    Column("is_self_coach"),
    Column("has_student_cnt"),
    Column("belongs_to_class_cnt"),
    Column("has_class_cnt"),
    Column("id")
)
Here is how I build the statement with SQLAlchemy and how I measure the time.
stmt = select(
    info_userdata.c.user_grade,
    func.count()
).select_from(log_problem).join(
    info_userdata, log_problem.c.uuid == info_userdata.c.uuid
).group_by(info_userdata.c.user_grade)
print(stmt)
start = time.time()
with engine.connect() as conn:
    result = conn.execute(stmt)
    conn.commit()
    end = time.time()
    # Read the rows while the connection is still open
    for row in result:
        print(row)
print(end - start) # 3.106472969055176 (in seconds)
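A single wall-clock measurement on each side is easy to skew, for example by whether the table pages are already cached (the EXPLAIN output with buffers further down reports far more pages read than hit). The following is a minimal sketch of repeating the measurement, not the method used in the question. The helper names time_runs and summarize are hypothetical, and the workload in the demo is a stand-in so that the sketch runs without a database.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_once: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call run_once() `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Report the first run separately, since it is the one most affected by cold caches."""
    return (
        f"first={durations[0]:.3f}s "
        f"min={min(durations):.3f}s "
        f"median={statistics.median(durations):.3f}s"
    )


if __name__ == "__main__":
    # Stand-in workload (assumption). With the code from the question you
    # could instead pass a function along these lines:
    #   def run_once():
    #       with engine.connect() as conn:
    #           conn.execute(stmt).fetchall()
    print(summarize(time_runs(lambda: sum(range(100_000)))))
```

Comparing the median of several runs on both sides (SQLAlchemy and psql with \timing) should give a fairer picture than one run each, assuming nothing else is loading the machine at the same time.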
This is the SQL statement:
SELECT info_userdata.user_grade, count(*) AS count_1
FROM log_problem
JOIN info_userdata
ON log_problem.uuid = info_userdata.uuid
GROUP BY info_userdata.user_grade;
--3582.067 ms (00:03.582) (using \timing ON)
Added the query plan:
Teacher_salary=# EXPLAIN SELECT info_userdata.user_grade, count(*) AS count_1 FROM log_problem JOIN info_userdata ON log_problem.uuid = info_userdata.uuid GROUP BY info_userdata.user_grade;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate  (cost=661356.65..661359.69 rows=12 width=12)
  Group Key: info_userdata.user_grade
  ->  Gather Merge  (cost=661356.65..661359.45 rows=24 width=12)
        Workers Planned: 2
        ->  Sort  (cost=660356.62..660356.65 rows=12 width=12)
              Sort Key: info_userdata.user_grade
              ->  Partial HashAggregate  (cost=660356.29..660356.41 rows=12 width=12)
                    Group Key: info_userdata.user_grade
                    ->  Parallel Hash Join  (cost=2226.98..623588.15 rows=7353627 width=4)
                          Hash Cond: ((log_problem.uuid)::text = (info_userdata.uuid)::text)
                          ->  Parallel Seq Scan on log_problem  (cost=0.00..550528.27 rows=7353627 width=45)
                          ->  Parallel Hash  (cost=1691.99..1691.99 rows=42799 width=49)
                                ->  Parallel Seq Scan on info_userdata  (cost=0.00..1691.99 rows=42799 width=49)
(13 rows)
Time: 7.960 ms
Added the result of EXPLAIN(analyze, verbose, buffers, settings):
Teacher_salary=# EXPLAIN(analyze, verbose, buffers, settings) SELECT info_userdata.user_grade, count(*) AS count_1 FROM log_problem JOIN info_userdata ON log_problem.uuid = info_userdata.uuid GROUP BY info_userdata.user_grade;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate  (cost=661356.65..661359.69 rows=12 width=12) (actual time=4309.362..4313.947 rows=12 loops=1)
  Output: info_userdata.user_grade, count(*)
  Group Key: info_userdata.user_grade
  Buffers: shared hit=3776 read=474528
  ->  Gather Merge  (cost=661356.65..661359.45 rows=24 width=12) (actual time=4309.357..4313.937 rows=36 loops=1)
        Output: info_userdata.user_grade, (PARTIAL count(*))
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=3776 read=474528
        ->  Sort  (cost=660356.62..660356.65 rows=12 width=12) (actual time=4273.242..4273.244 rows=12 loops=3)
              Output: info_userdata.user_grade, (PARTIAL count(*))
              Sort Key: info_userdata.user_grade
              Sort Method: quicksort  Memory: 25kB
              Buffers: shared hit=3776 read=474528
              Worker 0:  actual time=4255.885..4255.887 rows=12 loops=1
                Sort Method: quicksort  Memory: 25kB
                Buffers: shared hit=1014 read=171520
              Worker 1:  actual time=4255.106..4255.108 rows=12 loops=1
                Sort Method: quicksort  Memory: 25kB
                Buffers: shared hit=638 read=167504
              ->  Partial HashAggregate  (cost=660356.29..660356.41 rows=12 width=12) (actual time=4273.223..4273.226 rows=12 loops=3)
                    Output: info_userdata.user_grade, PARTIAL count(*)
                    Group Key: info_userdata.user_grade
                    Batches: 1  Memory Usage: 24kB
                    Buffers: shared hit=3762 read=474528
                    Worker 0:  actual time=4255.863..4255.865 rows=12 loops=1
                      Batches: 1  Memory Usage: 24kB
                      Buffers: shared hit=1007 read=171520
                    Worker 1:  actual time=4255.083..4255.087 rows=12 loops=1
                      Batches: 1  Memory Usage: 24kB
                      Buffers: shared hit=631 read=167504
                    ->  Parallel Hash Join  (cost=2226.98..623588.15 rows=7353627 width=4) (actual time=11.651..3448.484 rows=5405770 loops=3)
                          Output: info_userdata.user_grade
                          Hash Cond: ((log_problem.uuid)::text = (info_userdata.uuid)::text)
                          Buffers: shared hit=3762 read=474528
                          Worker 0:  actual time=0.374..3460.534 rows=5865102 loops=1
                            Buffers: shared hit=1007 read=171520
                          Worker 1:  actual time=0.455..3415.051 rows=5715842 loops=1
                            Buffers: shared hit=631 read=167504
                          ->  Parallel Seq Scan on public.log_problem  (cost=0.00..550528.27 rows=7353627 width=45) (actual time=0.254..1568.013 rows=5405770 loops=3)
                                Output: log_problem.uuid
                                Buffers: shared hit=2464 read=474528
                                Worker 0:  actual time=0.294..1579.935 rows=5865102 loops=1
                                  Buffers: shared hit=990 read=171520
                                Worker 1:  actual time=0.389..1556.809 rows=5715842 loops=1
                                  Buffers: shared hit=614 read=167504
                          ->  Parallel Hash  (cost=1691.99..1691.99 rows=42799 width=49) (actual time=11.122..11.123 rows=24253 loops=3)
                                Output: info_userdata.user_grade, info_userdata.uuid
                                Buckets: 131072  Batches: 1  Memory Usage: 7296kB
                                Buffers: shared hit=1264
                                Worker 0:  actual time=0.014..0.015 rows=0 loops=1
                                Worker 1:  actual time=0.013..0.014 rows=0 loops=1
                                ->  Parallel Seq Scan on public.info_userdata  (cost=0.00..1691.99 rows=42799 width=49) (actual time=0.007..11.095 rows=72758 loops=1)
                                      Output: info_userdata.user_grade, info_userdata.uuid
                                      Buffers: shared hit=1264
Planning Time: 0.340 ms
Execution Time: 4314.031 ms
(57 rows)
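It looks like you store the UUIDs as text rather than as uuid, judging by this hash condition in the plan. Fixing that alone should already change, and likely improve, things: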
((log_problem.uuid)::text = (info_userdata.uuid)::text)
Change both columns to the uuid data type:
ALTER TABLE log_problem ALTER COLUMN uuid TYPE uuid USING (cast(uuid AS uuid));
ALTER TABLE info_userdata ALTER COLUMN uuid TYPE uuid USING (cast(uuid AS uuid));
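Indexes on these columns could also be useful. You might also want to include user_grade in the index on info_userdata, so that it covers every column the query uses from that table.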
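As a rough sketch of what such indexes could look like, the DDL is collected here as strings in Python so it can be printed and pasted into psql, or run through a database connection. The index names are hypothetical, and INCLUDE assumes PostgreSQL 11 or later. Whether the planner actually picks these indexes for a join over the whole table is not guaranteed, so re-check with EXPLAIN afterwards.

```python
# Hypothetical index names; adjust them to your own naming convention.
INDEX_DDL = [
    # Plain index on the join column of the large table.
    "CREATE INDEX IF NOT EXISTS log_problem_uuid_idx "
    "ON log_problem (uuid);",
    # Index on the join column that also carries user_grade, so the
    # query's columns from info_userdata are all present in the index.
    "CREATE INDEX IF NOT EXISTS info_userdata_uuid_grade_idx "
    "ON info_userdata (uuid) INCLUDE (user_grade);",
]

if __name__ == "__main__":
    for ddl in INDEX_DDL:
        print(ddl)
```

Creating the indexes after the ALTER TABLE statements means they are built once on the uuid type instead of being rebuilt during the type change.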