SQLite: multiple JOIN problem across two tables and two databases


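I'm writing an open-source program in pure C that checks file integrity using checksums. It does this by comparing two structurally identical SQLite databases, db1 and db2, that come from different sources.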

Each database has two tables. The first, "files", contains the relative file path "relative_path", the checksum "sha" and the path-prefix identifier "path_prefix_index":

CREATE TABLE "files" (
    "ID"    INTEGER NOT NULL,
    "path_prefix_index" INTEGER NOT NULL,
    "relative_path" TEXT NOT NULL,
    "sha"   BLOB DEFAULT NULL,
    PRIMARY KEY("ID"),
    CONSTRAINT "full_path" UNIQUE("path_prefix_index","relative_path") ON CONFLICT FAIL
);
CREATE INDEX full_path_ASC ON files (path_prefix_index, relative_path ASC);

The second table, "paths", contains the path prefix "path" and its ID:

CREATE TABLE "paths" (
    "ID"    INTEGER NOT NULL UNIQUE,
    "path"  TEXT NOT NULL UNIQUE,
    PRIMARY KEY("ID")
);

As you can probably guess from the CONSTRAINT and the index, a unique absolute file path is formed by the pair paths.path and files.relative_path.
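The sketch below shows how such an absolute path can be assembled in SQL. It runs through Python's built-in sqlite3 module against an in-memory copy of two rows of db1. Joining the prefix and the relative path with a single '/' is an assumption, because the question does not say how the program concatenates them. The last statement checks that the full_path constraint rejects a second row with the same (path_prefix_index, relative_path) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "files" (
    "ID"    INTEGER NOT NULL,
    "path_prefix_index" INTEGER NOT NULL,
    "relative_path" TEXT NOT NULL,
    "sha"   BLOB DEFAULT NULL,
    PRIMARY KEY("ID"),
    CONSTRAINT "full_path" UNIQUE("path_prefix_index","relative_path") ON CONFLICT FAIL
);
CREATE TABLE "paths" (
    "ID"    INTEGER NOT NULL UNIQUE,
    "path"  TEXT NOT NULL UNIQUE,
    PRIMARY KEY("ID")
);
INSERT INTO paths VALUES(1,'/mnt/path1');
INSERT INTO paths VALUES(2,'/mnt/path2');
INSERT INTO files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
""")

# Resolve path_prefix_index to the prefix text, then append relative_path.
# Assumption: prefix and relative path are separated by a single '/'.
absolute_paths = [
    row[0]
    for row in conn.execute(
        "SELECT p.path || '/' || f.relative_path "
        "FROM files AS f JOIN paths AS p ON f.path_prefix_index = p.ID "
        "ORDER BY f.ID"
    )
]
print(absolute_paths)

# The same relative path under the same prefix again: the full_path constraint fails it.
try:
    conn.execute("INSERT INTO files VALUES(7,1,'AAA/BCB/CCC/a.txt',X'0000')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The same relative path is accepted under two different prefixes. Only the pair as a whole must be unique.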

Let's fill the tables of the first database, db1, with test data:

INSERT INTO files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');

INSERT INTO paths VALUES(1,'/mnt/path1');
INSERT INTO paths VALUES(2,'/mnt/path2');

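After that, let's fill the second database, db2, with the same data but different IDs. The path prefixes in its "paths" table have different IDs, and so does the corresponding "path_prefix_index" field in its "files" table. Despite the different IDs, exactly the same unique prefix/relative-path pairs are formed, and therefore the same absolute file paths. Viewed by absolute path, the two databases in the example contain the same files.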

Now comes the important part: in the data below, one file has an "sha" checksum that differs between the two databases.

INSERT INTO paths VALUES(4,'/mnt/path1');
INSERT INTO paths VALUES(3,'/mnt/path2');

INSERT INTO files VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a'); -- Exactly the same file, but with a different sha
INSERT INTO files VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
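The two databases number their prefixes differently, so rows cannot be matched on path_prefix_index. The match has to go through the prefix text. The sketch below compares the two strategies. It assumes both databases can be attached to one connection; two in-memory databases stand in for example1.db and example2.db and are filled with the question's data. The helper open_test_dbs is hypothetical.

```python
import sqlite3

# Schema from the question, created once per attached database.
SCHEMA = """
CREATE TABLE {db}.files (
    ID INTEGER NOT NULL,
    path_prefix_index INTEGER NOT NULL,
    relative_path TEXT NOT NULL,
    sha BLOB DEFAULT NULL,
    PRIMARY KEY (ID),
    CONSTRAINT full_path UNIQUE (path_prefix_index, relative_path) ON CONFLICT FAIL
);
CREATE TABLE {db}.paths (
    ID INTEGER NOT NULL UNIQUE,
    path TEXT NOT NULL UNIQUE,
    PRIMARY KEY (ID)
);
"""

# Test data from the question, qualified with the attached schema names.
DATA = """
INSERT INTO db1.paths VALUES(1,'/mnt/path1');
INSERT INTO db1.paths VALUES(2,'/mnt/path2');
INSERT INTO db1.files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db1.files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db1.files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO db1.files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.paths VALUES(4,'/mnt/path1');
INSERT INTO db2.paths VALUES(3,'/mnt/path2');
INSERT INTO db2.files VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db2.files VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.files VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a');
INSERT INTO db2.files VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
"""


def open_test_dbs():
    # Hypothetical helper: two in-memory databases stand in for example1.db / example2.db.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS db1")
    conn.execute("ATTACH DATABASE ':memory:' AS db2")
    conn.executescript(SCHEMA.format(db="db1") + SCHEMA.format(db="db2") + DATA)
    return conn


conn = open_test_dbs()

# Matching on the numeric prefix ID: db1 uses 1/2, db2 uses 4/3, so nothing lines up.
by_id = conn.execute(
    "SELECT COUNT(*) FROM db1.files AS f1 JOIN db2.files AS f2 "
    "ON f1.path_prefix_index = f2.path_prefix_index "
    "AND f1.relative_path = f2.relative_path"
).fetchone()[0]

# Matching on the prefix text plus the relative path, i.e. on the absolute path.
by_path = conn.execute(
    "SELECT COUNT(*) FROM db1.files AS f1 "
    "JOIN db1.paths AS p ON f1.path_prefix_index = p.ID "
    "JOIN db2.paths AS p2 ON p2.path = p.path "
    "JOIN db2.files AS f2 ON f2.path_prefix_index = p2.ID "
    "AND f2.relative_path = f1.relative_path"
).fetchone()[0]

print(by_id, by_path)
```

Matching on the ID pairs up nothing. Matching on the prefix text pairs each of the six db1 files with exactly one db2 file.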

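I need a query that makes SQLite display the path and path_prefix_index (which together give the file's absolute path) for each pair of identical files whose sha field differs between the two databases.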

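I've been trying to solve this multiple-JOIN problem for a long time, but my SQL knowledge isn't enough to get the result I need. Here is the monster I managed to build. Unfortunately, it doesn't work: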

sqlite> attach database "example1.db" as db1;
sqlite> attach database "example2.db" as db2;
sqlite> SELECT p.path, f1.relative_path FROM db1.files AS f1 JOIN db1.paths AS p ON f1.path_prefix_index = p.ID JOIN db2.files AS f2 ON f1.relative_path = f2.relative_path JOIN db2.paths AS p2 ON f2.path_prefix_index = p2.ID WHERE f1.sha <> f2.sha;
/mnt/path1|AAA/BCB/CCC/a.txt
/mnt/path2|AAA/BCB/CCC/a.txt
/mnt/path2|AAA/BCB/CCC/a.txt
It looks promising, but it still doesn't give the right result. In this example only one unique file has a different sha, and it is

/mnt/path2|AAA/BCB/CCC/a.txt
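I'm asking for help understanding and building an SQL query, so that the application comparing the two databases can print the absolute paths of the files in question.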

1 Answer
If you add * to show all columns, e.g.

SELECT p.path, f1.relative_path,*

then you will see that multiple rows match the WHERE clause:-

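e.g.

  • See below regarding the SQLite tool and the hex data.

  • Remove the WHERE clause and there are 12 rows. Each relative path exists under both prefixes in each database, so every path yields 2 × 2 joined rows. In other words, you are dealing with a Cartesian product of the rows that share a relative path.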
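The row counts in the bullets above can be reproduced with a short script. The sketch below assumes Python's built-in sqlite3 module and uses two attached in-memory databases in place of example1.db and example2.db; the helper open_test_dbs is hypothetical. The query is the one from the question, with both prefixes and hex() added so the three matching rows can be told apart.

```python
import sqlite3

SCHEMA = """
CREATE TABLE {db}.files (
    ID INTEGER NOT NULL,
    path_prefix_index INTEGER NOT NULL,
    relative_path TEXT NOT NULL,
    sha BLOB DEFAULT NULL,
    PRIMARY KEY (ID),
    CONSTRAINT full_path UNIQUE (path_prefix_index, relative_path) ON CONFLICT FAIL
);
CREATE TABLE {db}.paths (
    ID INTEGER NOT NULL UNIQUE,
    path TEXT NOT NULL UNIQUE,
    PRIMARY KEY (ID)
);
"""

DATA = """
INSERT INTO db1.paths VALUES(1,'/mnt/path1');
INSERT INTO db1.paths VALUES(2,'/mnt/path2');
INSERT INTO db1.files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db1.files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db1.files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO db1.files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.paths VALUES(4,'/mnt/path1');
INSERT INTO db2.paths VALUES(3,'/mnt/path2');
INSERT INTO db2.files VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db2.files VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.files VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a');
INSERT INTO db2.files VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
"""


def open_test_dbs():
    # Hypothetical helper: two in-memory databases stand in for example1.db / example2.db.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS db1")
    conn.execute("ATTACH DATABASE ':memory:' AS db2")
    conn.executescript(SCHEMA.format(db="db1") + SCHEMA.format(db="db2") + DATA)
    return conn


conn = open_test_dbs()

# The joins from the question: f1 and f2 are matched on relative_path only.
JOINS = (
    "FROM db1.files AS f1 "
    "JOIN db1.paths AS p ON f1.path_prefix_index = p.ID "
    "JOIN db2.files AS f2 ON f1.relative_path = f2.relative_path "
    "JOIN db2.paths AS p2 ON f2.path_prefix_index = p2.ID "
)

# Without WHERE: 3 relative paths x (2 rows in db1 x 2 rows in db2).
all_rows = conn.execute("SELECT COUNT(*) " + JOINS).fetchone()[0]

# With WHERE: the three rows from the question, with both prefixes and hex checksums.
mismatches = conn.execute(
    "SELECT p.path, p2.path, hex(f1.sha), hex(f2.sha) " + JOINS
    + "WHERE f1.sha <> f2.sha ORDER BY p.path, p2.path"
).fetchall()

print(all_rows)
for row in mismatches:
    print(row)
```

Only the last of the three mismatching rows pairs /mnt/path2 with /mnt/path2. The other two compare a.txt under different prefixes.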

Assuming that "file" refers to a single relative path, you can reduce the 3 rows to 1 by adding

GROUP BY f1.relative_path

e.g.

SELECT p.path, f1.relative_path
FROM db1.files AS f1
JOIN db1.paths AS p ON f1.path_prefix_index = p.ID
JOIN db2.files AS f2 ON f1.relative_path = f2.relative_path
JOIN db2.paths AS p2 ON f2.path_prefix_index = p2.ID
WHERE f1.sha <> f2.sha
GROUP BY f1.relative_path;
But which of these pairings (prefix in db1 .... prefix in db2) do you want, if that matters?

  • /mnt/path1 .... /mnt/path2, or
  • /mnt/path2 .... /mnt/path2, or
  • /mnt/path2 .... /mnt/path1
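The question's expected output corresponds to the second pairing, where the prefix is the same in both databases. If that is what is wanted, GROUP BY only hides the extra rows: for a bare column in an aggregate query, SQLite takes the value from an arbitrary row of the group. One alternative, sketched below, makes the prefix text part of the join condition so that f1 and f2 always describe the same absolute path before the checksums are compared. It uses the same hypothetical in-memory setup (open_test_dbs) with Python's sqlite3 module.

```python
import sqlite3

SCHEMA = """
CREATE TABLE {db}.files (
    ID INTEGER NOT NULL,
    path_prefix_index INTEGER NOT NULL,
    relative_path TEXT NOT NULL,
    sha BLOB DEFAULT NULL,
    PRIMARY KEY (ID),
    CONSTRAINT full_path UNIQUE (path_prefix_index, relative_path) ON CONFLICT FAIL
);
CREATE TABLE {db}.paths (
    ID INTEGER NOT NULL UNIQUE,
    path TEXT NOT NULL UNIQUE,
    PRIMARY KEY (ID)
);
"""

DATA = """
INSERT INTO db1.paths VALUES(1,'/mnt/path1');
INSERT INTO db1.paths VALUES(2,'/mnt/path2');
INSERT INTO db1.files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db1.files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db1.files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO db1.files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.paths VALUES(4,'/mnt/path1');
INSERT INTO db2.paths VALUES(3,'/mnt/path2');
INSERT INTO db2.files VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db2.files VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.files VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a');
INSERT INTO db2.files VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
"""


def open_test_dbs():
    # Hypothetical helper: two in-memory databases stand in for example1.db / example2.db.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS db1")
    conn.execute("ATTACH DATABASE ':memory:' AS db2")
    conn.executescript(SCHEMA.format(db="db1") + SCHEMA.format(db="db2") + DATA)
    return conn


conn = open_test_dbs()

# GROUP BY on the question's joins: one row, but which prefix it shows is not defined.
grouped = conn.execute(
    "SELECT p.path, f1.relative_path "
    "FROM db1.files AS f1 "
    "JOIN db1.paths AS p ON f1.path_prefix_index = p.ID "
    "JOIN db2.files AS f2 ON f1.relative_path = f2.relative_path "
    "JOIN db2.paths AS p2 ON f2.path_prefix_index = p2.ID "
    "WHERE f1.sha <> f2.sha GROUP BY f1.relative_path"
).fetchall()
print(len(grouped))

# Prefix-aware joins: the db2 prefix must have the same text as the db1 prefix.
fixed = conn.execute(
    "SELECT p.path, f1.relative_path "
    "FROM db1.files AS f1 "
    "JOIN db1.paths AS p ON f1.path_prefix_index = p.ID "
    "JOIN db2.paths AS p2 ON p2.path = p.path "
    "JOIN db2.files AS f2 ON f2.path_prefix_index = p2.ID "
    "AND f2.relative_path = f1.relative_path "
    "WHERE f1.sha <> f2.sha"
).fetchall()
for path, relative_path in fixed:
    print(f"{path}|{relative_path}")
```

With this join order, the IDs are only used to look up each database's own prefixes, and the comparison itself is done on path text.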

Suggestion

I'd suggest considering an SQLite tool (SQLiteStudio, Navicat for SQLite, DBeaver, ...), as it can make working with SQL easier.

For example, for the above (and more), Navicat was used with a single database, to remove the complexity of attaching (it works fine).

Here is one permutation of the test code:-

DROP TABLE IF EXISTS files;
DROP TABLE IF EXISTS paths;
DROP TABLE IF EXISTS files2;
DROP TABLE IF EXISTS paths2;

CREATE TABLE IF NOT EXISTS "files" (
    "ID" INTEGER NOT NULL,
    "path_prefix_index" INTEGER NOT NULL,
    "relative_path" TEXT NOT NULL,
    "sha" BLOB DEFAULT NULL,
    PRIMARY KEY("ID"),
    CONSTRAINT "full_path" UNIQUE("path_prefix_index","relative_path") ON CONFLICT FAIL
);
CREATE INDEX IF NOT EXISTS full_path_ASC ON files (path_prefix_index, relative_path ASC);
CREATE TABLE "paths" (
    "ID" INTEGER NOT NULL UNIQUE,
    "path" TEXT NOT NULL UNIQUE,
    PRIMARY KEY("ID")
);
CREATE TABLE IF NOT EXISTS "files2" (
    "ID" INTEGER NOT NULL,
    "path_prefix_index" INTEGER NOT NULL,
    "relative_path" TEXT NOT NULL,
    "sha" BLOB DEFAULT NULL,
    PRIMARY KEY("ID"),
    CONSTRAINT "full_path" UNIQUE("path_prefix_index","relative_path") ON CONFLICT FAIL
);
-- Index names share one namespace per database, so this index needs its own name
CREATE INDEX IF NOT EXISTS full_path2_ASC ON files2 (path_prefix_index, relative_path ASC);
CREATE TABLE "paths2" (
    "ID" INTEGER NOT NULL UNIQUE,
    "path" TEXT NOT NULL UNIQUE,
    PRIMARY KEY("ID")
);

INSERT INTO files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO paths VALUES(1,'/mnt/path1');
INSERT INTO paths VALUES(2,'/mnt/path2');
INSERT INTO paths2 VALUES(4,'/mnt/path1');
INSERT INTO paths2 VALUES(3,'/mnt/path2');
INSERT INTO files2 VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files2 VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files2 VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files2 VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a'); -- Exactly the same file, but with a different sha
INSERT INTO files2 VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files2 VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');

SELECT p.path, f1.relative_path,
       f1.sha <> f2.sha AS comprslt,
       hex(f1.sha) AS h1,
       hex(f2.sha) AS h2,
       * /* to show ALL data of rows that match the criteria */
FROM files AS f1
JOIN paths AS p ON f1.path_prefix_index = p.ID
JOIN files2 AS f2 ON f1.relative_path = f2.relative_path
JOIN paths2 AS p2 ON f2.path_prefix_index = p2.ID
WHERE f1.sha <> f2.sha
/*GROUP BY f1.relative_path*/;

DROP TABLE IF EXISTS files;
DROP TABLE IF EXISTS paths;
DROP TABLE IF EXISTS files2;
DROP TABLE IF EXISTS paths2;

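  • Lots of copy and paste, but the table names were changed to fit a single database.
  • Various columns were added to make it easier to see what is what.
  • The GROUP BY clause is commented out (just uncomment it).
The above produces:-

  • The highlighted columns were added to show what happens in the comparison (the WHERE clause). Tools generally don't display blobs well.
    • Hence the comparison result itself (obviously always 1 when the WHERE clause is used).
    • The built-in hex function is used to show the blob data in a readable/useful format.
  • As can be seen, commenting/uncommenting parts of the SQL is very easy.
  • The DROP statements at the end clean up the environment.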
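The comprslt and hex() columns described above can be checked the same way. This sketch uses Python's sqlite3 module, the same hypothetical in-memory setup (open_test_dbs) and the prefix-aware join. The UPDATE that blanks one checksum is invented to show a related pitfall. Because sha is declared DEFAULT NULL, `f1.sha <> f2.sha` evaluates to NULL rather than 1 when either side is NULL, so such rows silently drop out of the WHERE clause. `IS NOT` instead treats NULL as an ordinary value.

```python
import sqlite3

SCHEMA = """
CREATE TABLE {db}.files (
    ID INTEGER NOT NULL,
    path_prefix_index INTEGER NOT NULL,
    relative_path TEXT NOT NULL,
    sha BLOB DEFAULT NULL,
    PRIMARY KEY (ID),
    CONSTRAINT full_path UNIQUE (path_prefix_index, relative_path) ON CONFLICT FAIL
);
CREATE TABLE {db}.paths (
    ID INTEGER NOT NULL UNIQUE,
    path TEXT NOT NULL UNIQUE,
    PRIMARY KEY (ID)
);
"""

DATA = """
INSERT INTO db1.paths VALUES(1,'/mnt/path1');
INSERT INTO db1.paths VALUES(2,'/mnt/path2');
INSERT INTO db1.files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db1.files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db1.files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO db1.files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db1.files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.paths VALUES(4,'/mnt/path1');
INSERT INTO db2.paths VALUES(3,'/mnt/path2');
INSERT INTO db2.files VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO db2.files VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO db2.files VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a');
INSERT INTO db2.files VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO db2.files VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
"""


def open_test_dbs():
    # Hypothetical helper: two in-memory databases stand in for example1.db / example2.db.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS db1")
    conn.execute("ATTACH DATABASE ':memory:' AS db2")
    conn.executescript(SCHEMA.format(db="db1") + SCHEMA.format(db="db2") + DATA)
    return conn


conn = open_test_dbs()

# Prefix-aware joins: f1 and f2 always refer to the same absolute path.
JOIN_ON_PATH = (
    "FROM db1.files AS f1 "
    "JOIN db1.paths AS p ON f1.path_prefix_index = p.ID "
    "JOIN db2.paths AS p2 ON p2.path = p.path "
    "JOIN db2.files AS f2 ON f2.path_prefix_index = p2.ID "
    "AND f2.relative_path = f1.relative_path "
)

# The comparison result and both checksums in readable hex form.
shown = conn.execute(
    "SELECT f1.sha <> f2.sha AS comprslt, hex(f1.sha) AS h1, hex(f2.sha) AS h2 "
    + JOIN_ON_PATH + "WHERE f1.sha <> f2.sha"
).fetchall()
print(shown)

# Hypothetical: db2 has no checksum recorded for b_file.txt under /mnt/path1 (ID 3).
conn.execute("UPDATE db2.files SET sha = NULL WHERE ID = 3")
with_ne = conn.execute(
    "SELECT COUNT(*) " + JOIN_ON_PATH + "WHERE f1.sha <> f2.sha"
).fetchone()[0]
with_is_not = conn.execute(
    "SELECT COUNT(*) " + JOIN_ON_PATH + "WHERE f1.sha IS NOT f2.sha"
).fetchone()[0]
print(with_ne, with_is_not)
```

Whether a missing checksum should count as a mismatch is a design decision for the application. The sketch only shows that the two operators treat it differently.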