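I'm writing an open-source program in pure C that verifies file integrity using checksums. It compares two structurally identical SQLite databases, db1 and db2, that come from different sources.
Each database has two tables. The first, "files", contains the relative file path "relative_path", the checksum "sha", and the path prefix identifier "path_prefix_index".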
CREATE TABLE "files" (
"ID" INTEGER NOT NULL,
"path_prefix_index" INTEGER NOT NULL,
"relative_path" TEXT NOT NULL,
"sha" BLOB DEFAULT NULL,
PRIMARY KEY("ID"),
CONSTRAINT "full_path" UNIQUE("path_prefix_index","relative_path") ON CONFLICT FAIL
);
CREATE INDEX full_path_ASC ON files (path_prefix_index, relative_path ASC);
The second table, "paths", contains the path prefix "path" and its ID.
CREATE TABLE "paths" (
"ID" INTEGER NOT NULL UNIQUE,
"path" TEXT NOT NULL UNIQUE,
PRIMARY KEY("ID")
);
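As you can probably guess from the CONSTRAINT and the index, a file's unique absolute path is made up of the pair paths.path and files.relative_path.
Let's fill the tables of the first database, db1, with test data: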
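A minimal sketch of how that pair resolves to an absolute path inside a single database. It assumes a '/' separator between prefix and relative path (the stored prefixes have no trailing slash), runs in a fresh in-memory database, uses a trimmed copy of the schema (the separate full_path_ASC index is omitted) and only two of db1's test rows; the alias absolute_path is illustrative.

```sql
-- Trimmed copy of the question's schema; the UNIQUE constraints create their own indexes.
CREATE TABLE paths (
  ID   INTEGER PRIMARY KEY,
  path TEXT NOT NULL UNIQUE
);
CREATE TABLE files (
  ID                INTEGER PRIMARY KEY,
  path_prefix_index INTEGER NOT NULL,
  relative_path     TEXT NOT NULL,
  sha               BLOB DEFAULT NULL,
  UNIQUE (path_prefix_index, relative_path)
);
INSERT INTO paths VALUES (1, '/mnt/path1'), (2, '/mnt/path2');
INSERT INTO files VALUES
  (1, 1, 'AAA/BCB/CCC/a.txt', X'f90d'),
  (4, 2, 'AAA/BCB/CCC/a.txt', X'856a');

-- path_prefix_index is only a local key into paths;
-- the prefix text plus relative_path is what identifies a file.
-- Assumption: '/' joins prefix and relative path.
SELECT p.path || '/' || f.relative_path AS absolute_path, hex(f.sha) AS sha
FROM files AS f
JOIN paths AS p ON p.ID = f.path_prefix_index
ORDER BY absolute_path;
-- /mnt/path1/AAA/BCB/CCC/a.txt|F90D
-- /mnt/path2/AAA/BCB/CCC/a.txt|856A
```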
INSERT INTO files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO paths VALUES(1,'/mnt/path1');
INSERT INTO paths VALUES(2,'/mnt/path2');
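Next, let's fill the second database, db2, with the same data but different IDs. Note that the path prefixes in the "paths" table have different IDs, and so does the corresponding "path_prefix_index" field in the "files" table. Despite the different IDs, exactly the same unique pairs of prefix and relative path are formed, and therefore the same absolute file paths. In terms of absolute paths, the tables in both databases contain the same files.
Now for the part that matters: in the example below, one file has a "sha" checksum that differs between the two databases.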
INSERT INTO paths VALUES(4,'/mnt/path1');
INSERT INTO paths VALUES(3,'/mnt/path2');
INSERT INTO files VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a'); -- The same file, but with a different sha
INSERT INTO files VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
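A short sketch of the ID mismatch described above: because the IDs are local to each database, the only way to line the prefixes up is by their text. Two attached in-memory databases stand in for example1.db and example2.db, only the paths tables are created, and the aliases id_in_db1/id_in_db2 are illustrative.

```sql
-- In-memory stand-ins for example1.db (db1) and example2.db (db2).
ATTACH DATABASE ':memory:' AS db1;
ATTACH DATABASE ':memory:' AS db2;
CREATE TABLE db1.paths (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE db2.paths (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
INSERT INTO db1.paths VALUES (1, '/mnt/path1'), (2, '/mnt/path2');
INSERT INTO db2.paths VALUES (4, '/mnt/path1'), (3, '/mnt/path2');

-- Translate each db1 prefix ID into the db2 ID of the same prefix text.
SELECT p1.path, p1.ID AS id_in_db1, p2.ID AS id_in_db2
FROM db1.paths AS p1
JOIN db2.paths AS p2 ON p2.path = p1.path
ORDER BY p1.path;
-- /mnt/path1|1|4
-- /mnt/path2|2|3
```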
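I need a query that makes SQLite display the path prefix and relative path (which together form the file's absolute path) of each identical file whose sha field does not match between the two databases.
I've been trying to solve this with multiple JOINs for a long time, but my SQL knowledge isn't enough to get the desired result. Here is the monster I managed to build; unfortunately, it doesn't work: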
sqlite> attach database "example1.db" as db1;
sqlite> attach database "example2.db" as db2;
sqlite> SELECT p.path, f1.relative_path
FROM db1.files AS f1
JOIN db1.paths AS p ON f1.path_prefix_index = p.ID
JOIN db2.files AS f2 ON f1.relative_path = f2.relative_path
JOIN db2.paths AS p2 ON f2.path_prefix_index = p2.ID
WHERE f1.sha <> f2.sha;
/mnt/path1|AAA/BCB/CCC/a.txt
/mnt/path2|AAA/BCB/CCC/a.txt
/mnt/path2|AAA/BCB/CCC/a.txt
Everything looks fine, but it still doesn't work correctly. In this example only one file has a different sha, and it is
/mnt/path2|AAA/BCB/CCC/a.txt
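I'd appreciate help understanding and building an SQL query so that the application comparing the two databases can print the absolute paths of the mismatched files.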
If you use SELECT p.path, f1.relative_path,* then you will see that several rows match the WHERE clause.
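To make that visible without reading every column, here is a sketch of the question's query with p2.path and the hex checksums added, run against in-memory stand-ins holding only the four AAA/BCB/CCC/a.txt rows from the question (ORDER BY is added only to make the output order deterministic).

```sql
ATTACH DATABASE ':memory:' AS db1;
ATTACH DATABASE ':memory:' AS db2;
CREATE TABLE db1.paths (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE db2.paths (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE db1.files (ID INTEGER PRIMARY KEY, path_prefix_index INTEGER NOT NULL,
  relative_path TEXT NOT NULL, sha BLOB DEFAULT NULL, UNIQUE (path_prefix_index, relative_path));
CREATE TABLE db2.files (ID INTEGER PRIMARY KEY, path_prefix_index INTEGER NOT NULL,
  relative_path TEXT NOT NULL, sha BLOB DEFAULT NULL, UNIQUE (path_prefix_index, relative_path));
INSERT INTO db1.paths VALUES (1, '/mnt/path1'), (2, '/mnt/path2');
INSERT INTO db2.paths VALUES (4, '/mnt/path1'), (3, '/mnt/path2');
-- Subset of the test data: only the a.txt rows.
INSERT INTO db1.files VALUES (1, 1, 'AAA/BCB/CCC/a.txt', X'f90d'), (4, 2, 'AAA/BCB/CCC/a.txt', X'856a');
INSERT INTO db2.files VALUES (1, 4, 'AAA/BCB/CCC/a.txt', X'f90d'), (4, 3, 'AAA/BCB/CCC/a.txt', X'846a');

-- Same joins as the question, with the db2 prefix shown next to the db1 prefix.
SELECT p.path AS db1_prefix, p2.path AS db2_prefix, f1.relative_path,
       hex(f1.sha) AS h1, hex(f2.sha) AS h2
FROM db1.files AS f1
JOIN db1.paths AS p  ON f1.path_prefix_index = p.ID
JOIN db2.files AS f2 ON f1.relative_path = f2.relative_path
JOIN db2.paths AS p2 ON f2.path_prefix_index = p2.ID
WHERE f1.sha <> f2.sha
ORDER BY db1_prefix, db2_prefix;
-- /mnt/path1|/mnt/path2|AAA/BCB/CCC/a.txt|F90D|846A
-- /mnt/path2|/mnt/path1|AAA/BCB/CCC/a.txt|856A|F90D
-- /mnt/path2|/mnt/path2|AAA/BCB/CCC/a.txt|856A|846A
```

Only the last row compares a file with itself; the other two pair a file under one prefix with the same relative path under the other prefix.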
So you could add GROUP BY f1.relative_path, e.g.:
SELECT p.path, f1.relative_path
FROM db1.files AS f1
JOIN db1.paths AS p ON f1.path_prefix_index = p.ID
JOIN db2.files AS f2 ON f1.relative_path = f2.relative_path
JOIN db2.paths AS p2 ON f2.path_prefix_index = p2.ID
WHERE f1.sha <> f2.sha
GROUP BY f1.relative_path;
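But is that actually correct? GROUP BY only collapses the rows that share a relative_path. Because p.path is a bare column in an aggregate query, SQLite takes its value from an arbitrary row of the group, so the printed prefix may just as well be /mnt/path1. The duplicates really come from joining db2.files on relative_path alone, which pairs each db1 file with that relative path under every prefix in db2.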
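A minimal sketch of one way to fix the join itself rather than hide its duplicates: db1's prefix ID is translated into db2's through the prefix text, and db2.files is then matched on both path_prefix_index and relative_path. The ATTACH ':memory:' lines are stand-ins for attaching example1.db and example2.db, and the CREATE TABLE statements are trimmed copies of the question's schema (the separate full_path_ASC index is omitted); with the real files, only the final SELECT is needed.

```sql
-- In-memory stand-ins for example1.db (db1) and example2.db (db2).
ATTACH DATABASE ':memory:' AS db1;
ATTACH DATABASE ':memory:' AS db2;
CREATE TABLE db1.paths (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE db2.paths (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE db1.files (ID INTEGER PRIMARY KEY, path_prefix_index INTEGER NOT NULL,
  relative_path TEXT NOT NULL, sha BLOB DEFAULT NULL, UNIQUE (path_prefix_index, relative_path));
CREATE TABLE db2.files (ID INTEGER PRIMARY KEY, path_prefix_index INTEGER NOT NULL,
  relative_path TEXT NOT NULL, sha BLOB DEFAULT NULL, UNIQUE (path_prefix_index, relative_path));
INSERT INTO db1.paths VALUES (1, '/mnt/path1'), (2, '/mnt/path2');
INSERT INTO db2.paths VALUES (4, '/mnt/path1'), (3, '/mnt/path2');
INSERT INTO db1.files VALUES
  (1, 1, 'AAA/BCB/CCC/a.txt', X'f90d'), (2, 1, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'),
  (3, 1, 'AAA/ZAW/D/e/f/b_file.txt', X'856a'), (4, 2, 'AAA/BCB/CCC/a.txt', X'856a'),
  (5, 2, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'), (6, 2, 'AAA/ZAW/D/e/f/b_file.txt', X'856a');
INSERT INTO db2.files VALUES
  (1, 4, 'AAA/BCB/CCC/a.txt', X'f90d'), (2, 4, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'),
  (3, 4, 'AAA/ZAW/D/e/f/b_file.txt', X'856a'), (4, 3, 'AAA/BCB/CCC/a.txt', X'846a'),
  (5, 3, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'), (6, 3, 'AAA/ZAW/D/e/f/b_file.txt', X'856a');

-- Match files on their full identity: prefix text + relative_path.
SELECT p1.path, f1.relative_path
FROM db1.files AS f1
JOIN db1.paths AS p1 ON p1.ID = f1.path_prefix_index
JOIN db2.paths AS p2 ON p2.path = p1.path            -- same prefix text, db2's own ID
JOIN db2.files AS f2 ON f2.path_prefix_index = p2.ID
                    AND f2.relative_path = f1.relative_path
WHERE f1.sha <> f2.sha;
-- /mnt/path2|AAA/BCB/CCC/a.txt
```

Since sha defaults to NULL, note that `f1.sha <> f2.sha` is NULL (and the row is skipped) whenever either checksum is missing; if a missing checksum on one side should count as a mismatch, `f1.sha IS NOT f2.sha` is SQLite's NULL-aware comparison. The UNIQUE constraints on paths.path and on (path_prefix_index, relative_path) give SQLite indexes it can use for both lookups.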
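Suggestion
I'd suggest considering an SQLite tool (SQLiteStudio, Navicat for SQLite, DBeaver, ...), as it can make working with SQL easier. For example, for the above (and more), Navicat was used with a single database to eliminate the complexity of attaching (which works fine).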
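Here is one permutation of the test code, using a single database in which files2 and paths2 play the role of db2's tables: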
DROP TABLE IF EXISTS files;
DROP TABLE IF EXISTS paths;
DROP TABLE IF EXISTS files2;
DROP TABLE IF EXISTS paths2;
CREATE TABLE IF NOT EXISTS "files" (
"ID" INTEGER NOT NULL,
"path_prefix_index" INTEGER NOT NULL,
"relative_path" TEXT NOT NULL,
"sha" BLOB DEFAULT NULL,
PRIMARY KEY("ID"),
CONSTRAINT "full_path" UNIQUE("path_prefix_index","relative_path") ON CONFLICT FAIL
);
CREATE INDEX IF NOT EXISTS full_path_ASC ON files (path_prefix_index, relative_path ASC);
CREATE TABLE "paths" (
"ID" INTEGER NOT NULL UNIQUE,
"path" TEXT NOT NULL UNIQUE,
PRIMARY KEY("ID")
);
CREATE TABLE IF NOT EXISTS "files2" (
"ID" INTEGER NOT NULL,
"path_prefix_index" INTEGER NOT NULL,
"relative_path" TEXT NOT NULL,
"sha" BLOB DEFAULT NULL,
PRIMARY KEY("ID"),
CONSTRAINT "full_path" UNIQUE("path_prefix_index","relative_path") ON CONFLICT FAIL
);
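CREATE INDEX IF NOT EXISTS full_path2_ASC ON files2 (path_prefix_index, relative_path ASC); /* index names must be unique per database; reusing full_path_ASC would silently skip this index */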
CREATE TABLE "paths2" (
"ID" INTEGER NOT NULL UNIQUE,
"path" TEXT NOT NULL UNIQUE,
PRIMARY KEY("ID")
);
INSERT INTO files VALUES(1,1,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files VALUES(2,1,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(3,1,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files VALUES(4,2,'AAA/BCB/CCC/a.txt',X'856a');
INSERT INTO files VALUES(5,2,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files VALUES(6,2,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO paths VALUES(1,'/mnt/path1');
INSERT INTO paths VALUES(2,'/mnt/path2');
INSERT INTO paths2 VALUES(4,'/mnt/path1');
INSERT INTO paths2 VALUES(3,'/mnt/path2');
INSERT INTO files2 VALUES(1,4,'AAA/BCB/CCC/a.txt',X'f90d');
INSERT INTO files2 VALUES(2,4,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files2 VALUES(3,4,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
INSERT INTO files2 VALUES(4,3,'AAA/BCB/CCC/a.txt',X'846a'); -- The same file, but with a different sha
INSERT INTO files2 VALUES(5,3,'AAA/ZAW/A/b/c/a_file.txt',X'16b7');
INSERT INTO files2 VALUES(6,3,'AAA/ZAW/D/e/f/b_file.txt',X'856a');
SELECT p.path, f1.relative_path,f1.sha <> f2.sha AS comprslt, hex(f1.sha) AS h1, hex(f2.sha) AS h2, * /* to show ALL data of rows that match the criteria */
FROM files AS f1
JOIN paths AS p ON f1.path_prefix_index = p.ID
JOIN files2 AS f2 ON f1.relative_path = f2.relative_path
JOIN paths2 AS p2 ON f2.path_prefix_index = p2.ID
WHERE f1.sha <> f2.sha
/*GROUP BY f1.relative_path*/;
DROP TABLE IF EXISTS files;
DROP TABLE IF EXISTS paths;
DROP TABLE IF EXISTS files2;
DROP TABLE IF EXISTS paths2;
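For the same single-database layout, a possible alternative sketched with EXCEPT instead of a four-way join: it lists every (prefix, relative path, sha) row of the first set that has no identical row in the second. It answers a slightly broader question than the join: it also reports files that exist only in files/paths, and compound SELECTs treat two NULL sha values as equal. The tables are recreated here because the script above drops them.

```sql
-- files/paths play db1, files2/paths2 play db2 (trimmed schema).
CREATE TABLE paths  (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE paths2 (ID INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE files  (ID INTEGER PRIMARY KEY, path_prefix_index INTEGER NOT NULL,
  relative_path TEXT NOT NULL, sha BLOB DEFAULT NULL, UNIQUE (path_prefix_index, relative_path));
CREATE TABLE files2 (ID INTEGER PRIMARY KEY, path_prefix_index INTEGER NOT NULL,
  relative_path TEXT NOT NULL, sha BLOB DEFAULT NULL, UNIQUE (path_prefix_index, relative_path));
INSERT INTO paths  VALUES (1, '/mnt/path1'), (2, '/mnt/path2');
INSERT INTO paths2 VALUES (4, '/mnt/path1'), (3, '/mnt/path2');
INSERT INTO files VALUES
  (1, 1, 'AAA/BCB/CCC/a.txt', X'f90d'), (2, 1, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'),
  (3, 1, 'AAA/ZAW/D/e/f/b_file.txt', X'856a'), (4, 2, 'AAA/BCB/CCC/a.txt', X'856a'),
  (5, 2, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'), (6, 2, 'AAA/ZAW/D/e/f/b_file.txt', X'856a');
INSERT INTO files2 VALUES
  (1, 4, 'AAA/BCB/CCC/a.txt', X'f90d'), (2, 4, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'),
  (3, 4, 'AAA/ZAW/D/e/f/b_file.txt', X'856a'), (4, 3, 'AAA/BCB/CCC/a.txt', X'846a'),
  (5, 3, 'AAA/ZAW/A/b/c/a_file.txt', X'16b7'), (6, 3, 'AAA/ZAW/D/e/f/b_file.txt', X'856a');

-- EXCEPT compares whole rows, so the prefix IDs never have to be matched directly.
SELECT path, relative_path
FROM (
  SELECT p.path, f.relative_path, f.sha
  FROM files AS f JOIN paths AS p ON p.ID = f.path_prefix_index
  EXCEPT
  SELECT p2.path, f2.relative_path, f2.sha
  FROM files2 AS f2 JOIN paths2 AS p2 ON p2.ID = f2.path_prefix_index
)
ORDER BY path, relative_path;
-- /mnt/path2|AAA/BCB/CCC/a.txt
```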