How to combine affinity with SQL queries in Apache Ignite 2.x

Question

I am trying to make SQL queries take advantage of affinity in Apache Ignite. Suppose I have the following classes (the code is available at https://github.com/hostettler/distibuted-queries):

//I do not put here all the configuration, but it is available in my small test if interested
class EntityAKey {
   @AffinityKeyMapped
   String id;
}

class EntityBKey {
   String id;
   @AffinityKeyMapped
   String entityAId;
}

...
String joinSql = "select count(1) from \"EntityA\".EntityA as a inner join \"EntityB\".EntityB as b on a.id = b.entityA ";
SqlFieldsQuery q1 = new SqlFieldsQuery(joinSql);
q1.setDistributedJoins(false);
FieldsQueryCursor<List<?>> c = cacheA.query(q1);
List<List<?>> l = c.getAll();

In this case (A <--- B), everything is fine. I can even instruct the SQL engine to work per group of partitions, and I can call setDistributedJoins(false) to tell the engine that it does not have to look at other partitions.
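
Why the A <--- B join can stay local can be illustrated with a toy model of how a key is mapped to a partition. This is only a sketch under stated assumptions, not Ignite code: the class AffinitySketch, its methods, and the plain hash-modulo rule are hypothetical stand-ins for whatever affinity function the cache is actually configured with. The only point it makes is that the partition depends on the affinity field alone, so an EntityB row lands wherever its parent EntityA row lands.

```java
/** Toy model (not Ignite code): a key's partition depends only on its affinity field. */
final class AffinitySketch {
    /** Hypothetical partition count, chosen only for this illustration. */
    static final int PARTITIONS = 1024;

    /** Maps an affinity value to a partition with a simple hash-modulo rule (an assumption). */
    static int partitionFor(String affinityValue) {
        return Math.floorMod(affinityValue.hashCode(), PARTITIONS);
    }

    /** EntityAKey: the id itself carries @AffinityKeyMapped. */
    static int partitionOfA(String id) {
        return partitionFor(id);
    }

    /** EntityBKey: only entityAId carries @AffinityKeyMapped, so the B id is ignored. */
    static int partitionOfB(String id, String entityAId) {
        return partitionFor(entityAId);
    }

    public static void main(String[] args) {
        // A row "a-1" and any B row that points at "a-1" share a partition.
        System.out.println(partitionOfA("a-1") == partitionOfB("b-7", "a-1")); // prints true
    }
}
```

Because both keys resolve to the same partition, the join condition between A and B can be evaluated without moving rows between nodes, which is the situation where setDistributedJoins(false) is safe.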

Now, this works as long as all the joins in the query rely on exactly the same affinity. So suppose I add classes C and D such that (C <--- D), and A has a relationship to C, such as:

class EntityAKey {
   @AffinityKeyMapped
   String id;
}
class EntityA {
   String id;
   String entityC; //Id of entityC but THIS IS NOT PART OF THE KEY 
}

class EntityBKey {
   String id;
   @AffinityKeyMapped
   String entityAId;
}

class EntityCKey {
   @AffinityKeyMapped
   String id;
}

class EntityDKey {
   String id;
   @AffinityKeyMapped
   String entityCId;
}

String joinSql = "select count(1) from \"EntityA\".EntityA as a " 
+ " inner join \"EntityB\".EntityB as b on a.id = b.entityA" 
+ " inner join \"EntityC\".EntityC as c on a.entityC = c.id"
+ " inner join \"EntityD\".EntityD as d on d.entityC = a.entityC";
SqlFieldsQuery q1 = new SqlFieldsQuery(joinSql);
q1.setDistributedJoins(false); //only works if setDistributedJoins(true)
q1.setPageSize(100);
FieldsQueryCursor<List<?>> c = cacheA.query(q1);
List<List<?>> l = c.getAll();

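In this case the results are incorrect (as stated in the documentation), simply because there are two different affinities, one on A(id) and one on C(id). The join between A and B works fine, and so does the join between C and D, but obviously not the query as a whole.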
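
The undercount can be reproduced with a small simulation. The sketch below is a toy model and not Ignite code: the class LocalJoinSketch, the four-node cluster, the hash-modulo placement rule, and the sample data are all assumptions made for illustration. It counts the matching pairs of an A-to-C join twice, once over all rows and once only over pairs whose rows are stored on the same node, which is roughly what a non-distributed join is able to see.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a non-distributed join; not Ignite code. */
final class LocalJoinSketch {
    /** Hypothetical cluster size, chosen only for this illustration. */
    static final int NODES = 4;

    /** A row reduced to what matters here: where it is stored and what it joins on. */
    static final class Row {
        final String affinityValue; // decides which node stores the row
        final String joinValue;     // value compared in the ON clause

        Row(String affinityValue, String joinValue) {
            this.affinityValue = affinityValue;
            this.joinValue = joinValue;
        }
    }

    /** Simplified placement rule (an assumption): hash of the affinity value modulo NODES. */
    static int nodeFor(String affinityValue) {
        return Math.floorMod(affinityValue.hashCode(), NODES);
    }

    /** Counts matching pairs; with localOnly, both rows must be stored on the same node. */
    static long joinCount(List<Row> left, List<Row> right, boolean localOnly) {
        long count = 0;
        for (Row l : left) {
            for (Row r : right) {
                boolean sameNode = nodeFor(l.affinityValue) == nodeFor(r.affinityValue);
                if (l.joinValue.equals(r.joinValue) && (sameNode || !localOnly)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** EntityA rows as seen by the A-to-C join: stored by a.id, joined on a.entityC. */
    static List<Row> sampleA() {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(new Row("a" + i, "c" + (i % 3)));
        }
        return rows;
    }

    /** EntityC rows: stored by c.id and joined on c.id. */
    static List<Row> sampleC() {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            rows.add(new Row("c" + i, "c" + i));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("all pairs:   " + joinCount(sampleA(), sampleC(), false));
        System.out.println("local pairs: " + joinCount(sampleA(), sampleC(), true));
    }
}
```

In this model the local count falls below the full count whenever an A row and the C row it references hash to different nodes, while a join whose two sides share the same affinity value (as A and B do) loses nothing.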

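All of that lengthy introduction leads to the actual question:

I can think of the following solutions, but I do not like any of them. I would like to hear the community's advice on them: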

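a) Calling setDistributedJoins(true) works just fine, but my understanding is that in this case the query is not executed on multiple threads (within a node), because it cannot rely on affinity to prune partitions. Is this understanding correct?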

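b) Split the query into two separate queries that use setDistributedJoins(false), one for A->B and one for C->D, and then reconcile the two with a third query that uses setDistributedJoins(true). This is impractical for real-life queries with 10-15 joins (and perhaps 3-4 different kinds of affinity).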

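c) Change the model so that the same affinity is used everywhere, by artificially putting the entity A id into entity C. This is impractical because it makes cache.get more cumbersome, since I would not know the entity A id.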

Do you agree that the Apache Ignite optimizer should do the pruning for me and apply affinity wherever possible?
Would the adoption of Calcite solve this?

Thank you very much for your help.

Answer 1

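Using distributed joins is only the symptom, not the cause. The root cause is that your data is not colocated, so it has to be shuffled across the network. For an in-memory database, having to go over the network is a bad thing.
It does, but it cannot "apply affinity" where there is none. The join between entity A and entity C is not colocated.
No, because this is about where the data is located. The Calcite engine in Ignite is still in beta, so I am not sure you can draw any conclusions about performance. It should perform better in the future.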

In short, you can:

Colocate your data
Use distributed joins
Rewrite your query. Sub-selects or CTEs can be used to avoid the "shuffle" phase of a distributed join
