Vertex duplication in Amazon Neptune

Question

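I want to write some logic that uses Gremlin to do the following in Amazon Neptune:

1. Load a row of data that contains customer_id and postcode columns.

2. Check whether the row's postcode value already exists in the database:

A. If it does, create a new vertex for the row's customer_id value, then create a new edge from the customer_id vertex just created to the pre-existing postcode vertex.

B. If it does not, create a new vertex for the row's customer_id value and a new vertex for the row's postcode value, then create a new edge from the customer_id vertex just created to the postcode vertex just created.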

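The goal is to avoid creating duplicate vertices.
If you can see a flaw in my logic, I'm open to a different approach.
I have tried several approaches, but I haven't managed to write a single piece of logic that does all of the above.
I'm using Gremlin.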
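As a rough illustration of the intended per-row logic in steps 1, 2A and 2B above, here is a plain-Python sketch against a hypothetical in-memory graph. The `Graph` class, the `load_row` function and the row format are assumptions made for illustration only; this is not a Neptune client or a Gremlin API.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical in-memory stand-in for a property graph (not a Neptune client)."""
    vertices: list = field(default_factory=list)  # each vertex: {"label": ..., "value": ...}
    edges: list = field(default_factory=list)     # each edge: (from_index, to_index, label)

    def find(self, label, value):
        # Return the index of an existing vertex with this label and value, or None.
        for i, v in enumerate(self.vertices):
            if v["label"] == label and v["value"] == value:
                return i
        return None

    def add_vertex(self, label, value):
        # Append a new vertex and return its index.
        self.vertices.append({"label": label, "value": value})
        return len(self.vertices) - 1


def load_row(g, row):
    # Step 1: one row with customer_id and postcode columns; always create the customer.
    customer = g.add_vertex("customer", row["customer_id"])
    # Step 2: does a postcode vertex with this value already exist?
    postcode = g.find("postcode", row["postcode"])
    if postcode is None:
        # 2B: it does not, so create the postcode vertex as well.
        postcode = g.add_vertex("postcode", row["postcode"])
    # 2A and 2B both finish with an edge from the new customer to the postcode.
    g.edges.append((customer, postcode, "hasPostCode"))


g = Graph()
for row in [{"customer_id": "c1", "postcode": "AB1"},
            {"customer_id": "c2", "postcode": "AB1"}]:
    load_row(g, row)
print(len(g.vertices), len(g.edges))  # 3 2
```

Both customers end up sharing a single postcode vertex, which is the deduplication the question is after.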

Answer 1

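First, every vertex and edge in a Neptune graph must have a unique ID, so the best way to guarantee uniqueness is to take advantage of that. Deterministic IDs (IDs derived from the data itself, such as the customer ID or the postcode) are also great for fast lookups, because looking up a vertex or edge by its ID is the fastest operation in Neptune. If you don't supply a vertex or edge ID, Neptune generates one using a UUID.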
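To make the idea of deterministic IDs concrete, the Python sketch below derives each vertex ID from the data itself. The `customer-` and `postcode-` prefixes follow the naming scheme used later in this answer; the helper names and the postcode normalisation (removing spaces, upper-casing) are assumptions. A dict keyed by ID stands in for the graph: the same input always maps to the same key, so it cannot produce a second entry. In Neptune itself, adding a vertex whose ID already exists is rejected rather than creating a duplicate, which is why deterministic IDs are paired with the conditional write pattern.

```python
def customer_vertex_id(customer_id: str) -> str:
    # Hypothetical naming scheme: "customer-<customer_id>".
    return f"customer-{customer_id}"


def postcode_vertex_id(postcode: str) -> str:
    # Hypothetical naming scheme: "postcode-<code>".
    # Assumption: normalise spaces and case so "ab1 2cd" and "AB12CD" share one vertex.
    return "postcode-" + postcode.replace(" ", "").upper()


# A dict keyed by ID models "at most one element per ID":
# writing the same ID twice cannot create a second entry.
vertices = {}
for customer_id, postcode in [("123", "AB1 2CD"), ("456", "ab12cd")]:
    vertices.setdefault(customer_vertex_id(customer_id), "customer")
    vertices.setdefault(postcode_vertex_id(postcode), "postcode")

print(sorted(vertices))
# ['customer-123', 'customer-456', 'postcode-AB12CD']
```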

From there, you'll want to use a conditional write ("upsert") pattern. In Gremlin, you can follow the pattern documented in Practical Gremlin [1].

For your use case, the pattern would look something like this:

g.V().hasLabel('customer').has('customer_id',<id>).
    fold().coalesce(
        unfold(),
        addV('customer').property('customer_id',<id>)
    ).aggregate('c').
    V().hasLabel('postcode').has('postcode',<postcode>).
        fold().coalesce(
            unfold(),
            addV('postcode').property('postcode',<postcode>)
        ).
    addE('hasPostCode').from(select('c').unfold())

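Note: the aggregate() step is used above because we want to label the customer vertex but then need to refer to it after crossing a reducing barrier step (fold()) later in the query. If we used as() instead, the label would not persist past the fold() barrier step.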
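The behaviour of the first query can be modelled in a few lines of Python. This is a hypothetical in-memory model, not gremlinpython: `fold_coalesce` mimics `fold().coalesce(unfold(), addV(...))`, a `side_effects` dict plays the role of `aggregate('c')`, and the graph is just lists keyed by label.

```python
from typing import Callable, Dict, List


def fold_coalesce(matches: List[str], create: Callable[[], str]) -> str:
    # fold(): zero or more matches become exactly one list, so the traversal
    # always continues even when nothing was found.
    folded = list(matches)
    # coalesce(unfold(), addV(...)): use the first branch that yields a result.
    # (Simplification: a real unfold() would emit every match; we take the first.)
    return folded[0] if folded else create()


def upsert_row(graph: Dict[str, list], customer_id: str, postcode: str) -> None:
    def add(label: str, value: str) -> str:
        graph[label].append(value)  # stands in for addV(label).property(..., value)
        return value

    side_effects: Dict[str, List[str]] = {}
    customer = fold_coalesce(
        [v for v in graph["customer"] if v == customer_id],  # V().hasLabel('customer').has(...)
        lambda: add("customer", customer_id),
    )
    side_effects.setdefault("c", []).append(customer)        # aggregate('c')
    pc = fold_coalesce(
        [v for v in graph["postcode"] if v == postcode],      # V().hasLabel('postcode').has(...)
        lambda: add("postcode", postcode),
    )
    # addE('hasPostCode').from(select('c').unfold()): the side effect survives fold().
    graph["hasPostCode"].append((side_effects["c"][0], pc))


graph = {"customer": [], "postcode": [], "hasPostCode": []}
upsert_row(graph, "c1", "AB1")
upsert_row(graph, "c2", "AB1")
upsert_row(graph, "c1", "AB1")
print(graph["customer"], graph["postcode"], len(graph["hasPostCode"]))
# ['c1', 'c2'] ['AB1'] 3
```

The third call shows that neither vertex is duplicated, but addE() runs unconditionally, so re-loading the same row adds a second c1 -> AB1 edge. If rows can be reloaded, the edge needs its own conditional write as well.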

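This can be simplified if you use deterministic IDs. Assume we use an ID naming scheme of "customer-<id>" for customer vertices and "postcode-<code>" for postcode vertices, so that <customer_id> and <postcode_id> below stand for those full IDs: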

g.V(<customer_id>).
    fold().coalesce(
        unfold(),
        addV('customer').property(id,<customer_id>)
    ).
    V(<postcode_id>).
        fold().coalesce(
            unfold(),
            addV('postcode').property(id,<postcode_id>)
        ).
    addE('hasPostCode').from(V(<customer_id>))
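With deterministic IDs the existence check becomes a lookup by key. The sketch below models the query above in Python with dicts keyed by ID (hypothetical, not a Neptune client). It also goes one step beyond the original query, as an assumption: the edge is given a deterministic ID (here "<customer id>-><postcode id>", a made-up scheme) and written with the same look-up-then-create idea, so that reloading a row does not create a duplicate edge.

```python
def upsert(store: dict, element_id: str, value) -> None:
    # Models g.V(id).fold().coalesce(unfold(), addV(...).property(id, id)):
    # look up by ID first and only create the element when the ID is absent.
    if element_id not in store:
        store[element_id] = value


def load_row(vertices: dict, edges: dict, customer_id: str, postcode: str) -> None:
    c_id = f"customer-{customer_id}"   # ID scheme from the answer
    p_id = f"postcode-{postcode}"
    upsert(vertices, c_id, "customer")
    upsert(vertices, p_id, "postcode")
    # Assumption, not part of the original query: a deterministic edge ID,
    # so re-loading the same row does not add a second edge.
    upsert(edges, f"{c_id}->{p_id}", ("hasPostCode", c_id, p_id))


vertices, edges = {}, {}
for row in [("1", "AB1"), ("2", "AB1"), ("1", "AB1")]:
    load_row(vertices, edges, *row)
print(len(vertices), len(edges))  # 3 2
```

Loading three rows, one of them repeated, leaves three vertices and two edges.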

[1] https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#upsert