Optimizing a Neptune query that uses repeat() and times()

I have the following Neptune query:

g.V().hasLabel('User')
  .has('user_id', 1004)
  .repeat(both('USES_UPI','USES_ACCOUNT','USES_HARDWARE_ID','USES_GAID','HAS_COOKIES').simplePath().dedup())
  .times(3)
  .hasLabel('Gaid')
  .dedup()
  .count()

It takes far too long to run.
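To make the traversal's semantics concrete, here is a minimal in-memory model in Python; it is not Neptune or TinkerPop code. It assumes, as the profile's repeatMode=BFS and dedup=true annotations suggest, that repeat() advances one hop at a time and that dedup() inside repeat() drops any vertex already reached on an earlier hop. The toy graph, its vertex IDs, and the Upi label are hypothetical.

```python
from collections import defaultdict

# Edge labels followed by both(...) in the query above.
EDGE_LABELS = {"USES_UPI", "USES_ACCOUNT", "USES_HARDWARE_ID", "USES_GAID", "HAS_COOKIES"}


def build_adjacency(edges):
    """Index (src, label, dst) triples in both directions, like both(...)."""
    adj = defaultdict(set)
    for src, label, dst in edges:
        if label in EDGE_LABELS:  # other edge labels are never followed
            adj[src].add(dst)
            adj[dst].add(src)
    return adj


def count_at_hop(start, edges, vertex_labels, hops=3, target="Gaid"):
    """Model repeat(both(...).simplePath().dedup()).times(hops).hasLabel(target).dedup().count().

    Returns (frontier size per hop, number of `target` vertices on the last hop).
    """
    adj = build_adjacency(edges)
    seen = {start}      # simplePath() + dedup(): a vertex is entered at most once
    frontier = {start}
    sizes = []
    for _ in range(hops):  # times(3)
        nxt = set()
        for v in frontier:
            for n in adj[v]:
                if n not in seen:
                    seen.add(n)
                    nxt.add(n)
        frontier = nxt
        sizes.append(len(frontier))
    # No emit(), so only vertices reached on the final hop reach hasLabel().
    matches = sum(1 for v in frontier if vertex_labels.get(v) == target)
    return sizes, matches


# Hypothetical toy graph.
EDGES = [
    ("u1", "USES_UPI", "upi1"),
    ("u2", "USES_UPI", "upi1"),
    ("u1", "USES_GAID", "g0"),
    ("u2", "USES_GAID", "g1"),
    ("u2", "FRIEND", "g9"),  # label not in EDGE_LABELS, never followed
]
LABELS = {"u1": "User", "u2": "User", "upi1": "Upi", "g0": "Gaid", "g1": "Gaid", "g9": "Gaid"}

sizes, matches = count_at_hop("u1", EDGES, LABELS)
print(sizes, matches)  # [2, 1, 1] 1
```

Because there is no emit(), only Gaid vertices reached on exactly the third hop are counted: g0, one hop from u1, is not.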

I tried profiling the query (the profile below was run with user_id 159017810 rather than 1004):

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([~label.eq(User), user_id.eq(159017810)]), RepeatStep([VertexStep(BOTH,[USES_UPI, USES_ACCOUNT, USES_HARDWARE_ID, USES_GAID, HAS_COOKIES],vertex), PathFilterStep(simple,null,null), DedupGlobalStep(null,null), RepeatEndStep],until(loops(3)),emit(false)), HasStep([~label.eq(Gaid)]), DedupGlobalStep(null,null), CountGlobalStep]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <user_id>, ?9, ?) . project distinct ?1 . ContainsFilter(?9 in (159017810^^<INT>, 159017810^^<LONG>, 1.59017808E8^^<FLOAT>, 1.5901781E8^^<DOUBLE>)) .], {estimatedCardinality=1, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
            PatternNode[(?1, <~label>, ?2=<User>, <~>) . project ask .], {estimatedCardinality=1327714, expectedTotalOutput=679, actualTotalOutput=379683, indexTime=0, joinTime=0, numSearches=1}
            RepeatNode {
                Repeat {
                    JoinGroupNode {
                        UnionNode {
                            PatternNode[(?3, ?6, ?4, ?7) . project ?3,?4 . IsEdgeIdFilter(?7) . ContainsFilter(?6 in (<USES_UPI>, <USES_ACCOUNT>, <USES_HARDWARE_ID>, <USES_GAID>, <HAS_COOKIES>)) .], {cacheJoin=true, estimatedCardinality=1923966, indexTime=354, joinTime=26212, numSearches=379683}
                            PatternNode[(?4, ?6, ?3, ?7) . project ?3,?4 . IsEdgeIdFilter(?7) . ContainsFilter(?6 in (<USES_UPI>, <USES_ACCOUNT>, <USES_HARDWARE_ID>, <USES_GAID>, <HAS_COOKIES>)) .], {cacheJoin=true, estimatedCardinality=1923966, indexTime=356, joinTime=19633, numSearches=379683}
                        }, annotations={estimatedCardinality=3847932}
                        SimplePathFilter(?1, ?4)) .
                    }
                }
                LoopsCondition {
                    LoopsFilter(?3,eq(3))
                }
            }, annotations={emitFirst=false, untilFirst=false, repeatMode=BFS, dedup=true}
        }, annotations={path=[Vertex(?1):GraphStep, Repeat[̶V̶e̶r̶t̶e̶x̶(̶?̶3̶)̶:̶G̶r̶a̶p̶h̶S̶t̶e̶p̶, Vertex(?4):VertexStep, ̶V̶e̶r̶t̶e̶x̶(̶?̶8̶)̶:̶V̶e̶r̶t̶e̶x̶S̶t̶e̶p̶]], joinStats=true, optimizationTime=2, maxVarId=10, executionTime=107826}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: NeptuneHasStep([~label.eq(Gaid)]),
Neptune steps:
[
    NeptuneMemoryTrackerStep
]
+ not converted into Neptune steps: DedupGlobalStep(null,null),CountGlobalStep,

WARNING: >> [NeptuneHasStep([~label.eq(Gaid)]), DedupGlobalStep(null,null)] << (or one of the children for each step) is not supported natively yet



Runtime (ms)
============
Query Execution: 107828.341

Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                     743022      743022       52892.767    49.06
NeptuneTraverserConverterStep                                     743022      743022        8142.297     7.55
NeptuneHasStep([~label.eq(Gaid)])                                   4138        4138       46695.267    43.32
DedupGlobalStep(null,null)                                          4138        4138          48.927     0.05
CountGlobalStep                                                        1           1          22.215     0.02
                                            >TOTAL                     -           -      107801.475        -

Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        1        0        0        0        1
        1        3        0        0        0        3
        2   379679        0        0        0   379679
        3   743022   743022   743022        0        0
------------------------------------------------------
           1122705   743022   743022        0   379683

Predicates
==========
# of predicates: 26

Results
=======
Count: 1


Index Operations
================
Query execution:
    # of statement index ops: 1,502,390
    # of unique statement index ops: 1,502,390
    Duplication ratio: 1.0
    # of terms materialized: 162
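The Repeat Metrics table already shows where the work goes. The short Python calculation below only reprocesses numbers copied from the profile: the frontier triples on the first hop, grows by a factor of roughly 126,560 on the second, and only about 0.56% of the 743,022 vertices reached on the third hop pass the hasLabel('Gaid') filter, a step the profile reports as not converted into native Neptune steps.

```python
# "Visited" column of the Repeat Metrics table (iterations 0-3), copied from the profile.
visited = [1, 3, 379679, 743022]

# Growth factor from each iteration to the next.
growth = [round(b / a, 2) for a, b in zip(visited, visited[1:])]

# Share of third-hop vertices that survive hasLabel('Gaid') (4138 of 743022).
gaid_pct = round(100 * 4138 / visited[-1], 2)

print(growth)        # [3.0, 126559.67, 1.96]
print(sum(visited))  # 1122705, the total row of the table
print(gaid_pct)      # 0.56
```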

Right now it takes several seconds to run. Can it be optimized somehow?

Tags: java, amazon-web-services, graph, gremlin, amazon-neptune

1 Answer

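Do all users have unique IDs? If so, you can probably speed up the first part of the query by storing the user ID as the actual vertex ID, so Neptune doesn't have to check the label and look up a property value. Using a number as the ID also adds latency, because Neptune applies type promotion as part of query execution (in your profile, the user_id predicate is expanded into INT, LONG, FLOAT and DOUBLE candidates). In this case, just use something like user_1004 as the ID of the user vertex. Then you can do: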

g.V('user_1004').
    repeat(
        both(
            'USES_UPI',
            'USES_ACCOUNT',
            'USES_HARDWARE_ID',
            'USES_GAID',
            'HAS_COOKIES').
        simplePath().
        dedup()).
    times(3).
    hasLabel('Gaid').
    dedup().
    count()
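The type promotion mentioned above is visible in the question's profile: the first PatternNode matches user_id against INT, LONG, FLOAT and DOUBLE forms of 159017810, and the FLOAT form is 1.59017808E8 because a 32-bit float cannot represent 159017810 exactly (at that magnitude adjacent floats are 16 apart). This Python sketch reproduces only that numeric conversion; it says nothing about how Neptune evaluates the lookup internally. With a string vertex ID such as user_1004 there are no numeric variants to try.

```python
import struct

USER_ID = 159017810  # value in the profile's ContainsFilter


def as_float32(x):
    """Round-trip x through a 32-bit IEEE 754 float (Java's float type)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# The four typed forms listed in the profile for the single integer predicate.
candidates = {
    "INT": USER_ID,
    "LONG": USER_ID,
    "FLOAT": as_float32(USER_ID),
    "DOUBLE": float(USER_ID),
}

print(candidates["FLOAT"])   # 159017808.0, shown in the profile as 1.59017808E8
print(candidates["DOUBLE"])  # 159017810.0, shown in the profile as 1.5901781E8
```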

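There is also a lot of fan-out in the graph after the second hop: the Repeat Metrics show 3 vertices visited on iteration 1 but 379,679 on iteration 2. If the graph contains super nodes, you may want to devise a way to detect and mitigate them. In most identity-graph patterns, super nodes provide little value, so maintaining an incrementing counter property on each node that records its fan-out makes it easier to avoid traversing them. Then, inside the repeat(), you can add something like has('associated_ids', lt(10)) to filter out any node that has more downstream connections than you care about.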
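The counter-based mitigation can be sketched as a toy in-memory model in Python; this is not Neptune code. IdGraph, its methods, and the vertex names are hypothetical; associated_ids mirrors the property name suggested above and is incremented on both endpoints whenever an edge is added. Edge labels are ignored for brevity. Skipping a vertex whose counter is at or above max_fanout corresponds to placing has('associated_ids', lt(10)) inside repeat() before simplePath().dedup(), which is one reasonable placement rather than the only one.

```python
from collections import defaultdict


class IdGraph:
    """Toy undirected identity graph with a per-vertex fan-out counter."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.labels = {}
        # Hypothetical 'associated_ids' counter, bumped whenever an edge is added.
        self.associated_ids = defaultdict(int)

    def add_vertex(self, vid, label):
        self.labels[vid] = label

    def add_edge(self, a, b):
        if b in self.adj[a]:
            return  # ignore duplicate edges so the counter stays accurate
        self.adj[a].add(b)
        self.adj[b].add(a)
        self.associated_ids[a] += 1
        self.associated_ids[b] += 1

    def count_at_hop(self, start, hops=3, target="Gaid", max_fanout=None):
        """Count `target` vertices first reached on hop `hops`, optionally
        skipping vertices whose counter is >= max_fanout (like lt(max_fanout))."""
        seen = {start}
        frontier = {start}
        for _ in range(hops):
            nxt = set()
            for v in frontier:
                for n in self.adj[v]:
                    if n in seen:
                        continue
                    if max_fanout is not None and self.associated_ids[n] >= max_fanout:
                        continue  # treat n as a super node and do not expand it
                    seen.add(n)
                    nxt.add(n)
            frontier = nxt
        return sum(1 for v in frontier if self.labels.get(v) == target)


graph = IdGraph()
graph.add_vertex("u1", "User")
graph.add_vertex("hw", "HardwareId")  # a device shared by many users
graph.add_edge("u1", "hw")
for i in range(2, 22):  # 20 more users on the same device, each with a Gaid
    graph.add_vertex(f"u{i}", "User")
    graph.add_vertex(f"g{i}", "Gaid")
    graph.add_edge("hw", f"u{i}")
    graph.add_edge(f"u{i}", f"g{i}")
for vid, label in [("upi1", "Upi"), ("ua", "User"), ("ga", "Gaid")]:
    graph.add_vertex(vid, label)
graph.add_edge("u1", "upi1")
graph.add_edge("upi1", "ua")
graph.add_edge("ua", "ga")

print(graph.associated_ids["hw"])               # 21
print(graph.count_at_hop("u1"))                 # 21
print(graph.count_at_hop("u1", max_fanout=10))  # 1
```

In the toy graph, hw is connected to 21 users, so without the cap 20 of the 21 Gaid vertices counted are reached through it; with max_fanout=10 only the Gaid reached through the ordinary UPI path remains. The cap also applies on the last hop, so a high-degree Gaid vertex would be excluded as well.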

© www.soinside.com 2019 - 2024. All rights reserved.