[Bug] Gremlin Connector Timeout when Fetching Vertex Schema upon Database Synchronizing #225

Closed
dsaban-lightricks opened this issue Dec 24, 2023 · 3 comments · Fixed by #498
Labels
bug Something isn't working database support Issues related to adding or changing the databases servers or languages supported discussion Support/further information is requested reliability Issues relating to improvements in reliability

Community Note

  • Please use a 👍 reaction to provide a +1/vote. This helps the community and maintainers prioritize this request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Describe the bug
When synchronizing the Graph Explorer UI through the Gremlin connector, the synchronization fails with a timeout.
The timeout occurs in the code that fetches vertex attributes (fetchVerticesAttributes), which makes an HTTP call to the Graph Explorer proxy URL.

  • Deployment of Graph Explorer: via SageMaker
  • Browser: Google Chrome
  • Graph Explorer Version: 1.4.0
  • Graph Database & Version: Amazon Neptune 1.3.0.0
  • Graph Connector: GP-Gremlin (default connection)

To Reproduce
Steps to reproduce the behavior:

  1. Go to Amazon Neptune Console
  2. Click on Notebooks (https://us-west-2.console.aws.amazon.com/neptune/home?region=us-west-2#notebooks:)
  3. Select radio button for relevant notebook
  4. Click on Actions button
  5. Select Open Graph Explorer
  6. Then, in graph-explorer UI, click on GP-Gremlin default connection
  7. Click on Synchronize Database icon (top-right of UI view)

See screen capture of the graph-explorer UI, along with log showing network activity upon step 7 above. Note the timeout after 2 minutes for the HTTP request.

Screenshot-graph-explorer-chrome

Expected behavior
Clicking the Synchronize Database icon should produce a success notification within a few seconds (i.e. in less than 10 seconds).

Some Additional Notes
The Gremlin query executed through the proxy HTTP call mentioned above is generated by the verticesSchemaTemplate function.

In our case, the query produced by the verticesSchemaTemplate function is as follows (the label names have been changed in this example):

g.V()
.project("VertexType1","VertexType2","VertexType3","VertexType4","VertexType5","VertexType6","VertexType7","VertexType8")
.by(V().hasLabel("VertexType1").limit(1))
.by(V().hasLabel("VertexType2").limit(1))
.by(V().hasLabel("VertexType3").limit(1))
.by(V().hasLabel("VertexType4").limit(1))
.by(V().hasLabel("VertexType5").limit(1))
.by(V().hasLabel("VertexType6").limit(1))
.by(V().hasLabel("VertexType7").limit(1))
.by(V().hasLabel("VertexType8").limit(1))
.limit(1)
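
For readers who want to reproduce this query shape for their own labels, here is a minimal TypeScript sketch. It is not the project's verticesSchemaTemplate source; the function names `buildCurrentStyleSchemaQuery` and `escapeDoubleQuoted` are hypothetical, and the only thing taken from this report is the shape of the generated string.

```typescript
// Hypothetical helper (an assumption, not Graph Explorer code): escapes
// backslashes and double quotes so a label can sit inside "..." in Gremlin.
function escapeDoubleQuoted(label: string): string {
  return label.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Hypothetical sketch: rebuilds the query shape quoted in this issue,
// i.e. one project() key per label and one by(V().hasLabel(...).limit(1))
// modulator per label, followed by a final limit(1).
function buildCurrentStyleSchemaQuery(labels: string[]): string {
  if (labels.length === 0) {
    throw new Error("at least one vertex label is required");
  }
  const quoted = labels.map((label) => `"${escapeDoubleQuoted(label)}"`);
  const keys = quoted.join(",");
  const modulators = quoted
    .map((q) => `.by(V().hasLabel(${q}).limit(1))`)
    .join("");
  return `g.V().project(${keys})${modulators}.limit(1)`;
}

console.log(buildCurrentStyleSchemaQuery(["VertexType1", "VertexType2"]));
```

With the eight labels from this report, the function returns the query shown above as a single line.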

Further investigation showed that this query works for databases smaller than the one we currently have deployed.
To work around the size of our database, we ran the query with an extended timeout. It completed successfully, but took over 8 minutes. The default timeout is 2 minutes, so the synchronization fails in the Graph Explorer UI.

A proposed query, which should return an equivalent result, completes successfully in under 1 second for our graph database:

g.V().union(
    __.hasLabel('VertexType1').limit(1),
    __.hasLabel('VertexType2').limit(1),
    __.hasLabel('VertexType3').limit(1),
    __.hasLabel('VertexType4').limit(1),
    __.hasLabel('VertexType5').limit(1),
    __.hasLabel('VertexType6').limit(1),
    __.hasLabel('VertexType7').limit(1),
    __.hasLabel('VertexType8').limit(1)
)
.fold()
.project('VertexType1', 'VertexType2', 'VertexType3', 'VertexType4', 'VertexType5', 'VertexType6', 'VertexType7', 'VertexType8')
.by(unfold().hasLabel('VertexType1'))
.by(unfold().hasLabel('VertexType2'))
.by(unfold().hasLabel('VertexType3'))
.by(unfold().hasLabel('VertexType4'))
.by(unfold().hasLabel('VertexType5'))
.by(unfold().hasLabel('VertexType6'))
.by(unfold().hasLabel('VertexType7'))
.by(unfold().hasLabel('VertexType8'))

Explanation of the proposed query above:

  1. union: collect the results for each vertex label (each result comes from an anonymous traversal with a limit of 1)
  2. fold: gather the results into a single list
  3. project: create one key per label
  4. by: fill each key with the vertex of that label, taken from the folded list.
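
The steps above can be sketched as a small TypeScript generator that produces the proposed query for an arbitrary list of labels. This is only an illustration under stated assumptions: `buildProposedSchemaQuery` and `escapeSingleQuoted` are hypothetical names, not functions from Graph Explorer, and the change actually submitted for this issue may differ.

```typescript
// Hypothetical helper (an assumption, not Graph Explorer code): escapes
// backslashes and single quotes so a label can sit inside '...' in Gremlin.
function escapeSingleQuoted(label: string): string {
  return label.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// Hypothetical sketch of the proposed query shape:
//   1. union(...)   one hasLabel(label).limit(1) branch per label
//   2. fold()       gather the sampled vertices into a single list
//   3. project(...) one key per label
//   4. by(...)      pick the vertex with the matching label from the list
function buildProposedSchemaQuery(labels: string[]): string {
  if (labels.length === 0) {
    throw new Error("at least one vertex label is required");
  }
  const quoted = labels.map((label) => `'${escapeSingleQuoted(label)}'`);
  const branches = quoted.map((q) => `__.hasLabel(${q}).limit(1)`).join(",");
  const keys = quoted.join(",");
  const modulators = quoted
    .map((q) => `.by(unfold().hasLabel(${q}))`)
    .join("");
  return `g.V().union(${branches}).fold().project(${keys})${modulators}`;
}

console.log(buildProposedSchemaQuery(["VertexType1", "VertexType2"]));
```

The generated string is the single-line form of the query above. Whether it performs as well on other engines or data sets as reported here is not something this sketch can confirm.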

See the attached files for more details about our graph and the execution of the current query and the proposed query:

Some Cluster Status Info (see details in cluster_status.json):

  1. DB Engine Version: 1.3.0.0.R1
  2. Gremlin Version: tinkerpop-3.6.4

Graph Summary (see details in graph_summary.json):

  • Nodes:
      1. Number of nodes: 584713969
      2. Number of node labels: 8
      3. Number of node properties: 18
  • Edges:
      1. Number of edges: 762486650
      2. Number of edge labels: 8
      3. Number of edge properties: 4

Graph Statistics (see details in graph_statistics.json):

  • Signature Count: 94
  • Instance Count: 1347200742
  • Predicate Count: 31

The Explain and Profile of the currently used query (verticesSchemaTemplate):
ge-query-explain.txt
ge-query-profile.txt

The Explain and Profile of the proposed query above:
ge-query-modified-explain.txt
ge-query-modified-profile.txt

@dsaban-lightricks dsaban-lightricks added the bug Something isn't working label Dec 24, 2023

dsaban-lightricks added a commit to dsaban-lightricks/graph-explorer that referenced this issue Dec 26, 2023
…se Synchronization

Issue:

* [[Bug] Gremlin Connector Timeout when Fetching Vertex Schema upon Database Synchronizing aws#225](aws#225)

## Description of changes:

* Updated vertices schema template for the Gremlin Connector. This code change addresses the timeout issue of Gremlin
  Connector fetching vertex schema upon database synchronization.
* Updated unit test that checks the generation of the schema template for the Gremlin Connector.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@dsaban-lightricks

Please consider PR to fix issue: #226

@xiazcy xiazcy added the discussion Support/further information is requested label Apr 18, 2024
@kmcginnes kmcginnes added reliability Issues relating to improvements in reliability database support Issues related to adding or changing the databases servers or languages supported labels Apr 26, 2024
@kmcginnes kmcginnes added this to the Release 1.9.0 milestone Jun 5, 2024