DOC-759 | Reworked GraphML documentation #722

Merged 9 commits on Jul 3, 2025

2 changes: 1 addition & 1 deletion site/content/3.13/aql/functions/vector.md
@@ -12,7 +12,7 @@ To use vector search, you need to have vector embeddings stored in documents
and the attribute that stores them needs to be indexed by a
[vector index](../../index-and-search/indexing/working-with-indexes/vector-indexes.md).

You can calculate vector embeddings using [ArangoDB's GraphML](../../data-science/arangographml/_index.md)
You can calculate vector embeddings using [ArangoDB's GraphML](../../data-science/graphml/_index.md)
capabilities (available in ArangoGraph) or using external tools.

{{< warning >}}
6 changes: 3 additions & 3 deletions site/content/3.13/data-science/_index.md
@@ -1,6 +1,6 @@
---
title: Data Science
menuTitle: Data Science
title: Data Science and GenAI
menuTitle: Data Science & GenAI
weight: 115
description: >-
ArangoDB lets you apply analytics and machine learning to graph data at scale
Contributor

The "analytics" here refers to graph analytics, if I'm not mistaken. That content is moved out of the Data Science chapter (#728), so this description needs to be changed. There's also no mention of GenAI (Suite) here.

The first paragraph covers graph analytics and graph ML, but graph analytics will no longer belong here with the above PR.

Contributor Author

This entire page will be reworked, also based on the #696 PR

@@ -69,7 +69,7 @@ GraphML can answer questions like:
![Graph ML](../../images/graph-ml.png)

For ArangoDB's enterprise-ready, graph-powered machine learning offering,
see [ArangoGraphML](arangographml/_index.md).
see [ArangoGraphML](graphml/_index.md).

## Use Cases

76 changes: 0 additions & 76 deletions site/content/3.13/data-science/arangographml/deploy.md

This file was deleted.

264 changes: 0 additions & 264 deletions site/content/3.13/data-science/arangographml/ui.md

This file was deleted.

@@ -1,18 +1,23 @@
---
title: ArangoGraphML
menuTitle: ArangoGraphML
title: ArangoDB GraphML
menuTitle: GraphML
weight: 125
description: >-
Enterprise-ready, graph-powered machine learning as a cloud service or self-managed
Boost your machine learning models with graph data using ArangoDB's advanced GraphML capabilities
aliases:
- graphml
- arangographml
---
Traditional Machine Learning (ML) overlooks the connections and relationships
between data points, which is where graph machine learning excels. However,
accessibility to GraphML has been limited to sizable enterprises equipped with
specialized teams of data scientists. ArangoGraphML simplifies the utilization of GraphML,
specialized teams of data scientists. ArangoDB simplifies the utilization of Graph Machine Learning,
enabling a broader range of personas to extract profound insights from their data.

With ArangoDB, you can solve computationally intensive graph problems using Graph Machine
Learning. Apply it to a selected graph to predict connections, get better product
recommendations, classify nodes, and compute node embeddings. You can configure and run
the whole machine learning flow entirely through the web interface or programmatically.

## How GraphML works

Graph machine learning leverages the inherent structure of graph data, where
@@ -21,35 +26,48 @@ traditional ML, which primarily operates on tabular data, GraphML applies
specialized algorithms like Graph Neural Networks (GNNs), node embeddings, and
link prediction to uncover complex patterns and insights.

The underlying framework for ArangoDB's GraphML is **[GraphSAGE](https://snap.stanford.edu/graphsage/)**.
GraphSAGE (Graph Sample and AggreGatE) is a powerful Graph Neural Network (GNN)
**framework** designed for inductive representation learning on large graphs.
It is used to generate low-dimensional vector representations for nodes and is
especially useful for graphs that have rich node attribute information.
The overall process involves the following steps:

1. **Graph Construction**:
Raw data is transformed into a graph structure, defining nodes and edges based
- Raw data is transformed into a graph structure, defining nodes and edges based
on real-world relationships.
2. **Featurization**:
Nodes and edges are enriched with features that help in training predictive models.
3. **Model Training**:
Machine learning techniques are applied on GNNs to identify patterns and make predictions.
2. **Featurization**: Your raw graph data is transformed into numerical representations that the model can understand.
- The system iterates over your selected vertices and converts their attributes: booleans become `0` or `1`, numbers are normalized, and text attributes are converted into numerical vectors using sentence transformers.
- All of these numerical features are then combined (concatenated).
- Finally, **Incremental PCA** (Incremental Principal Component Analysis, a dimensionality reduction technique) is used to reduce the size of the combined features, which helps remove noise and keep only the most important information.
3. **Training**: The model learns from the graph's structure by sampling and aggregating information from each node's local neighborhood.
- For each node, GraphSAGE looks at connections up to **2 hops away**.
- Specifically, it uniformly samples up to **25 direct neighbors** (depth 1) and for each of those, it samples up to **10 of their neighbors** (depth 2).
- By aggregating feature information from this sampled neighborhood, the model creates a rich "embedding" for each node that captures both its own features and its role in the graph.
4. **Inference & Insights**:
The trained model is used to classify nodes, detect anomalies, recommend items,
- The trained model is used to classify nodes, detect anomalies, recommend items,
or predict future connections.
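
The featurization step above can be illustrated with a minimal Python sketch. It is not GraphML's actual implementation: the `featurize` function and the sample documents are hypothetical, min-max scaling is only an assumed choice of normalization, and the sentence-transformer and Incremental PCA stages are left out to keep the example dependency-free.

```python
from typing import Dict, List, Union

Value = Union[bool, int, float]


def featurize(docs: List[Dict[str, Value]], attributes: List[str]) -> List[List[float]]:
    """Turn the selected attributes of each document into one flat numeric vector.

    Assumption: every attribute holds either booleans or numbers in all documents.
    """
    # Collect the min and max of each numeric attribute for min-max scaling.
    ranges = {}
    for attr in attributes:
        # bool is a subclass of int in Python, so exclude it explicitly.
        values = [float(d[attr]) for d in docs if not isinstance(d[attr], bool)]
        if values:
            ranges[attr] = (min(values), max(values))

    vectors = []
    for doc in docs:
        vec = []
        for attr in attributes:
            value = doc[attr]
            if isinstance(value, bool):
                # Booleans become 0 or 1.
                vec.append(1.0 if value else 0.0)
            else:
                # Numbers are scaled into [0, 1]; constant columns map to 0.
                lo, hi = ranges[attr]
                vec.append((float(value) - lo) / (hi - lo) if hi > lo else 0.0)
        # The per-attribute features are concatenated into one vector.
        vectors.append(vec)
    return vectors


# Hypothetical vertex documents.
docs = [
    {"active": True, "age": 20},
    {"active": False, "age": 40},
    {"active": True, "age": 30},
]
print(featurize(docs, ["active", "age"]))  # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
```

In a fuller pipeline, text attributes would contribute additional columns from a sentence-embedding model, and the concatenated vectors would then be reduced in batches by a dimensionality reduction step.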

ArangoGraphML streamlines these steps, providing an intuitive and scalable
ArangoDB streamlines these steps, providing an intuitive and scalable
framework to integrate GraphML into various applications, from fraud detection
to recommendation systems.
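
The neighborhood sampling described in the training step (up to 25 neighbors at depth 1, up to 10 at depth 2) can be sketched as follows. This is a simplified illustration under stated assumptions, not the GraphML code: the graph is a plain adjacency dictionary, all function names are hypothetical, and the learned weight matrices and nonlinearities of a real GraphSAGE layer are omitted, so only the sampling and mean aggregation are shown.

```python
import random
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list: node -> neighbors


def sample_neighbors(graph: Graph, node: str, k: int, rng: random.Random) -> List[str]:
    """Uniformly sample up to k neighbors of a node without replacement."""
    neighbors = graph.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def concat_with_mean(own: List[float], neighbor_vectors: List[List[float]]) -> List[float]:
    """Concatenate a vector with the element-wise mean of its neighbors' vectors."""
    if neighbor_vectors:
        pooled = [sum(col) / len(neighbor_vectors) for col in zip(*neighbor_vectors)]
    else:
        pooled = [0.0] * len(own)  # isolated node: nothing to aggregate
    return own + pooled


def embed(graph: Graph, features: Dict[str, List[float]], node: str,
          rng: random.Random, fanout=(25, 10)) -> List[float]:
    """Two rounds of sample-and-aggregate around one node (weights omitted)."""
    hop1 = sample_neighbors(graph, node, fanout[0], rng)
    # Round 1: the node and each sampled neighbor aggregate their own neighbors.
    h1_node = concat_with_mean(features[node], [features[n] for n in hop1])
    h1_hop1 = []
    for n in hop1:
        hop2 = sample_neighbors(graph, n, fanout[1], rng)
        h1_hop1.append(concat_with_mean(features[n], [features[m] for m in hop2]))
    # Round 2: the node aggregates the round-1 vectors of its sampled neighbors.
    return concat_with_mean(h1_node, h1_hop1)


# Hypothetical star graph: one hub connected to 30 leaf nodes.
graph = {"hub": [f"n{i}" for i in range(30)]}
graph.update({f"n{i}": ["hub"] for i in range(30)})
features = {"hub": [1.0]}
features.update({f"n{i}": [0.0] for i in range(30)})
print(embed(graph, features, "hub", random.Random(0)))
```

Capping the sample sizes bounds the work per node at roughly 25 × 10 second-hop lookups, regardless of how many neighbors a node actually has.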

![GraphML Embeddings](../../../images/GraphML-Embeddings.webp)

![GraphML Workflow](../../../images/GraphML-How-it-works.webp)

It is no longer necessary to understand the complexities involved with graph
machine learning, thanks to the accessibility of the ArangoML package.
Solutions with ArangoGraphML only require input from a user about
their data, and the ArangoGraphML managed service handles the rest.
You no longer need to understand the complexities of graph machine learning to
benefit from it. Solutions with ArangoDB's GraphML only require input from a user about
their data, and the GraphML managed service handles the rest.

The platform comes preloaded with all the tools needed to prepare your graph
for machine learning, high-accuracy training, and persisting predictions back
to the database for application use.

## Supported Tasks
## What you can do with GraphML

GraphML directly supports two primary machine learning tasks:
**Node Classification** and **Node Embeddings**.

### Node Classification

@@ -58,7 +76,7 @@ predict the label of a node based on both its own features and its relationships
within the graph. It requires a set of labeled nodes to train a model, which then
classifies unlabeled nodes based on learned patterns.

**How it works in ArangoGraphML**
**How it works in ArangoDB**

- A portion of the nodes in a graph is labeled for training.
- The model learns patterns from both **node features** and
@@ -97,7 +115,7 @@ into numerical vector representations, preserving their **structural relationshi
within the graph. Unlike simple feature aggregation, node embeddings
**capture the influence of neighboring nodes and graph topology**, making
them powerful for downstream tasks like clustering, anomaly detection,
and link prediction. These combinations can provide valuable insights.
and link prediction. This combination provides valuable insights.
Consider using [ArangoDB's Vector Search](https://arangodb.com/2024/11/vector-search-in-arangodb-practical-insights-and-hands-on-examples/)
capabilities to find similar nodes based on their embeddings.
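
To make the idea of finding similar nodes by their embeddings concrete, here is a minimal brute-force sketch using cosine similarity. It assumes the embeddings are already available as plain Python lists, and the function names and sample vectors are hypothetical. ArangoDB's vector search relies on a vector index rather than scanning every pair, so this only illustrates the similarity measure.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(embeddings: Dict[str, List[float]], query_key: str,
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank all other nodes by cosine similarity to the query node's embedding."""
    query = embeddings[query_key]
    scored = [
        (key, cosine_similarity(query, vec))
        for key, vec in embeddings.items()
        if key != query_key
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 2-dimensional node embeddings.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
for key, score in most_similar(embeddings, "a", top_k=2):
    print(key, round(score, 3))
```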

@@ -116,7 +134,7 @@ Essentially, they aggregate both the node's attributes and the connectivity patt
within the graph. This fusion helps capture not only the individual properties of
a node but also its position and role within the network.

**How it works in ArangoGraphML**
**How it works in ArangoDB**

- The model learns an embedding (a vector representation) for each node based on its
**position within the graph and its connections**.
@@ -161,21 +179,21 @@ a node but also its position and role within the network.
| **Key Advantage** | Learns labels based on node connections and attributes | Learns structural patterns and node relationships |
| **Use Cases** | Fraud detection, customer segmentation, disease classification | Recommendations, anomaly detection, link prediction |

ArangoGraphML provides the infrastructure to efficiently train and apply these
GraphML provides the infrastructure to efficiently train and apply these
models, helping users extract meaningful insights from complex graph data.

## Metrics and Compliance

ArangoGraphML supports tracking your ML pipeline by storing all relevant metadata
GraphML supports tracking your ML pipeline by storing all relevant metadata
and metrics in a graph called ArangoPipe. This graph is only available to you and is never
viewable by ArangoDB. This metadata graph links all experiments
to the source data, feature generation activities, training runs, and prediction
jobs, allowing you to track the entire ML pipeline without having to leave ArangoDB.

### Security
## Security

Each deployment that uses ArangoGraphML has an `arangopipe` database created,
Each deployment that uses GraphML has an `arangopipe` database created,
which houses all ML metadata. Since this data lives within the deployment,
it benefits from the ArangoGraph security features and SOC 2 compliance.
All ArangoGraphML services live alongside the ArangoGraph deployment and are only
All GraphML services live alongside the ArangoGraph deployment and are only
accessible within that organization.