DOC-759 | Reworked GraphML documentation #722

Merged
merged 9 commits into from
Jul 3, 2025
Conversation

nerpaula
Contributor

Description

  • Incorporated all conceptual information into the GraphML index page
  • Reworked the Web Interface page (upcoming UI changes to be treated in separate PRs)
  • Removed "ArangoGraphML" term
  • Removed the instructions for self-managed GraphML using Notebooks or API
  • Renamed files

To be discussed: what content we can add to the new Quickstart page.

Contributor

Deploy Preview Available Via
https://deploy-preview-722--docs-hugo.netlify.app

@cla-bot cla-bot bot added the cla-signed label Jun 25, 2025
@nerpaula nerpaula self-assigned this Jun 25, 2025
nerpaula and others added 2 commits June 26, 2025 09:46
Co-authored-by: Simran <Simran-B@users.noreply.github.com>
@nerpaula nerpaula requested a review from Simran-B June 27, 2025 14:35
-title: Data Science
-menuTitle: Data Science
+title: Data Science and GenAI
+menuTitle: Data Science & GenAI
weight: 115
description: >-
ArangoDB lets you apply analytics and machine learning to graph data at scale
Contributor

The "analytics" here refers to graph analytics if I'm not mistaken, and that is being moved out of the Data Science chapter (#728), so this needs to be changed. There's also no mention of GenAI (Suite) here.

The first paragraph covers graph analytics and graph ML, but graph analytics will no longer belong here once the above PR is merged.

Contributor Author

This entire page will be reworked, also based on the #696 PR

Contributor

Dunno whether we should follow the title and use web-interface.md instead? Right now, that term is occupied by the current arangod UI, though. So probably fine.

Comment on lines 6 to 7
Learn how to create, configure, and run a full machine learning workflow for
GraphML using the steps and features in the ArangoDB web interface
Contributor

I see where you want to go with "using the steps and features" but it might be easier to understand to just say something like "in four steps"

@@ -0,0 +1,244 @@
---
title: How to use GraphML in the Web Interface
Contributor

Maybe more explicit like this to distinguish it from the core UI? How will the two actually co-exist? Will some code be shared or do we keep two separate implementations?

Suggested change
-title: How to use GraphML in the Web Interface
+title: How to use GraphML in the ArangoDB Platform web interface

Comment on lines 58 to 70
- **Batch size** – The number of documents to process in a single batch.
- **Run analysis checks** – Whether to run analysis checks to perform a high-level
analysis of the data quality before proceeding. The default value is `true`.
- **Skip labels** – Skip the featurization process for attributes marked as labels.
The default value is `false`.
- **Overwrite FS graph** – Whether to overwrite the Feature Store graph if features
were previously generated. The default value is `false`, therefore features are
written to an existing Feature Store graph.
- **Write to source graph** – Whether to store the generated features on the Source
Graph. The default value is `true`.
- **Use feature store** – Enable the use of the Feature Store database, which
allows you to store features separately from your Source Database. The default
value is `false`, therefore features are written to the source graph.
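
As a rough illustration of the defaults listed above, the options can be modeled as a small Python dataclass. This is only a sketch: the class name `FeaturizationOptions` and all field names are hypothetical and not taken from any GraphML API. Only the default values come from the list, and **Batch size** has no stated default, so it is left unset.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FeaturizationOptions:
    """Hypothetical model of the options above (names are illustrative only)."""

    batch_size: Optional[int] = None     # no default is stated in the list
    run_analysis_checks: bool = True     # "Run analysis checks" defaults to true
    skip_labels: bool = False            # "Skip labels" defaults to false
    overwrite_fs_graph: bool = False     # "Overwrite FS graph" defaults to false
    write_to_source_graph: bool = True   # "Write to source graph" defaults to true
    use_feature_store: bool = False      # "Use feature store" defaults to false


# Start from the documented defaults and override two options.
defaults = FeaturizationOptions()
custom = FeaturizationOptions(batch_size=500, use_feature_store=True)

# Collect only the options that differ from the defaults.
changed = {
    name: value
    for name, value in asdict(custom).items()
    if value != asdict(defaults)[name]
}
print(changed)
```

Keeping the defaults in one place like this makes it easy to see which options a given run actually overrides.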
Contributor

For consistency, should be substituted with:


- **Featurize new documents:** Enable this option to generate features for
documents that have been added since the model was trained. This is useful
for getting predictions on new data without having to retrain the model.
Contributor

Could briefly mention model drift / decrease in quality over time here

### Enable scheduling

You can configure automatic predictions using the **Enable scheduling** checkbox.
When scheduling is turned on, predictions run automatically based on a set CRON
Contributor

Suggested change
-When scheduling is turned on, predictions run automatically based on a set CRON
+When scheduling is enabled, predictions run automatically based on a set CRON

When scheduling is turned on, predictions run automatically based on a set CRON
expression. This helps keep prediction results up-to-date as new data is added to the system.

You can define a cron expression that sets when the prediction job should run.
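
For illustration, the following Python sketch splits a cron expression into its named fields. It assumes the standard five-field cron format (minute, hour, day of month, month, day of week). Whether the GraphML scheduler accepts exactly this format is an assumption, and `describe_cron` is a hypothetical helper, not part of any ArangoDB API.

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Map each whitespace-separated cron field to its name.

    Only the number of fields is checked; the values themselves
    are not validated.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# In standard cron, "0 2 * * *" means every day at 02:00.
schedule = describe_cron("0 2 * * *")
print(schedule)
```

A check like this only catches a wrong number of fields; a real scheduler also validates the ranges and special characters in each field.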
Contributor

Since the UI assists you with the CRON syntax, should we describe the more user-friendly way first? (And ideally swap it in the UI...)


## Limitations

- **Edge Attributes**: The current version of GraphML does not support the use of edge attributes as features.
Contributor

It's likely a permanent limitation unfortunately because GraphSAGE doesn't support it

@nerpaula nerpaula requested a review from Simran-B July 3, 2025 10:03
@nerpaula nerpaula merged commit cd7b519 into main Jul 3, 2025
5 checks passed
@nerpaula nerpaula deleted the DOC-759 branch July 3, 2025 10:53
2 participants