DOC-759 | Reworked GraphML documentation #722
Co-authored-by: Simran <Simran-B@users.noreply.github.com>
title: Data Science
menuTitle: Data Science
title: Data Science and GenAI
menuTitle: Data Science & GenAI
weight: 115
description: >-
ArangoDB lets you apply analytics and machine learning to graph data at scale
The "analytics" here refers to graph analytics, if I'm not mistaken, and that is being moved out of the Data Science chapter (#728), so this needs to be changed. There's also no mention of GenAI (Suite) here.
The first paragraph covers graph analytics and graph ML, but graph analytics will no longer belong here once the above PR is merged.
This entire page will be reworked, also based on PR #696.
Not sure whether we should follow the title and use `web-interface.md` instead. Right now, that term is occupied by the current arangod UI, though, so this is probably fine.
Learn how to create, configure, and run a full machine learning workflow for
GraphML using the steps and features in the ArangoDB web interface
I see where you want to go with "using the steps and features", but it might be easier to understand if it just said something like "in four steps".
@@ -0,0 +1,244 @@
---
title: How to use GraphML in the Web Interface
Maybe make it more explicit like this to distinguish it from the core UI? How will the two actually co-exist? Will some code be shared, or do we keep two separate implementations?
title: How to use GraphML in the Web Interface
title: How to use GraphML in the ArangoDB Platform web interface
- **Batch size** – The number of documents to process in a single batch.
- **Run analysis checks** – Whether to run analysis checks to perform a high-level
analysis of the data quality before proceeding. The default value is `true`.
- **Skip labels** – Skip the featurization process for attributes marked as labels.
The default value is `false`.
- **Overwrite FS graph** – Whether to overwrite the Feature Store graph if features
were previously generated. The default value is `false`, so features are
written to the existing Feature Store graph.
- **Write to source graph** – Whether to store the generated features on the source
graph. The default value is `true`.
- **Use feature store** – Enable the use of the Feature Store database, which
allows you to store features separately from your source database. The default
value is `false`, so features are written to the source graph.
For consistency, `–` should be substituted with `:`.
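To make the featurization defaults listed above concrete, here is a minimal Python sketch that models the settings as a dataclass. The class name, the field names, and the `feature_targets()` helper are hypothetical and are not part of the GraphML API. Treating **Write to source graph** and **Use feature store** as independent flags is also an assumption, not something the text confirms.

```python
from dataclasses import dataclass


@dataclass
class FeaturizationOptions:
    """Hypothetical model of the featurization settings; not the GraphML API."""

    batch_size: int  # no default is documented, so it is required here
    run_analysis_checks: bool = True
    skip_labels: bool = False
    overwrite_fs_graph: bool = False
    write_to_source_graph: bool = True
    use_feature_store: bool = False

    def __post_init__(self) -> None:
        # A batch must contain at least one document.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")

    def feature_targets(self) -> list[str]:
        """Where features would be written, ASSUMING the two flags are independent."""
        targets = []
        if self.write_to_source_graph:
            targets.append("source graph")
        if self.use_feature_store:
            targets.append("feature store graph")
        return targets
```

With the documented defaults, `FeaturizationOptions(batch_size=1000).feature_targets()` returns `['source graph']`.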
- **Featurize new documents:** Enable this option to generate features for
documents that have been added since the model was trained. This is useful
for getting predictions on new data without having to retrain the model.
Could briefly mention model drift (the decrease in quality over time) here.
### Enable scheduling

You can configure automatic predictions using the **Enable scheduling** checkbox.
When scheduling is turned on, predictions run automatically based on a set CRON
When scheduling is turned on, predictions run automatically based on a set CRON
When scheduling is enabled, predictions run automatically based on a set CRON
When scheduling is turned on, predictions run automatically based on a set CRON
expression. This helps keep prediction results up to date as new data is added to the system.

You can define a cron expression that sets when the prediction job should run.
Since the UI assists you with the CRON syntax, should we describe the more user-friendly way first? (And ideally swap it in the UI...)
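For readers unfamiliar with the syntax, the following Python sketch shows how a cron expression maps onto points in time. It assumes the scheduler uses classic five-field cron syntax (minute, hour, day of month, month, day of week), which the text above does not confirm. The function names are hypothetical, and only `*`, single values, ranges, comma lists, and `*/n` steps are supported.

```python
from datetime import datetime

# Field order and allowed ranges for a classic five-field cron expression.
FIELDS = [("minute", 0, 59), ("hour", 0, 23), ("day of month", 1, 31),
          ("month", 1, 12), ("day of week", 0, 6)]  # 0 = Sunday


def parse_field(text: str, lo: int, hi: int) -> set[int]:
    """Expand one field such as '*', '5', '1-5', '*/15' or '0,30' into its values."""
    values: set[int] = set()
    for part in text.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
            if step <= 0:
                raise ValueError(f"invalid step in {text!r}")
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not (lo <= start <= end <= hi):
            raise ValueError(f"value out of range in {text!r}")
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expr: str) -> list[set[int]]:
    """Split an expression into its five fields and expand each one."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    return [parse_field(f, lo, hi) for f, (_, lo, hi) in zip(fields, FIELDS)]


def matches(expr: str, when: datetime) -> bool:
    """True if the job would fire at this minute.

    Simplification: all fields must match (AND). Classic cron uses OR when both
    day-of-month and day-of-week are restricted.
    """
    minute, hour, dom, month, dow = parse_cron(expr)
    # Python's weekday() uses 0 = Monday; cron uses 0 = Sunday.
    cron_dow = (when.weekday() + 1) % 7
    return (when.minute in minute and when.hour in hour and when.day in dom
            and when.month in month and cron_dow in dow)
```

For example, `0 2 * * *` matches 02:00 every day, and `0 9 * * 1-5` matches 09:00 on weekdays only.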

## Limitations

- **Edge Attributes**: The current version of GraphML does not support the use of edge attributes as features.
It's likely a permanent limitation, unfortunately, because GraphSAGE doesn't support it.
Description
To be discussed: what content can we add to the new Quickstart page?