From 20359959ca400e16634f894bcca68d22bc97c89b Mon Sep 17 00:00:00 2001
From: Simran Spiller
Date: Tue, 1 Jul 2025 17:14:08 +0200
Subject: [PATCH] AQL optimization: COLLECT ... AGGREGATE can utilize persistent index

---
 .../version-3.12/whats-new-in-3-12.md | 34 +++++++++++++++++++
 .../version-3.12/whats-new-in-3-12.md | 34 +++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
index 77b8c26bb2..32c8e4fe42 100644
--- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1257,6 +1257,40 @@ to some extent.
 See the [`COLLECT` operation](../../aql/high-level-operations/collect.md#disableindex)
 for details.
 
+---
+
+Introduced in: v3.12.5
+
+The `use-index-for-collect` optimizer rule has been further extended.
+Queries with a `COLLECT` operation whose `AGGREGATE` clause exclusively
+refers to attributes covered by a persistent index, and to no other variables,
+can now utilize this index.
+
+Reading the data from the index instead of from the stored documents can
+significantly speed up aggregations if there are few distinct values.
+
+```aql
+FOR doc IN coll
+  COLLECT a = doc.a AGGREGATE b = MAX(doc.b)
+  RETURN { a, b }
+```
+
+If there is a persistent index over the attributes `a` and `b`, the query
+explain output shows an `IndexCollectNode` when the optimization is applied:
+
+```aql
+Execution plan:
+ Id   NodeType           Par   Est.   Comment
+  1   SingletonNode               1   * ROOT
+ 10   IndexCollectNode         4999   - FOR doc IN coll COLLECT a = doc.`a` AGGREGATE b = MAX(doc.`b`)   /* full index scan */
+  6   CalculationNode    ✓     4999   - LET #5 = { "a" : a, "b" : b }   /* simple expression */
+  7   ReturnNode               4999   - RETURN #5
+
+Indexes used:
+ By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields   Stored values   Ranges
+ 10   idx_1836452431376941056   persistent   coll
+```
+
 ## Indexing
 
 ### Multi-dimensional indexes
diff --git a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
index 77b8c26bb2..32c8e4fe42 100644
--- a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1257,6 +1257,40 @@ to some extent.
 See the [`COLLECT` operation](../../aql/high-level-operations/collect.md#disableindex)
 for details.
 
+---
+
+Introduced in: v3.12.5
+
+The `use-index-for-collect` optimizer rule has been further extended.
+Queries with a `COLLECT` operation whose `AGGREGATE` clause exclusively
+refers to attributes covered by a persistent index, and to no other variables,
+can now utilize this index.
+
+Reading the data from the index instead of from the stored documents can
+significantly speed up aggregations if there are few distinct values.
+
+```aql
+FOR doc IN coll
+  COLLECT a = doc.a AGGREGATE b = MAX(doc.b)
+  RETURN { a, b }
+```
+
+If there is a persistent index over the attributes `a` and `b`, the query
+explain output shows an `IndexCollectNode` when the optimization is applied:
+
+```aql
+Execution plan:
+ Id   NodeType           Par   Est.   Comment
+  1   SingletonNode               1   * ROOT
+ 10   IndexCollectNode         4999   - FOR doc IN coll COLLECT a = doc.`a` AGGREGATE b = MAX(doc.`b`)   /* full index scan */
+  6   CalculationNode    ✓     4999   - LET #5 = { "a" : a, "b" : b }   /* simple expression */
+  7   ReturnNode               4999   - RETURN #5
+
+Indexes used:
+ By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields   Stored values   Ranges
+ 10   idx_1836452431376941056   persistent   coll
+```
+
 ## Indexing
 
 ### Multi-dimensional indexes