Documented noisy aggregate functions in new page
Moved documentation for noisy_count_gaussian and related
functions from the Aggregate Functions page to a new Noisy
Aggregate Functions page. Also added documentation of the
SFM sketch type and functions, such as noisy_approx_set_sfm.
jonhehir committed May 15, 2024
1 parent 6c682f5 commit d8376e1
Showing 4 changed files with 233 additions and 171 deletions.
1 change: 1 addition & 0 deletions presto-docs/src/main/sphinx/functions.rst
@@ -19,6 +19,7 @@ Functions and Operators
functions/json
functions/datetime
functions/aggregate
functions/noisy
functions/window
functions/array
functions/map
185 changes: 14 additions & 171 deletions presto-docs/src/main/sphinx/functions/aggregate.rst
@@ -2,6 +2,14 @@
Aggregate Functions
===================

.. contents::
:local:
:backlinks: none
:depth: 1

Overview
--------

Aggregate functions operate on a set of values to compute a single result.

Except for :func:`count`, :func:`count_if`, :func:`max_by`, :func:`min_by` and
@@ -370,176 +378,6 @@ Approximate Aggregate Functions
:func:`numeric_histogram` that takes a ``weight``, with a per-item weight of ``1``.
In this case, the total weight in the returned map is the count of items in the bin.

.. function:: noisy_count_gaussian(x, noise_scale) -> bigint

Counts the non-null values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale`` to the true count.
The noisy count is post-processed to be non-negative and rounded to bigint.

When there are no input rows, this function returns ``NULL``.

Noise is drawn from a secure random source. ::

SELECT noisy_count_gaussian(orderkey, 20.0) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_count_gaussian(orderkey, 20.0) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)
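
The semantics described for ``noisy_count_gaussian`` can be modelled outside Presto. The following is a minimal Python sketch under stated assumptions, not Presto's implementation: the name ``noisy_count_gaussian_sketch`` is hypothetical, Python's built-in ``round`` stands in for whatever rounding Presto applies, and ``random.SystemRandom`` stands in for the secure random source.

```python
import random
from typing import Iterable, Optional


def noisy_count_gaussian_sketch(
    values: Iterable[object],
    noise_scale: float,
    random_seed: Optional[int] = None,
) -> Optional[int]:
    """Illustrative model of the documented behaviour (hypothetical helper)."""
    rows = list(values)
    if not rows:
        # No input rows: the documentation says the result is NULL.
        return None

    # Count only the non-null values.
    true_count = sum(1 for v in rows if v is not None)

    # Assumption: a seeded call uses an ordinary (non-secure) generator,
    # an unseeded call uses an OS-backed secure source.
    if random_seed is not None:
        rng = random.Random(random_seed)
    else:
        rng = random.SystemRandom()

    noisy = true_count + rng.gauss(0.0, noise_scale)

    # Post-process: round to an integer and clamp at zero.
    return max(0, round(noisy))
```

With ``noise_scale`` set to ``0.0`` the sketch returns the exact count, which is a convenient way to check the counting and NULL handling separately from the noise.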

.. function:: noisy_count_gaussian(x, noise_scale, random_seed) -> bigint

Counts the non-null values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale`` to the true count.
The noisy count is post-processed to be non-negative and rounded to bigint.

When there are no input rows, this function returns ``NULL``.

``random_seed`` seeds the random number generator.
This variant does not use a secure random source. ::

SELECT noisy_count_gaussian(orderkey, 20.0, 321) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_count_gaussian(orderkey, 20.0, 321) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_count_if_gaussian(x, noise_scale) -> bigint

Counts the ``TRUE`` values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale`` to the true count.
The noisy count is post-processed to be non-negative and rounded to bigint.

When there are no input rows, this function returns ``NULL``.

Noise is drawn from a secure random source. ::

SELECT noisy_count_if_gaussian(orderkey > 10, 20.0) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_count_if_gaussian(orderkey > 10, 20.0) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_count_if_gaussian(x, noise_scale, random_seed) -> bigint

Counts the ``TRUE`` values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale`` to the true count.
The noisy count is post-processed to be non-negative and rounded to bigint.

When there are no input rows, this function returns ``NULL``.

``random_seed`` seeds the random number generator.
This variant does not use a secure random source. ::

SELECT noisy_count_if_gaussian(orderkey > 10, 20.0, 321) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_count_if_gaussian(orderkey > 10, 20.0, 321) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_sum_gaussian(x, noise_scale) -> double

Calculates the sum over the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being added to the sum, and the return type is double.

When there are no input rows, this function returns ``NULL``.

Noise is drawn from a secure random source. ::

SELECT noisy_sum_gaussian(orderkey, 20.0) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_sum_gaussian(orderkey, 20.0) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_sum_gaussian(x, noise_scale, random_seed) -> double

Calculates the sum over the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being added to the sum, and the return type is double.

When there are no input rows, this function returns ``NULL``.

``random_seed`` seeds the random number generator.
This variant does not use a secure random source. ::

SELECT noisy_sum_gaussian(orderkey, 20.0, 321) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_sum_gaussian(orderkey, 20.0, 321) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_sum_gaussian(x, noise_scale, lower, upper) -> double

Calculates the sum over the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being added to the sum, and the return type is double.

Each value is clipped to the range ``[lower, upper]`` before being added to the sum.

When there are no input rows, this function returns ``NULL``.

Noise is drawn from a secure random source. ::

SELECT noisy_sum_gaussian(orderkey, 20.0, 10.0, 50.0) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_sum_gaussian(orderkey, 20.0, 10.0, 51.0) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)
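
The clipping step can be illustrated with a short Python sketch. This is an illustrative model only, not Presto's implementation: ``noisy_sum_gaussian_sketch`` is a hypothetical name, the input is assumed to contain no nulls, and ``random.SystemRandom`` is used as a stand-in for the secure random source.

```python
import random
from typing import Optional, Sequence


def noisy_sum_gaussian_sketch(
    values: Sequence[float],
    noise_scale: float,
    lower: float,
    upper: float,
    random_seed: Optional[int] = None,
) -> Optional[float]:
    """Illustrative model of the clipped noisy sum (hypothetical helper)."""
    if len(values) == 0:
        # No input rows: the documentation says the result is NULL.
        return None

    # Convert each value to a float and clip it to [lower, upper]
    # before it contributes to the sum.
    clipped_sum = sum(min(max(float(v), lower), upper) for v in values)

    # Assumption: seeded calls are reproducible but not secure.
    if random_seed is not None:
        rng = random.Random(random_seed)
    else:
        rng = random.SystemRandom()

    # Noise is added once, to the final sum, not to each value.
    return clipped_sum + rng.gauss(0.0, noise_scale)
```

For example, with bounds ``10.0`` and ``50.0`` the inputs ``5``, ``20`` and ``100`` contribute ``10``, ``20`` and ``50``, so the sum before noise is ``80``. Clipping bounds how much any single row can change the sum, which is typically what ``noise_scale`` is calibrated against.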

.. function:: noisy_sum_gaussian(x, noise_scale, lower, upper, random_seed) -> double

Calculates the sum over the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being added to the sum, and the return type is double.

Each value is clipped to the range ``[lower, upper]`` before being added to the sum.

When there are no input rows, this function returns ``NULL``.

``random_seed`` seeds the random number generator.
This variant does not use a secure random source. ::

SELECT noisy_sum_gaussian(orderkey, 20.0, 10.0, 50.0, 321) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_sum_gaussian(orderkey, 20.0, 10.0, 50.0, 321) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_avg_gaussian(x, noise_scale) -> double

Calculates the average (arithmetic mean) of all the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being averaged, and the return type is double.

When there are no input rows, this function returns ``NULL``.

Noise is drawn from a secure random source. ::

SELECT noisy_avg_gaussian(orderkey, 20.0) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_avg_gaussian(orderkey, 20.0) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_avg_gaussian(x, noise_scale, random_seed) -> double

Calculates the average (arithmetic mean) of all the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being averaged, and the return type is double.

When there are no input rows, this function returns ``NULL``.

``random_seed`` seeds the random number generator.
This variant does not use a secure random source. ::

SELECT noisy_avg_gaussian(orderkey, 20.0, 321) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_avg_gaussian(orderkey, 20.0, 321) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_avg_gaussian(x, noise_scale, lower, upper) -> double

Calculates the average (arithmetic mean) of all the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being averaged, and the return type is double.

Each value is clipped to the range ``[lower, upper]`` before being averaged.

When there are no input rows, this function returns ``NULL``.

Noise is drawn from a secure random source. ::

SELECT noisy_avg_gaussian(orderkey, 20.0, 10.0, 50.0) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_avg_gaussian(orderkey, 20.0, 10.0, 51.0) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)

.. function:: noisy_avg_gaussian(x, noise_scale, lower, upper, random_seed) -> double

Calculates the average (arithmetic mean) of all the input values and then adds random Gaussian noise
with 0 mean and standard deviation of ``noise_scale``.
All values are converted to double before being averaged, and the return type is double.

Each value is clipped to the range ``[lower, upper]`` before being averaged.

When there are no input rows, this function returns ``NULL``.

``random_seed`` seeds the random number generator.
This variant does not use a secure random source. ::

SELECT noisy_avg_gaussian(orderkey, 20.0, 10.0, 50.0, 321) FROM tpch.tiny.lineitem WHERE false; -- NULL (1 row)
SELECT noisy_avg_gaussian(orderkey, 20.0, 10.0, 50.0, 321) FROM tpch.tiny.lineitem WHERE false GROUP BY orderkey; -- (0 rows)


Statistical Aggregate Functions
-------------------------------
@@ -1108,7 +946,12 @@ equal probability of being chosen. See [Vitter1985]_.
-----------------+----------------------------------
5 | [{idx=1, val=a}, {idx=5, val=e}]

Noisy Aggregate Functions
-------------------------

See :doc:`noisy`.


---------------------------
