Refactor algorithms to improve numerical stability #113

Tortar · 2025-03-06T14:23:04Z

No description provided.

Copilot

Copilot wasn't able to review any files in this pull request.

Files not reviewed (7)

benchmark/benchmark_comparison_non_stream_WWR.jl: Language not supported
benchmark/benchmark_comparison_stream_WWR.jl: Language not supported
src/SamplingInterface.jl: Language not supported
src/SamplingUtils.jl: Language not supported
src/SortedSamplingSingle.jl: Language not supported
src/UnweightedSamplingMulti.jl: Language not supported
src/WeightedSamplingMulti.jl: Language not supported

Copilot

Pull Request Overview

The purpose of this PR is to refactor the sampling algorithms to improve numerical stability and performance. Key changes include updating the choose functions to a simplified interface, replacing traditional shuffle! calls with a new faster_shuffle! implementation, and updating random sampling calls to use Random.Sampler with @inline hints.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/WeightedSamplingMulti.jl	Refactored choose function usage and updated sampling logic.
src/UnweightedSamplingMulti.jl	Simplified choose implementation and updated random selections.
src/SortedSamplingSingle.jl	Updated single sample selection to use the new Random.Sampler.
src/SamplingUtils.jl	Introduced faster_shuffle! with modified type handling.
src/SamplingInterface.jl	Replaced shuffle! with faster_shuffle! across sampling flows.
benchmark/benchmark_comparison_stream_WWR.jl	Updated sampling value computation to use faster_shuffle!.
benchmark/benchmark_comparison_non_stream_WWR.jl	Overhauled reservoir sampling functions including skip and choose.

Comments suppressed due to low confidence (1)

benchmark/benchmark_comparison_non_stream_WWR.jl:48

[nitpick] Verify that the revised skip function's arithmetic using ((1-z)/z) aligns with the intended reservoir sampling behavior and produces equivalent skip thresholds as the previous implementation.

return w_sum*((1-z)/z)

Copilot · 2025-05-11T20:33:58Z

src/SamplingUtils.jl

+function faster_shuffle!(rng::AbstractRNG, vec::AbstractVector)
+    for i in 2:length(vec)
+        endi = (i-1) % UInt


[nitpick] Review the computation of 'endi' using the modulus operator on (i-1) cast to UInt; consider an explicit conversion (e.g., UInt(i-1)) to ensure the intended upper bound for sampling is correctly computed.

Suggested change

endi = (i-1) % UInt

endi = UInt(i-1) % UInt

Copilot · 2025-05-11T20:33:59Z

src/UnweightedSamplingMulti.jl

+@inline function choose(rng, n, p)
+    z = exp(n*log1p(-p))
+    t = rand(rng, Uniform(z, 1.0))
+    s = n*p
    q = 1-p
-    k = z*q*q*q*(q + n*p)
-    k > c && return 1
-    k += n*p*(n-1)*p*z*q*q/2
-    k > c && return 2
-    k += n*p*(n-1)*p*(n-2)*p*z*q/6
-    k > c && return 3
-    k += n*p*(n-1)*p*(n-2)*p*(n-3)*p*z/24
-    k > c && return 4
-    b = Binomial(n, p)
-    return quantile(b, c)
+    x = z + z*s/q
+    x > t && return 1
+    s *= (n-1)*p
+    q *= 1-p
+    x += (s*z/q)/2
+    x > t && return 2
+    s *= (n-2)*p
+    q *= 1-p
+    x += (s*z/q)/6
+    x > t && return 3
+    s *= (n-3)*p
+    q *= 1-p
+    x += (s*z/q)/24
+    x > t && return 4
+    s *= (n-4)*p
+    q *= 1-p
+    x += (s*z/q)/120
+    x > t && return 5
+    return quantile(Binomial(n, p), t)
 end


[nitpick] Consider adding inline comments within the choose function to explain the successive threshold comparisons and the logic behind the decision to return a specific value, which would improve maintainability by clarifying the numerical stability rationale.

Copilot

Pull Request Overview

This pull request refactors various sampling algorithms to improve numerical stability while standardizing the random sampling API across multiple modules and benchmarks. Key changes include:

Updating calls to the sampling function to use Random.Sampler and inline annotations.
Refactoring the choose and skip functions to compute probability thresholds within the algorithms.
Replacing traditional shuffle functions with a new faster_shuffle! implementation.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/WeightedSamplingMulti.jl	Refactored probability computations in the choose function and updated sampling branches.
src/UnweightedSamplingMulti.jl	Standardized random sampling calls and refactored the choose function logic.
src/SortedSamplingSingle.jl	Updated random sampling call to use Random.Sampler for consistency.
src/SamplingUtils.jl	Introduced a new faster_shuffle! function for improved performance during shuffling.
src/SamplingInterface.jl	Replaced calls to shuffle! with faster_shuffle! across multiple sampling functions.
benchmark/benchmark_comparison_stream_WWR.jl	Updated sampling logic in benchmark functions to utilize faster_shuffle!.
benchmark/benchmark_comparison_non_stream_WWR.jl	Refactored weighted reservoir sampling logic including changes in sample, skip, and choose.

Comments suppressed due to low confidence (2)

src/UnweightedSamplingMulti.jl:175

Double-check that replacing exp((n-4)log1p(-p)) with exp(nlog1p(-p)) in the choose function is mathematically intended, as this change could alter the probability thresholds.

z = exp(n*log1p(-p))

benchmark/benchmark_comparison_non_stream_WWR.jl:46

The new skip function formula 'w_sum*((1-z)/z)' differs from the original implementation; please verify that this change preserves the intended skipping behavior in the weighted sampling algorithm.

function skip(rng, w_sum, n)

Copilot · 2025-05-19T16:41:06Z

src/SamplingUtils.jl

+function faster_shuffle!(rng::AbstractRNG, vec::AbstractVector)
+    for i in 2:length(vec)
+        endi = (i-1) % UInt


The expression '(i-1) % UInt' appears to misuse the modulo operator with a type; if the intent is to convert (i-1) to a UInt, consider using 'UInt(i-1)' instead.

Suggested change

endi = (i-1) % UInt

endi = UInt(i-1)

Copilot · 2025-05-19T16:41:06Z

benchmark/benchmark_comparison_non_stream_WWR.jl

+                reservoir[rand(rng, 1:n)] = a[i]
+            else
+                for j in 1:k
+        	    reservoir[rand(rng, j:n)], reservoir[j] = reservoir[j], a[i]


[nitpick] For consistency with other updated sampling code, consider using '@inline rand(rng, Random.Sampler(rng, j:n, Val(1)))' instead of 'rand(rng, j:n)'.

Suggested change

reservoir[rand(rng, j:n)], reservoir[j] = reservoir[j], a[i]

reservoir[@inline rand(rng, Random.Sampler(rng, j:n, Val(1)))], reservoir[j] = reservoir[j], a[i]

Tortar added 16 commits March 6, 2025 15:22

Refactor algorithms to improve numerical stability

ba13d03

Update UnweightedSamplingMulti.jl

dc99265

Update WeightedSamplingMulti.jl

df19a27

Update WeightedSamplingMulti.jl

0da7505

Update UnweightedSamplingMulti.jl

e0302e7

Update WeightedSamplingMulti.jl

75e8eb4

Update UnweightedSamplingMulti.jl

1868587

Update WeightedSamplingMulti.jl

5fd7515

Update SamplingUtils.jl

6f01c81

Update SamplingInterface.jl

6edac86

Update benchmark_comparison_stream_WWR.jl

385920b

Update WeightedSamplingMulti.jl

5c67578

Update UnweightedSamplingMulti.jl

26fcd3d

Update UnweightedSamplingMulti.jl

1b84f76

Update SortedSamplingSingle.jl

dacd5eb

Update SortedSamplingSingle.jl

66a899d

Tortar requested a review from Copilot May 5, 2025 14:45

Copilot AI reviewed May 5, 2025

View reviewed changes

Tortar requested a review from Copilot May 11, 2025 20:33

Copilot AI reviewed May 11, 2025

View reviewed changes

Tortar requested a review from Copilot May 19, 2025 16:40

Copilot AI reviewed May 19, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor algorithms to improve numerical stability #113

Refactor algorithms to improve numerical stability #113

Uh oh!

Tortar commented Mar 6, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI May 11, 2025

Uh oh!

Copilot AI May 11, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI May 19, 2025

Uh oh!

Copilot AI May 19, 2025

Uh oh!

Uh oh!

	reservoir[rand(rng, j:n)], reservoir[j] = reservoir[j], a[i]
	reservoir[@inline rand(rng, Random.Sampler(rng, j:n, Val(1)))], reservoir[j] = reservoir[j], a[i]

Refactor algorithms to improve numerical stability #113

Are you sure you want to change the base?

Refactor algorithms to improve numerical stability #113

Uh oh!

Conversation

Tortar commented Mar 6, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI May 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI May 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 19, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!