Skip to content

Refactor algorithms to improve numerical stability #113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

Tortar
Copy link
Member

@Tortar Tortar commented Mar 6, 2025

No description provided.

@Tortar Tortar requested a review from Copilot May 5, 2025 14:45
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (7)
  • benchmark/benchmark_comparison_non_stream_WWR.jl: Language not supported
  • benchmark/benchmark_comparison_stream_WWR.jl: Language not supported
  • src/SamplingInterface.jl: Language not supported
  • src/SamplingUtils.jl: Language not supported
  • src/SortedSamplingSingle.jl: Language not supported
  • src/UnweightedSamplingMulti.jl: Language not supported
  • src/WeightedSamplingMulti.jl: Language not supported

@Tortar Tortar requested a review from Copilot May 11, 2025 20:33
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

The purpose of this PR is to refactor the sampling algorithms to improve numerical stability and performance. Key changes include updating the choose functions to a simplified interface, replacing traditional shuffle! calls with a new faster_shuffle! implementation, and updating random sampling calls to use Random.Sampler with @inline hints.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/WeightedSamplingMulti.jl Refactored choose function usage and updated sampling logic.
src/UnweightedSamplingMulti.jl Simplified choose implementation and updated random selections.
src/SortedSamplingSingle.jl Updated single sample selection to use the new Random.Sampler.
src/SamplingUtils.jl Introduced faster_shuffle! with modified type handling.
src/SamplingInterface.jl Replaced shuffle! with faster_shuffle! across sampling flows.
benchmark/benchmark_comparison_stream_WWR.jl Updated sampling value computation to use faster_shuffle!.
benchmark/benchmark_comparison_non_stream_WWR.jl Overhauled reservoir sampling functions including skip and choose.
Comments suppressed due to low confidence (1)

benchmark/benchmark_comparison_non_stream_WWR.jl:48

  • [nitpick] Verify that the revised skip function's arithmetic using ((1-z)/z) aligns with the intended reservoir sampling behavior and produces equivalent skip thresholds as the previous implementation.
return w_sum*((1-z)/z)

function faster_shuffle!(rng::AbstractRNG, vec::AbstractVector)
for i in 2:length(vec)
endi = (i-1) % UInt
Copy link
Preview

Copilot AI May 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Review the computation of 'endi' using the modulus operator on (i-1) cast to UInt; consider an explicit conversion (e.g., UInt(i-1)) to ensure the intended upper bound for sampling is correctly computed.

Suggested change
endi = (i-1) % UInt
endi = UInt(i-1) % UInt

Copilot uses AI. Check for mistakes.

Comment on lines +174 to 198
@inline function choose(rng, n, p)
z = exp(n*log1p(-p))
t = rand(rng, Uniform(z, 1.0))
s = n*p
q = 1-p
k = z*q*q*q*(q + n*p)
k > c && return 1
k += n*p*(n-1)*p*z*q*q/2
k > c && return 2
k += n*p*(n-1)*p*(n-2)*p*z*q/6
k > c && return 3
k += n*p*(n-1)*p*(n-2)*p*(n-3)*p*z/24
k > c && return 4
b = Binomial(n, p)
return quantile(b, c)
x = z + z*s/q
x > t && return 1
s *= (n-1)*p
q *= 1-p
x += (s*z/q)/2
x > t && return 2
s *= (n-2)*p
q *= 1-p
x += (s*z/q)/6
x > t && return 3
s *= (n-3)*p
q *= 1-p
x += (s*z/q)/24
x > t && return 4
s *= (n-4)*p
q *= 1-p
x += (s*z/q)/120
x > t && return 5
return quantile(Binomial(n, p), t)
end
Copy link
Preview

Copilot AI May 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider adding inline comments within the choose function to explain the successive threshold comparisons and the logic behind the decision to return a specific value, which would improve maintainability by clarifying the numerical stability rationale.

Copilot uses AI. Check for mistakes.

@Tortar Tortar requested a review from Copilot May 19, 2025 16:40
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refactors various sampling algorithms to improve numerical stability while standardizing the random sampling API across multiple modules and benchmarks. Key changes include:

  • Updating calls to the sampling function to use Random.Sampler and inline annotations.
  • Refactoring the choose and skip functions to compute probability thresholds within the algorithms.
  • Replacing traditional shuffle functions with a new faster_shuffle! implementation.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/WeightedSamplingMulti.jl Refactored probability computations in the choose function and updated sampling branches.
src/UnweightedSamplingMulti.jl Standardized random sampling calls and refactored the choose function logic.
src/SortedSamplingSingle.jl Updated random sampling call to use Random.Sampler for consistency.
src/SamplingUtils.jl Introduced a new faster_shuffle! function for improved performance during shuffling.
src/SamplingInterface.jl Replaced calls to shuffle! with faster_shuffle! across multiple sampling functions.
benchmark/benchmark_comparison_stream_WWR.jl Updated sampling logic in benchmark functions to utilize faster_shuffle!.
benchmark/benchmark_comparison_non_stream_WWR.jl Refactored weighted reservoir sampling logic including changes in sample, skip, and choose.
Comments suppressed due to low confidence (2)

src/UnweightedSamplingMulti.jl:175

  • Double-check that replacing exp((n-4)log1p(-p)) with exp(nlog1p(-p)) in the choose function is mathematically intended, as this change could alter the probability thresholds.
z = exp(n*log1p(-p))

benchmark/benchmark_comparison_non_stream_WWR.jl:46

  • The new skip function formula 'w_sum*((1-z)/z)' differs from the original implementation; please verify that this change preserves the intended skipping behavior in the weighted sampling algorithm.
function skip(rng, w_sum, n)

function faster_shuffle!(rng::AbstractRNG, vec::AbstractVector)
for i in 2:length(vec)
endi = (i-1) % UInt
Copy link
Preview

Copilot AI May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expression '(i-1) % UInt' appears to misuse the modulo operator with a type; if the intent is to convert (i-1) to a UInt, consider using 'UInt(i-1)' instead.

Suggested change
endi = (i-1) % UInt
endi = UInt(i-1)

Copilot uses AI. Check for mistakes.

reservoir[rand(rng, 1:n)] = a[i]
else
for j in 1:k
reservoir[rand(rng, j:n)], reservoir[j] = reservoir[j], a[i]
Copy link
Preview

Copilot AI May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] For consistency with other updated sampling code, consider using '@inline rand(rng, Random.Sampler(rng, j:n, Val(1)))' instead of 'rand(rng, j:n)'.

Suggested change
reservoir[rand(rng, j:n)], reservoir[j] = reservoir[j], a[i]
reservoir[@inline rand(rng, Random.Sampler(rng, j:n, Val(1)))], reservoir[j] = reservoir[j], a[i]

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant