-
Notifications
You must be signed in to change notification settings - Fork 1
Refactor algorithms to improve numerical stability #113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot wasn't able to review any files in this pull request.
Files not reviewed (7)
- benchmark/benchmark_comparison_non_stream_WWR.jl: Language not supported
- benchmark/benchmark_comparison_stream_WWR.jl: Language not supported
- src/SamplingInterface.jl: Language not supported
- src/SamplingUtils.jl: Language not supported
- src/SortedSamplingSingle.jl: Language not supported
- src/UnweightedSamplingMulti.jl: Language not supported
- src/WeightedSamplingMulti.jl: Language not supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
The purpose of this PR is to refactor the sampling algorithms to improve numerical stability and performance. Key changes include updating the choose functions to a simplified interface, replacing traditional shuffle! calls with a new faster_shuffle! implementation, and updating random sampling calls to use Random.Sampler with @inline hints.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/WeightedSamplingMulti.jl | Refactored choose function usage and updated sampling logic. |
src/UnweightedSamplingMulti.jl | Simplified choose implementation and updated random selections. |
src/SortedSamplingSingle.jl | Updated single sample selection to use the new Random.Sampler. |
src/SamplingUtils.jl | Introduced faster_shuffle! with modified type handling. |
src/SamplingInterface.jl | Replaced shuffle! with faster_shuffle! across sampling flows. |
benchmark/benchmark_comparison_stream_WWR.jl | Updated sampling value computation to use faster_shuffle!. |
benchmark/benchmark_comparison_non_stream_WWR.jl | Overhauled reservoir sampling functions including skip and choose. |
Comments suppressed due to low confidence (1)
benchmark/benchmark_comparison_non_stream_WWR.jl:48
- [nitpick] Verify that the revised skip function's arithmetic using ((1-z)/z) aligns with the intended reservoir sampling behavior and produces equivalent skip thresholds as the previous implementation.
return w_sum*((1-z)/z)
function faster_shuffle!(rng::AbstractRNG, vec::AbstractVector) | ||
for i in 2:length(vec) | ||
endi = (i-1) % UInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Review the computation of 'endi' using the modulus operator on (i-1) cast to UInt; consider an explicit conversion (e.g., UInt(i-1)) to ensure the intended upper bound for sampling is correctly computed.
endi = (i-1) % UInt | |
endi = UInt(i-1) % UInt |
Copilot uses AI. Check for mistakes.
@inline function choose(rng, n, p) | ||
z = exp(n*log1p(-p)) | ||
t = rand(rng, Uniform(z, 1.0)) | ||
s = n*p | ||
q = 1-p | ||
k = z*q*q*q*(q + n*p) | ||
k > c && return 1 | ||
k += n*p*(n-1)*p*z*q*q/2 | ||
k > c && return 2 | ||
k += n*p*(n-1)*p*(n-2)*p*z*q/6 | ||
k > c && return 3 | ||
k += n*p*(n-1)*p*(n-2)*p*(n-3)*p*z/24 | ||
k > c && return 4 | ||
b = Binomial(n, p) | ||
return quantile(b, c) | ||
x = z + z*s/q | ||
x > t && return 1 | ||
s *= (n-1)*p | ||
q *= 1-p | ||
x += (s*z/q)/2 | ||
x > t && return 2 | ||
s *= (n-2)*p | ||
q *= 1-p | ||
x += (s*z/q)/6 | ||
x > t && return 3 | ||
s *= (n-3)*p | ||
q *= 1-p | ||
x += (s*z/q)/24 | ||
x > t && return 4 | ||
s *= (n-4)*p | ||
q *= 1-p | ||
x += (s*z/q)/120 | ||
x > t && return 5 | ||
return quantile(Binomial(n, p), t) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider adding inline comments within the choose function to explain the successive threshold comparisons and the logic behind the decision to return a specific value, which would improve maintainability by clarifying the numerical stability rationale.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request refactors various sampling algorithms to improve numerical stability while standardizing the random sampling API across multiple modules and benchmarks. Key changes include:
- Updating calls to the sampling function to use Random.Sampler and inline annotations.
- Refactoring the choose and skip functions to compute probability thresholds within the algorithms.
- Replacing traditional shuffle functions with a new faster_shuffle! implementation.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/WeightedSamplingMulti.jl | Refactored probability computations in the choose function and updated sampling branches. |
src/UnweightedSamplingMulti.jl | Standardized random sampling calls and refactored the choose function logic. |
src/SortedSamplingSingle.jl | Updated random sampling call to use Random.Sampler for consistency. |
src/SamplingUtils.jl | Introduced a new faster_shuffle! function for improved performance during shuffling. |
src/SamplingInterface.jl | Replaced calls to shuffle! with faster_shuffle! across multiple sampling functions. |
benchmark/benchmark_comparison_stream_WWR.jl | Updated sampling logic in benchmark functions to utilize faster_shuffle!. |
benchmark/benchmark_comparison_non_stream_WWR.jl | Refactored weighted reservoir sampling logic including changes in sample, skip, and choose. |
Comments suppressed due to low confidence (2)
src/UnweightedSamplingMulti.jl:175
- Double-check that replacing exp((n-4)log1p(-p)) with exp(nlog1p(-p)) in the choose function is mathematically intended, as this change could alter the probability thresholds.
z = exp(n*log1p(-p))
benchmark/benchmark_comparison_non_stream_WWR.jl:46
- The new skip function formula 'w_sum*((1-z)/z)' differs from the original implementation; please verify that this change preserves the intended skipping behavior in the weighted sampling algorithm.
function skip(rng, w_sum, n)
function faster_shuffle!(rng::AbstractRNG, vec::AbstractVector) | ||
for i in 2:length(vec) | ||
endi = (i-1) % UInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression '(i-1) % UInt' appears to misuse the modulo operator with a type; if the intent is to convert (i-1) to a UInt, consider using 'UInt(i-1)' instead.
endi = (i-1) % UInt | |
endi = UInt(i-1) |
Copilot uses AI. Check for mistakes.
reservoir[rand(rng, 1:n)] = a[i] | ||
else | ||
for j in 1:k | ||
reservoir[rand(rng, j:n)], reservoir[j] = reservoir[j], a[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] For consistency with other updated sampling code, consider using '@inline rand(rng, Random.Sampler(rng, j:n, Val(1)))' instead of 'rand(rng, j:n)'.
reservoir[rand(rng, j:n)], reservoir[j] = reservoir[j], a[i] | |
reservoir[@inline rand(rng, Random.Sampler(rng, j:n, Val(1)))], reservoir[j] = reservoir[j], a[i] |
Copilot uses AI. Check for mistakes.
No description provided.