-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding interpolating accessors #456
Conversation
@@ -219,7 +219,7 @@ impl TimeWeightMethod { | |||
Ok(pt) | |||
} | |||
|
|||
fn weighted_sum(&self, first: TSPoint, second: TSPoint) -> f64 { | |||
pub fn weighted_sum(&self, first: TSPoint, second: TSPoint) -> f64 { | |||
debug_assert!(second.ts > first.ts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why reserve this for debug builds? Now that it's public, can we reconsider?
Also, would an error message help? Should users see this from bad input, or only if we have a bug?
}; | ||
time_weighted_average::TimeWeightMethod::Linear | ||
.interpolate(first, Some(self.first), interval_start) | ||
.expect("unable to interpolate lower bound") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate
returns for two different reasons; would it be helpful for us to show the user that distinction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this upper vs. lower bound? We do have a separate panic message for the cases.
let states_len = states.len() as u64; | ||
let durations_len = durations.len() as u64; | ||
let mut first_state = durations.len(); | ||
let mut last_state = durations.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These read a little confusing to me. When durations.is_empty()
we're setting them both to 0 via len()
but it might be clearer if that were explicit.
Actually, I'm pretty sure I see two different factories here: one requiring a non-empty list and first and last Records, and one Default
impl with all fields 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this was more about initializing them to values that would fail the assertion at line 77 if we didn't overwrite them inside the loop. However, you are right that this becomes easier to follow if we handle the empty case separately at the top of this function. The existing behavior allows for calling new with empty arguments, so I don't want to move that into a Default
as part of this change.
extension/src/state_aggregate.rs
Outdated
@@ -71,6 +110,85 @@ pub mod toolkit_experimental { | |||
let end = record.state_end as usize; | |||
&self.states_as_str()[beg..end] | |||
} | |||
|
|||
#[allow(clippy::unnecessary_unwrap)] // following clippy would make this much more convoluted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I find this one very hard to follow. I think clippy is only picking up on a symptom of the larger problem. I'm still trying to wrap my head around it, so I certainly don't have a suggest yet :)
One clue I have: do prev and next represent optional parameters coming in from SQL? Meaning that the only legal inputs are these?
- (None, None)
- (Some(prev), None)
- (Some(prev), Some(next))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly correct, (None, Some(next)) is also valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I understand this method now, and have two suggestions:
-
My bigger problem is this is very big and complex. I think it would help quite a bit if you split out private
interpolate_first
andinterpolate_last
methods to be called here. -
Addressing clippy's complaint is a simple change; instead of:
let first = if interval_start < self.first_time && prev.is_some() {
let prev = prev.unwrap();
do:
match prev {
Some(prev) if interval_start < self.first_time => {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the clippy complaint as you mentioned, great suggestion.
For the interpolate_first/last
I found that the code I was trying to refactor out wasn't well encapsulated due to the need to potentially add to the states string that was being built up for the result. I left it as it is for now, since creating a interpolate_start_and_update_state_string
doesn't seem much of an improvement, but it does seem like this could use a refactor.
extension/src/state_aggregate.rs
Outdated
states_len: 0, | ||
states: states.into_bytes().into(), | ||
durations_len: 0, | ||
durations: (&*durations).into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look pretty weird, given that states and durations are both empty here. Why not Slice::Slice(&[])
?
At that point, you're looking at a big pile of default values, which is why I suggested impl Default
.
This doesn't need to hold up merge. But if you want, I can push a branch demonstrating what it would look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that does look much nicer with the Slice::Slice(&[])
For the default, are you suggesting that we move this instantiation code here into a default impl and just return default here. Or you suggesting that we implement a default and don't allow new to be called with empty containers? I assumed it's the second one, which does make the calling code a bit more complicated, but given that we only use this in one place that's probably not a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
first_time: first.map_or(0, |s| s.time), | ||
last_time: last.map_or(0, |s| s.time), | ||
first_state: first_state as u32, | ||
last_state: last_state as u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that, but this is giving me the willies over far too many overflow bugs that look like this. This looks a whole lot like all the qmail vulnerabilities that came from its 32-bit assumptions no longer holding after the rise of amd64 (and therefore mass market 64-bit).
Can we u32::try_from().expect("len must fit into u32")
in at least some boundary places? It may be out of scope for this branch as I don't think this is a new pattern for us. Should I file an issue about it?
extension/src/state_aggregate.rs
Outdated
@@ -71,6 +110,85 @@ pub mod toolkit_experimental { | |||
let end = record.state_end as usize; | |||
&self.states_as_str()[beg..end] | |||
} | |||
|
|||
#[allow(clippy::unnecessary_unwrap)] // following clippy would make this much more convoluted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I understand this method now, and have two suggestions:
-
My bigger problem is this is very big and complex. I think it would help quite a bit if you split out private
interpolate_first
andinterpolate_last
methods to be called here. -
Addressing clippy's complaint is a simple change; instead of:
let first = if interval_start < self.first_time && prev.is_some() {
let prev = prev.unwrap();
do:
match prev {
Some(prev) if interval_start < self.first_time => {
extension/src/state_aggregate.rs
Outdated
next: Option<StateAgg>, | ||
) -> crate::raw::Interval { | ||
match aggregate { | ||
None => todo!("either 0 or interval if matches prev last state"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning to address this as part of this work, or at a later date? It might help to keep an issue open for it and reference that issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually an interesting question. We can figure out how to handle a null aggregate in this case, just implement the todo
above. However, this null aggregate is going to be the prev for the next group, and was the next for the previous group for that matter. In both of those cases, we don't actually have the information we need to correctly handle an empty group appearing in bucketing, and will return an incorrect answer.
So, I think I want to deal with this by generating an error in this case here, as it's the only place we can identify that we're likely going to generate the wrong answer for some of our buckets.
03ae67b
to
aad4ecf
Compare
bors r+ |
Build succeeded: |
This change adds the following interpolating accessors:
interpolated_duration_in
tostate_agg
interpolated_average
totime_weight
interpolated_delta
andinterpolated_rate
tocounter_agg
andgauge_agg
These accessors all take an interval lower bound and interval duration along with the previous and next matching aggregate and compute the result using the computed boundary point. These accessors can be used in a postgres window function to give correct results for data that's been grouped into separate time intervals, such as with
time_bucket
.Fixes #440