perf: Faster decimal precision overflow checks #6419

Merged
merged 11 commits into from
Sep 21, 2024

@andygrove andygrove commented Sep 18, 2024

Which issue does this PR close?

N/A

Rationale for this change

Small performance optimization.

null_if_overflow_precision_128
                        time:   [14.371 µs 14.403 µs 14.436 µs]
                        change: [-80.547% -80.475% -80.399%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

null_if_overflow_precision_256
                        time:   [26.421 µs 26.531 µs 26.642 µs]
                        change: [-87.653% -87.587% -87.522%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

validate_decimal_precision_128
                        time:   [74.056 ns 74.401 ns 74.752 ns]
                        change: [-11.615% -10.563% -9.4441%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

validate_decimal_precision_256
                        time:   [215.61 ns 216.62 ns 217.68 ns]
                        change: [-0.8629% -0.3349% +0.1701%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

What changes are included in this PR?

I noticed two areas of overhead in the current approach to verifying decimal precision.

  1. On overflow we build a formatted string for the error message, but some callers discard that message, so the formatting cost can be avoided
  2. The range check code introduces a memcpy, which can also be avoided
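
The first point can be sketched as follows. This is a minimal illustration, not the arrow-rs code: `OverflowError`, `validate`, `is_valid`, and both `null_if_overflow_*` helpers are hypothetical names, and the bound is passed in directly rather than looked up by precision. It shows how a `bool`-returning check lets a caller that only needs a null skip the `format!` call entirely.

```rust
/// Hypothetical error type standing in for a real error enum.
#[derive(Debug)]
struct OverflowError(String);

/// Result-returning check: builds a formatted message on every overflow.
fn validate(value: i128, max: i128) -> Result<(), OverflowError> {
    if value > max || value < -max {
        Err(OverflowError(format!(
            "{value} is outside the range [-{max}, {max}]"
        )))
    } else {
        Ok(())
    }
}

/// Bool-returning check: same range test, no string formatting.
fn is_valid(value: i128, max: i128) -> bool {
    value >= -max && value <= max
}

/// Overflowing values become None; the error string is built, then dropped.
fn null_if_overflow_slow(values: &[i128], max: i128) -> Vec<Option<i128>> {
    values.iter().map(|&v| validate(v, max).ok().map(|_| v)).collect()
}

/// Same result without ever constructing an error.
fn null_if_overflow_fast(values: &[i128], max: i128) -> Vec<Option<i128>> {
    values.iter().map(|&v| is_valid(v, max).then_some(v)).collect()
}
```

Both helpers return the same vector for the same input; the difference is only in the work done for each overflowing value.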

I tested the following variations of the decimal precision check in Rust playground.

fn validate_decimal_precision1(value: i128, precision: u8) -> bool {
    if precision > DECIMAL128_MAX_PRECISION {
        return false;
    }
    let idx = usize::from(precision) - 1;
    value >= MIN_DECIMAL_FOR_EACH_PRECISION[idx] && value <= MAX_DECIMAL_FOR_EACH_PRECISION[idx]
}

// based on arrow-rs version
fn validate_decimal_precision2(value: i128, precision: u8) -> bool {
    if precision > DECIMAL128_MAX_PRECISION {
        return false;
    }

    let max = MAX_DECIMAL_FOR_EACH_PRECISION[usize::from(precision) - 1];
    let min = MIN_DECIMAL_FOR_EACH_PRECISION[usize::from(precision) - 1];

    if value > max {
        false
    } else if value < min {
        false
    } else {
        true
    }
}

validate_decimal_precision1 avoids a memcpy that appears in validate_decimal_precision2:

playground::validate_decimal_precision1:
	subq	$1304, %rsp
	movb	%dl, %al
	movb	%al, 23(%rsp)
	movq	%rsi, 24(%rsp)
	movq	%rdi, 32(%rsp)
	movq	%rdi, 1264(%rsp)
	movq	%rsi, 1272(%rsp)
	movb	%al, 1287(%rsp)
	cmpb	$38, %al
	ja	.LBB9_2
	movb	23(%rsp), %al
	movb	%al, 1303(%rsp)
	movzbl	%al, %eax
	movq	%rax, %rcx
	subq	$1, %rcx
	movq	%rcx, 8(%rsp)
	cmpq	$1, %rax
	jb	.LBB9_4
	jmp	.LBB9_3

playground::validate_decimal_precision2:
	subq	$1368, %rsp
	movb	%dl, %al
	movb	%al, 55(%rsp)
	movq	%rsi, 56(%rsp)
	movq	%rdi, 64(%rsp)
	movq	%rdi, 1296(%rsp)
	movq	%rsi, 1304(%rsp)
	movb	%al, 1327(%rsp)
	cmpb	$38, %al
	ja	.LBB10_2
	leaq	80(%rsp), %rdi
	leaq	.L__unnamed_5(%rip), %rsi
	movl	$608, %edx
	callq	memcpy@PLT   <----- MEMCPY HERE
	movb	55(%rsp), %al
	movb	%al, 1367(%rsp)
	movzbl	%al, %eax
	movq	%rax, %rcx
	subq	$1, %rcx
	movq	%rcx, 40(%rsp)
	cmpq	$1, %rax
	jb	.LBB10_4
	jmp	.LBB10_3
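
As a sanity check on the listing above: the `memcpy` length of 608 bytes matches 38 entries of 16 bytes each. Assuming each lookup table holds one `i128` per precision from 1 to 38 (the `cmpb $38` bound), this is consistent with a whole table being copied onto the stack rather than a single value, although the excerpt alone does not show which table is copied or why. `table_size_bytes` is a hypothetical helper used only for this arithmetic.

```rust
use std::mem::size_of;

// Assumption: one table entry per precision, 1 through 38.
const DECIMAL128_MAX_PRECISION: usize = 38;

/// Size in bytes of a table holding one i128 per supported precision.
fn table_size_bytes() -> usize {
    DECIMAL128_MAX_PRECISION * size_of::<i128>()
}
```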

Are there any user-facing changes?

Technically, this is an API change because I made two consts pub(crate) instead of pub. However, if anyone wants access to the original consts they could just copy them into their code base.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 18, 2024
Comment on lines -742 to -743
let max = MAX_DECIMAL_FOR_EACH_PRECISION[usize::from(precision) - 1];
let min = MIN_DECIMAL_FOR_EACH_PRECISION[usize::from(precision) - 1];
Member Author

I believe that this is where the memcpy was being introduced

Member

Hmm, this is interesting. Assigning to a new variable caused a memory copy of the i128 value.

@andygrove andygrove marked this pull request as draft September 18, 2024 20:47
@andygrove andygrove marked this pull request as ready for review September 19, 2024 14:19
Comment on lines +1264 to +1266
fn is_valid_decimal_precision(value: Self::Native, precision: u8) -> bool {
is_validate_decimal_precision(value, precision)
}
Member

@viirya viirya Sep 19, 2024

Is this used to avoid creating ArrowError and the error string for the cases that we don't need the error?

Member Author

Yes

arrow-data/src/decimal.rs Outdated Show resolved Hide resolved
Contributor

@comphead comphead left a comment

Thanks @andygrove.
Probably a nit: I see usize::from(precision) - 1 in a lot of places, and each one requires a new allocation of a usize.
My feeling is that the precision is static? Maybe we can precompute idx and pass it by reference?

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@andygrove
Member Author

> Thanks @andygrove. Probably a nit: I see usize::from(precision) - 1 in a lot of places, and each one requires a new allocation of a usize. My feeling is that the precision is static? Maybe we can precompute idx and pass it by reference?

What if we insert an extra element at the start of each of the lookup arrays and then use precision as usize directly as the index?
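
A minimal sketch of that idea, assuming the maximum for precision p is 10^p - 1 and the minimum is its negation. The names `TABLE_LEN`, `MAX_WITH_PADDING`, `build_max_table`, and `is_valid_precision` are hypothetical, and the table is generated here only to keep the example short; it is not how the arrow-rs tables are defined. Index 0 holds a placeholder so that `precision` can be used as the index without subtracting 1.

```rust
const DECIMAL128_MAX_PRECISION: u8 = 38;
// One entry per precision 1..=38, plus a padding entry at index 0.
const TABLE_LEN: usize = DECIMAL128_MAX_PRECISION as usize + 1;

/// Builds [0, 9, 99, 999, ...]; index 0 is the padding entry.
const fn build_max_table() -> [i128; TABLE_LEN] {
    let mut table = [0i128; TABLE_LEN];
    let mut pow = 1i128;
    let mut p = 1;
    while p < TABLE_LEN {
        pow *= 10;
        table[p] = pow - 1;
        p += 1;
    }
    table
}

// Hypothetical padded table, computed at compile time.
static MAX_WITH_PADDING: [i128; TABLE_LEN] = build_max_table();

/// Range check that indexes the padded table with the precision itself.
fn is_valid_precision(value: i128, precision: u8) -> bool {
    if precision > DECIMAL128_MAX_PRECISION {
        return false;
    }
    let max = MAX_WITH_PADDING[usize::from(precision)];
    value >= -max && value <= max
}
```

In this sketch a precision of 0 hits the padding entry, so only the value 0 passes; whether that is the right behavior for real code is left open.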

@andygrove
Member Author

andygrove commented Sep 19, 2024

Updated benchmark results after removing the idx allocation:

null_if_overflow_precision_128
                        time:   [12.555 µs 12.591 µs 12.628 µs]
                        change: [-85.358% -85.314% -85.275%] (p = 0.00 < 0.05)
                        Performance has improved.

null_if_overflow_precision_256
                        time:   [25.614 µs 25.755 µs 25.903 µs]
                        change: [-87.983% -87.827% -87.620%] (p = 0.00 < 0.05)
                        Performance has improved.

validate_decimal_precision_128
                        time:   [74.290 ns 74.703 ns 75.109 ns]
                        change: [-7.7409% -7.2664% -6.8248%] (p = 0.00 < 0.05)
                        Performance has improved.

validate_decimal_precision_256
                        time:   [215.27 ns 216.01 ns 216.75 ns]
                        change: [-4.4231% -3.7018% -2.9993%] (p = 0.00 < 0.05)
                        Performance has improved.

Contributor

@comphead comphead left a comment

lgtm thanks @andygrove

@andygrove andygrove added the api-change Changes to the arrow API label Sep 19, 2024
arrow-data/src/decimal.rs Outdated Show resolved Hide resolved
@tustvold tustvold added the next-major-release the PR has API changes and it waiting on the next major version label Sep 19, 2024
@andygrove andygrove removed the api-change Changes to the arrow API label Sep 20, 2024
@andygrove andygrove removed the next-major-release the PR has API changes and it waiting on the next major version label Sep 20, 2024
@andygrove
Member Author

I reverted the API change

@andygrove
Member Author

@viirya @comphead There have been changes since this PR was approved. Please take another look when you can.

@comphead
Contributor

I'm okay with the PR. The array that changed moved to crate visibility, which doesn't seem to affect others, though I'm not sure whether anyone other than arrow-rs actually used it.

@andygrove
Member Author

> I'm okay with the PR. The array that changed moved to crate visibility, which doesn't seem to affect others, though I'm not sure whether anyone other than arrow-rs actually used it.

I reverted the change to the original array and added a copy with the extra element so as not to break the current API. The new array is pub(crate).

Contributor

@Dandandan Dandandan left a comment

Nice

@andygrove andygrove merged commit c90713b into apache:master Sep 21, 2024
27 checks passed
@andygrove andygrove deleted the faster-decimal-overflow-check branch September 21, 2024 14:20