Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added OCT, REPEAT scalar functions. #3092

Merged
merged 3 commits into from
Nov 26, 2021
Merged

Conversation

sunli829
Copy link
Contributor

@sunli829 sunli829 commented Nov 25, 2021

I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/

Summary

  • Added OCT, REPEAT scalar functions.

  • Added helper methods:

    DataType::is_integer DataType::is_signed_integer DataType::is_unsigned_integer

    DataValue::is_integer DataValue::is_signed_integer DataValue::is_unsigned_integer

Changelog

  • New Feature

Related Issues

fixes #3074, fixes #3068, fixes #3055

Test Plan

Unit Tests

Stateless Tests

@databend-bot databend-bot added the pr-feature this PR introduces a new feature to the codebase label Nov 25, 2021
@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

1 similar comment
@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@sunli829 sunli829 marked this pull request as draft November 25, 2021 07:21
@sunli829 sunli829 marked this pull request as ready for review November 25, 2021 07:22
@codecov-commenter
Copy link

codecov-commenter commented Nov 25, 2021

Codecov Report

Merging #3092 (f5fda2e) into main (507c834) will decrease coverage by 0%.
The diff coverage is 7%.

Impacted file tree graph

@@          Coverage Diff           @@
##            main   #3092    +/-   ##
======================================
- Coverage     68%     68%    -1%     
======================================
  Files        652     658     +6     
  Lines      34329   34519   +190     
======================================
+ Hits       23488   23549    +61     
- Misses     10841   10970   +129     
Impacted Files Coverage Δ
common/datavalues/src/data_value.rs 44% <0%> (-2%) ⬇️
common/datavalues/src/types/data_type.rs 62% <0%> (-6%) ⬇️
common/functions/src/scalars/strings/repeat.rs 5% <5%> (ø)
common/functions/src/scalars/strings/oct.rs 6% <6%> (ø)
common/functions/src/scalars/strings/string.rs 83% <100%> (+8%) ⬆️
common/streams/src/stream_source.rs 0% <0%> (-100%) ⬇️
query/src/interpreters/interpreter_insert_into.rs 78% <0%> (-8%) ⬇️
...ry/src/datasources/table/fuse/io/block_appender.rs 92% <0%> (-3%) ⬇️
query/src/interpreters/plan_scheduler.rs 32% <0%> (-2%) ⬇️
query/src/datasources/table/fuse/table.rs 88% <0%> (-2%) ⬇️
... and 46 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 507c834...f5fda2e. Read the comment docs.

}

fn return_type(&self, args: &[DataType]) -> Result<DataType> {
if !args[0].is_integer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args[0] can be DataType::null.

if args[0] is String, we can cast it into Int64 using cast_with_type

MySQL support explicit cast:

select oct('56');
select oct('56a');
select oct(null);

)));
}

if !args[1].is_integer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just accept args[1] as unsigned integer | String | Null.

Then we can cast column[1] to UInt64Column to reduce the matches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast is not expensive because it's vectorized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we can cast column[1] to UInt64Column to reduce the matches.

Will this cause unnecessary memory allocation?

Copy link
Member

@sundy-li sundy-li Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think it's ok. The old column memory will be dropped immediately if no reference.

In most case, the second column is a DataColumn::Constant, so we can use let times = columns[1].to_minimal_array().cast_with_type(&DataType::UInt64).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already wrote the match macro, it's ok to not change that :)

Let's add String|Null support!

}

fn return_type(&self, args: &[DataType]) -> Result<DataType> {
if !matches!(args[0], DataType::String) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataType::Null can be supported.

select repeat(s, 0) from strings_repeat_sample_3;
select repeat(s, -5) from strings_repeat_sample_3;

drop table strings_repeat_sample_3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline needed.

for n in $times {
let value = n.and_then(|n| {
if *n >= 0 {
Some($input_string.repeat(*n as usize))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding max_repeat_times check. I don't want this to be an oom killer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependabot bot and others added 2 commits November 26, 2021 09:50
Bumps [crc32fast](https://github.com/srijs/rust-crc32fast) from 1.2.1 to 1.2.2.
- [Release notes](https://github.com/srijs/rust-crc32fast/releases)
- [Commits](srijs/rust-crc32fast@v1.2.1...v1.2.2)

---
updated-dependencies:
- dependency-name: crc32fast
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Add helper methods `DataType::is_integer`, `DataType::is_signed_integer`, `DataType::is_unsigned_integer`, `DataValue::is_integer`, `DataValue::is_signed_integer` and `DataValue::is_unsigned_integer`.
@sunli829
Copy link
Contributor Author

done. 🙂

Copy link
Member

@sundy-li sundy-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@databend-bot
Copy link
Member

Wait for another reviewer approval

@sundy-li sundy-li merged commit 71fe902 into databendlabs:main Nov 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-feature this PR introduces a new feature to the codebase
Projects
None yet
4 participants