Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat:implement sql style 'ends_with' and 'instr' string function #8862

Merged
merged 2 commits into from
Jan 23, 2024

Conversation

zy-kkk
Copy link
Member

@zy-kkk zy-kkk commented Jan 14, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

instr: https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_instr
ends_with: Refer to the existing starts_with function

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 14, 2024
/// Returns true if string ends with suffix.
/// ends_with('alphabet', 'abet') = 't'
pub fn ends_with<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
let string_array = as_generic_string_array::<T>(&args[0])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I can call this library function directly in the code or do I not need to implement this function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I can call this library function directly in the code or do I not need to implement this function?

Yes, I think the body of thus function could be reduced to something like

Suggested change
let string_array = as_generic_string_array::<T>(&args[0])?;
compute::comparison::ends_with(&args[0])

BuiltinScalarFunction::InitCap => Volatility::Immutable,
BuiltinScalarFunction::InStr => Volatility::Immutable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the function name we implement need to follow postgresql? If so, I will change the name of this function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark has both position and instr and they are not quite the same signature:
https://spark.apache.org/docs/latest/api/sql/#position
https://spark.apache.org/docs/latest/api/sql/#instr

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this PR @zy-kkk -- it is very well tested and commented 🦾 .

It looks quite close to me. The only thing I think it needs prior to merging is to address the potential use of an arrow kernel rather than reimplementation for ends_with but otherwise it is ready to go 🚀

Thanks again

/// Returns true if string ends with suffix.
/// ends_with('alphabet', 'abet') = 't'
pub fn ends_with<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
let string_array = as_generic_string_array::<T>(&args[0])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I can call this library function directly in the code or do I not need to implement this function?

Yes, I think the body of thus function could be reduced to something like

Suggested change
let string_array = as_generic_string_array::<T>(&args[0])?;
compute::comparison::ends_with(&args[0])

Boolean,
BooleanArray
);
test_function!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for testing with NULL

datafusion/physical-expr/src/functions.rs Show resolved Hide resolved
@zy-kkk
Copy link
Member Author

zy-kkk commented Jan 22, 2024

Hello @alamb, first of all thank you for your review of my PR,
I have used the arrow comparison function for the ends_with function implementation, and by the way changed starts_with to this.
For the instr function, I added more tests, but for this function, its implementation in MySQL is: instr('Helloworld', 'world') = 6, and for postgresql, it is, position('world' in 'Helloworld') = 6, which one do you want me to use, please tell me, currently I still retain the MySQL design
Thanks again for your review

@alamb
Copy link
Contributor

alamb commented Jan 22, 2024

Hello @alamb, first of all thank you for your review of my PR, I have used the arrow comparison function for the ends_with function implementation, and by the way changed starts_with to this. For the instr function, I added more tests, but for this function, its implementation in MySQL is: instr('Helloworld', 'world') = 6, and for postgresql, it is, position('world' in 'Helloworld') = 6, which one do you want me to use, please tell me, currently I still retain the MySQL design Thanks again for your review

Thank you @zy-kkk -- I think we try to follow the postgres style, but making the function packages as compatible as possible is a good thing.

In this case, perhaps you can add an alias for instr so it can also be called with position: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.BuiltinScalarFunction.html#method.aliases

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @zy-kkk -- I reviewed this PR and I think it looks quite good.

It would be nice to add a position alias, but also think we can do that as a follow on PR as well

_ => None,
})
.collect::<BooleanArray>();
let result = arrow::compute::kernels::comparison::starts_with(left, right)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@zy-kkk
Copy link
Member Author

zy-kkk commented Jan 23, 2024

Thank you very much @zy-kkk -- I reviewed this PR and I think it looks quite good.

It would be nice to add a position alias, but also think we can do that as a follow on PR as well

Thank you, @alamb, I will continue to modify this PR so that position('world' in 'Helloworld') = 6 can be run to be compatible with postgresql

@zy-kkk
Copy link
Member Author

zy-kkk commented Jan 23, 2024

Thank you very much @zy-kkk -- I reviewed this PR and I think it looks quite good.
It would be nice to add a position alias, but also think we can do that as a follow on PR as well

Thank you, @alamb, I will continue to modify this PR so that position('world' in 'Helloworld') = 6 can be run to be compatible with postgresql

Hi @alamb, I tested changing instr to position. If I want to support the syntax position('world' in 'Helloworld') = 6, I need to do it at https://github.com/sqlparser-rs/sqlparser-rs Add a syntax parsing in , which may be like adding the overlay function in apache/datafusion-sqlparser-rs#594. What do you suggest?

@alamb
Copy link
Contributor

alamb commented Jan 23, 2024

If I want to support the syntax position('world' in 'Helloworld') = 6, I need to do it at https://github.com/sqlparser-rs/sqlparser-rs Add a syntax parsing in , which may be like adding the overlay function in apache/datafusion-sqlparser-rs#594. What do you suggest?

Hi @zy-kkk -- I didn't realize there was special syntax. Yes in order to support that special syntax we would need a change in sqlparser.

What I suggest in this case is that we file a follow on ticket to support posgresql style and merge this PR. I will file the ticket shortly

@alamb
Copy link
Contributor

alamb commented Jan 23, 2024

I filed #8969 to track adding the position function

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @zy-kkk -- this is a really nice addition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants