Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Make fields of ScalarUDF non pub #8039

Closed
wants to merge 1 commit into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 2, 2023

Which issue does this PR close?

Part of #7110 and #7977

Rationale for this change

Over time, the number of built in functions within DataFusion grows which is challenging both because:

  1. The total footprint grows, even for those who don't need the specific functions
  2. The desired semantics may be different (e.g. many of the built in functions in DataFusion mirror postgres behavior, but some users wish to mimic spark behavior
  3. User defined functions are treated differently from built in functions in some ways (e.g. they can't have aliases)

If is for this reason I would like to move towards only having functions defined as ScalarUDFs This will ensure:

  1. ScalarUDFs have access to all the same functionality as "built in " functions
  2. No function specific code will escape the planning

What changes are included in this PR?

  1. Make all fields of ScalarUDF private and add accessors.

Then we can add additional fields / trait impls as proposed by @2010YOUY01

Are these changes tested?

Covered by existing tests

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Nov 2, 2023
Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[derive(Clone)]
pub struct ScalarUDF {
/// name
pub name: String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key change in this PR is to remove pub

&self.signature
}
/// return the return type of this function given the types of the arguments
pub fn return_type(&self, args: &[DataType]) -> Result<DataType> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than expose the underlying function pointer directly, I opted to handle the nuance of calling it here.

@alamb alamb changed the title Make fields of ScalarUDF non pub RFC: Make fields of ScalarUDF non pub Nov 2, 2023
@alamb alamb mentioned this pull request Nov 2, 2023
2 tasks
@2010YOUY01
Copy link
Contributor

Thank you, I think it's a good idea.

Now for scalar functions there are two execution paths: BuitlinScalarFunction and ScalarUDF, this way we are moving towards using ScalarUDF as the unified path.

What do you think is a good implementation plan for the next step?
I was thinking about making a converter from BuiltinScalarFunction -> ScalarUDF first, and let all BuiltinScalarFunctions use ScalarUDF as the execution path. After that ScalarUDF should be extended to cover full functionalities, then the migrating step will be much easier

@alamb
Copy link
Contributor Author

alamb commented Nov 3, 2023

What do you think is a good implementation plan for the next step?

I was thinking one next step might be to try and port a few BuiltInFunctions to be ScalarUDFs (maybe the string functions or the crypto functions like MD5) -- specifically remove them from the BuiltInFunction enum and make sure all the tests still pass

I think that would force is to figure out How to register these functions with the FunctionRegistry by default, and other potential touchpoints there might be

I will try and write up some ideas later todya

@alamb
Copy link
Contributor Author

alamb commented Nov 9, 2023

Productionized in #8079

@alamb alamb closed this Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants