Simplify windows builtin functions return type #8920
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -86,6 +86,7 @@ use datafusion_expr::expr::{ | |
}; | ||
use datafusion_expr::expr_rewriter::unnormalize_cols; | ||
use datafusion_expr::logical_plan::builder::wrap_projection_for_join_if_necessary; | ||
use datafusion_expr::utils::exprlist_to_fields; | ||
use datafusion_expr::{ | ||
DescribeTable, DmlStatement, ScalarFunctionDefinition, StringifiedPlan, WindowFrame, | ||
WindowFrameBound, WriteOp, | ||
|
@@ -719,14 +720,16 @@ impl DefaultPhysicalPlanner { | |
} | ||
|
||
let logical_input_schema = input.schema(); | ||
let physical_input_schema = input_exec.schema(); | ||
// Extend the schema to include window expression fields as builtin window functions derives its datatype from incoming schema | ||
let mut window_fields = logical_input_schema.fields().clone(); | ||
window_fields.extend_from_slice(&exprlist_to_fields(window_expr.iter(), input)?); | ||
let extended_schema = &DFSchema::new_with_metadata(window_fields, HashMap::new())?; | ||
let window_expr = window_expr | ||
.iter() | ||
.map(|e| { | ||
create_window_expr( | ||
e, | ||
logical_input_schema, | ||
&physical_input_schema, | ||
extended_schema, | ||
Comment on lines +724 to +732
Review comment: Why do you need to recompute this?
Reply: Filed #8955 |
||
session_state.execution_props(), | ||
) | ||
}) | ||
|
@@ -1526,7 +1529,7 @@ fn get_physical_expr_pair( | |
/// queries like: | ||
/// OVER (ORDER BY a RANGES BETWEEN 3 PRECEDING AND 5 PRECEDING) | ||
/// OVER (ORDER BY a RANGES BETWEEN INTERVAL '3 DAY' PRECEDING AND '5 DAY' PRECEDING) are rejected | ||
pub fn is_window_valid(window_frame: &WindowFrame) -> bool { | ||
pub fn is_window_frame_bound_valid(window_frame: &WindowFrame) -> bool { | ||
match (&window_frame.start_bound, &window_frame.end_bound) { | ||
(WindowFrameBound::Following(_), WindowFrameBound::Preceding(_)) | ||
| (WindowFrameBound::Following(_), WindowFrameBound::CurrentRow) | ||
|
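The doc comment in the hunk above describes frames whose start bound lies after the end bound. A minimal sketch of that rule follows. It is a simplified model, not the DataFusion implementation: the `Bound` enum and `is_frame_valid` function are hypothetical stand-ins for `WindowFrameBound` and `is_window_frame_bound_valid`, and offsets are modeled as `Option<u64>` with `None` meaning UNBOUNDED, which is an assumption of this sketch.

```rust
// Hypothetical stand-in for WindowFrameBound; None models UNBOUNDED.
#[derive(Debug, Clone, Copy)]
enum Bound {
    Preceding(Option<u64>),
    CurrentRow,
    Following(Option<u64>),
}

// Returns false when the start bound would fall after the end bound.
fn is_frame_valid(start: Bound, end: Bound) -> bool {
    use Bound::*;
    match (start, end) {
        // The start is at or after the current row while the end is at or before it.
        (Following(_), Preceding(_))
        | (Following(_), CurrentRow)
        | (CurrentRow, Preceding(_)) => false,
        // Both PRECEDING: the start must be at least as far back as the end.
        (Preceding(s), Preceding(e)) => match (s, e) {
            (_, None) => false,       // end cannot be UNBOUNDED PRECEDING
            (None, Some(_)) => true,  // start is UNBOUNDED PRECEDING
            (Some(s), Some(e)) => s >= e,
        },
        // Both FOLLOWING: the start must be no further ahead than the end.
        (Following(s), Following(e)) => match (s, e) {
            (None, _) => false,       // start cannot be UNBOUNDED FOLLOWING
            (Some(_), None) => true,  // end is UNBOUNDED FOLLOWING
            (Some(s), Some(e)) => s <= e,
        },
        // Every remaining combination keeps start at or before end.
        _ => true,
    }
}
```

Under this model, a frame such as BETWEEN 3 PRECEDING AND 5 PRECEDING is rejected because the start (3 rows back) comes after the end (5 rows back), while BETWEEN 5 PRECEDING AND 3 PRECEDING is accepted.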
@@ -1546,10 +1549,10 @@ pub fn create_window_expr_with_name( | |
e: &Expr, | ||
name: impl Into<String>, | ||
logical_input_schema: &DFSchema, | ||
physical_input_schema: &Schema, | ||
execution_props: &ExecutionProps, | ||
) -> Result<Arc<dyn WindowExpr>> { | ||
let name = name.into(); | ||
let physical_input_schema: &Schema = &logical_input_schema.into(); | ||
match e { | ||
Expr::WindowFunction(WindowFunction { | ||
fun, | ||
|
@@ -1572,7 +1575,8 @@ pub fn create_window_expr_with_name( | |
create_physical_sort_expr(e, logical_input_schema, execution_props) | ||
Review comment: Hmm, this looks incorrect because you use the window's schema as the input schema. The window's schema is the input schema plus the window functions' output, which is why this change still works. But it is misleading for readers and a likely source of future bugs.
Reply: This may be what @mustafasrepo has improved in #8920 (comment)
Reply: Exactly, I re-introduced the invariant of using only the input schema. |
||
}) | ||
.collect::<Result<Vec<_>>>()?; | ||
if !is_window_valid(window_frame) { | ||
|
||
if !is_window_frame_bound_valid(window_frame) { | ||
return plan_err!( | ||
"Invalid window frame: start bound ({}) cannot be larger than end bound ({})", | ||
window_frame.start_bound, window_frame.end_bound | ||
|
@@ -1598,21 +1602,14 @@ pub fn create_window_expr_with_name( | |
pub fn create_window_expr( | ||
e: &Expr, | ||
logical_input_schema: &DFSchema, | ||
Review comment: This is misleading; it is now actually the window's schema, not the input schema. |
||
physical_input_schema: &Schema, | ||
execution_props: &ExecutionProps, | ||
) -> Result<Arc<dyn WindowExpr>> { | ||
// unpack aliased logical expressions, e.g. "sum(col) over () as total" | ||
let (name, e) = match e { | ||
Expr::Alias(Alias { expr, name, .. }) => (name.clone(), expr.as_ref()), | ||
_ => (e.display_name()?, e), | ||
}; | ||
create_window_expr_with_name( | ||
e, | ||
name, | ||
logical_input_schema, | ||
physical_input_schema, | ||
execution_props, | ||
) | ||
create_window_expr_with_name(e, name, logical_input_schema, execution_props) | ||
} | ||
|
||
type AggregateExprWithOptionalArgs = ( | ||
|
Review comment: Shouldn't input.schema() reflect all the columns that the input produces? Or does the WindowAggExec create new columns "internally" by evaluating the window expressions?
Reply: The input schema is the schema of the previous plan node (no window expressions). AFAIK the window expression columns are added separately:
https://github.com/apache/arrow-datafusion/blob/0116e2a9b4a3ed4491802e19195769b96b7a971a/datafusion/expr/src/logical_plan/plan.rs#L2045
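The distinction discussed in this thread (input schema versus window schema) can be illustrated with a small standalone sketch. It does not use the DataFusion API: `Field`, `window_output_schema`, and `resolve` are hypothetical names introduced only for illustration, and data types are modeled as plain strings.

```rust
// Hypothetical, simplified field type; not DataFusion's DFField.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
}

fn field(name: &str, data_type: &'static str) -> Field {
    Field { name: name.to_string(), data_type }
}

// The window node's output schema is modeled as the input fields
// followed by one field per window expression.
fn window_output_schema(input: &[Field], window_fields: &[Field]) -> Vec<Field> {
    let mut fields = input.to_vec();
    fields.extend_from_slice(window_fields);
    fields
}

// Looks up a column by name in the given schema.
fn resolve<'a>(schema: &'a [Field], name: &str) -> Option<&'a Field> {
    schema.iter().find(|f| f.name == name)
}
```

With an input of `[a: Int64]` and a single window expression field `sum(a): Int64`, `resolve` finds `sum(a)` only in the combined schema, never in the input schema. That is the reason the reviewers object to passing the combined schema under a parameter named `logical_input_schema`: the name promises only the input's columns.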