ISSUE-2814: introducing distributed insertion #2945
Codecov Report
@@ Coverage Diff @@
## main #2945 +/- ##
======================================
Coverage 68% 68%
======================================
Files 651 656 +5
Lines 34223 34363 +140
======================================
+ Hits 23414 23529 +115
- Misses 10809 10834 +25
@@ -215,3 +217,29 @@ pub fn merge_stats(schema: &DataSchema, l: &Stats, r: &Stats) -> Result<Stats> {
     };
     Ok(s)
 }

 pub fn merge_appends(
Hmm, it seems it's time to move this helper to a folder named something like statistics; 'util' is not a good name for others exploring the code. It would be nice to separate these helpers into their own directories :)
Agreed, there is a pending PR #2408. Shall I merge it into this PR?
Oh, please do not bother reviewing the PR #2480; I forgot that it has accumulated lots of irrelevant changes. Let me do the refactoring in this PR directly.
Agree.
Sorry, I missed that there is already a refine issue #2408; we can do the refactoring in that PR if it doesn't feel right here.
Got it.
Btw, although the unit tests and stateless tests passed, there are still some known issues. Trying to fix them.
/lgtm
Wait for another reviewer approval
Merge first, some comments also welcome...
@dantengsky
I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/
Summary
issue Proposal: supports parallel insertion #2914
`Table::append` has been split into `append` and `commit`:
- `append` returns a table-specific `SendableDataBlockStream`:
  - for the fuse table, a `log` of append operations is encoded into the stream;
  - for other writable tables, it returns an empty stream.
- `commit` takes the stream returned by `append` and interprets it in a table-specific way:
  - for the fuse table, it is a stream of segment information, from which the table tries to generate a new snapshot and commit it to the meta-server;
  - for other writable tables, the stream is ignored (and `commit` returns ok).
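
To make the split concrete, here is a minimal, synchronous sketch of the two-phase shape. Everything in it is a hypothetical stand-in, not the code in this PR: `Block`, `AppendLog`, `SegmentInfo`, `FuseLikeTable`, `MemoryLikeTable` and `insert_into` are invented for illustration, and a plain `Vec` replaces the real `SendableDataBlockStream`.

```rust
// Hypothetical stand-ins for illustration; these are NOT the real Databend types.
type Block = Vec<u64>; // one batch of rows
type BlockStream = Vec<Block>; // stands in for SendableDataBlockStream
type AppendLog = Vec<SegmentInfo>; // what `append` hands over to `commit`

/// What a fuse-like table reports for each block it has written.
#[derive(Debug, Clone, PartialEq)]
struct SegmentInfo {
    rows: usize,
}

trait Table {
    /// Write the input and return a table-specific log of what was written.
    fn append(&mut self, input: BlockStream) -> AppendLog;
    /// Interpret the log returned by `append` and make the write visible.
    fn commit(&mut self, log: AppendLog) -> Result<(), String>;
}

/// Fuse-like table: appended data becomes visible only through a new snapshot.
#[derive(Default)]
struct FuseLikeTable {
    snapshots: Vec<Vec<SegmentInfo>>, // each snapshot lists all visible segments
}

impl FuseLikeTable {
    /// Rows reachable from the latest snapshot (0 if nothing was committed).
    fn visible_rows(&self) -> usize {
        self.snapshots
            .last()
            .map(|s| s.iter().map(|seg| seg.rows).sum::<usize>())
            .unwrap_or(0)
    }
}

impl Table for FuseLikeTable {
    fn append(&mut self, input: BlockStream) -> AppendLog {
        // One log entry per non-empty block; no snapshot is touched here.
        input
            .iter()
            .filter(|block| !block.is_empty())
            .map(|block| SegmentInfo { rows: block.len() })
            .collect()
    }

    fn commit(&mut self, log: AppendLog) -> Result<(), String> {
        // New snapshot = segments of the previous snapshot + the appended ones.
        let mut next = self.snapshots.last().cloned().unwrap_or_default();
        next.extend(log);
        self.snapshots.push(next);
        Ok(())
    }
}

/// Any other writable table: `append` writes directly and returns an empty log.
#[derive(Default)]
struct MemoryLikeTable {
    rows: Vec<u64>,
}

impl Table for MemoryLikeTable {
    fn append(&mut self, input: BlockStream) -> AppendLog {
        for block in input {
            self.rows.extend(block);
        }
        Vec::new()
    }

    fn commit(&mut self, _log: AppendLog) -> Result<(), String> {
        Ok(()) // nothing to interpret; the log is ignored
    }
}

/// Hypothetical driver: pipe whatever `append` returns into `commit`.
fn insert_into(table: &mut dyn Table, input: BlockStream) -> Result<(), String> {
    let log = table.append(input);
    table.commit(log)
}
```

In this sketch, calling `append` on `FuseLikeTable` leaves `visible_rows()` unchanged until `commit` is called with the returned log, while `MemoryLikeTable` stores rows immediately and treats `commit` as a no-op.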
`InsertIntoInterpreter` is in charge of feeding upstream values into `Table::append` and passing whatever `append` returns to `Table::commit`.
If applicable, a distributed append schedule will be utilized. E.g. for statements like `insert into t1 select * from numbers(10000)`, the append is done in a distributed fashion (if in cluster mode) but committed by a single query node. For statements like `insert into n1 select sum(number) from numbers(1000) group by number %3`, however, the append is still done inside one query node.
The main idea of planning goes like this:
However: 1) since planning is rather sophisticated, if there are cases that the above arrangement does not cover, please let me know; 2) currently, the above logic is embedded in `InsertIntoInterpreter`; if somewhere else is preferred, please let me know.
The following cases have been tested in `standalone` and `cluster-3-nodes` setups:
Misc.
- About the `ExpressionTransform` that @sundy-li kindly suggested: in this PR, no explicit expression transform has been arranged; a CastStream is enabled if necessary inside the SinkTransform. Hope this does the same thing.
- @zhang2014 `SelectInterpreter` has been slightly refactored, since `InsertIntoInterpreter` reuses lots of code from it. Some common stuff has been extracted to `plan_scheduler_ext.rs` (any better name?).
- @BohuTANG the refactoring of `fuse::util` has been postponed (to a dedicated PR) so that this PR is easier to review.
- `explain pipeline insert ...` does not work yet.
Changelog
Related Issues
Fixes #2914
Test Plan
Unit Tests
Stateless Tests