Exhaustive MySQL Parser #157

Open
wants to merge 38 commits into
base: develop
Changes from 1 commit
Commits
ccc341b
MySQL AST Parser
adamziel Aug 17, 2024
78fdf69
Fix parser overriding parts of the parse tree as it constructs them.
adamziel Aug 17, 2024
c8652d5
Output ParseTree using a class, not an array for much simpler processing
adamziel Aug 17, 2024
137d6ca
Manually factor left recursion into right recursion in the grammar fi…
adamziel Aug 18, 2024
0a2440c
Explore support for SQL_CALC_FOUND_ROWS
adamziel Aug 20, 2024
0406d71
Support VALUES() call
adamziel Aug 20, 2024
87573f2
Extract queries from MySQL test suite and test the parser against them
JanJakes Sep 26, 2024
9629702
Implement handling for manually added lexer symbols
JanJakes Sep 26, 2024
d63bc6e
Fix passing nulls to "ctype_" functions
JanJakes Sep 26, 2024
8e7e2e8
Add support for hex format x'ab12', X'ab12', and bin format x'01' and…
JanJakes Sep 26, 2024
ebcc17e
Fix wrong MySQL version conditions (AI hallucinations)
JanJakes Sep 26, 2024
1551b0e
Implement the checkCharset() placeholder function
JanJakes Sep 26, 2024
cdd84b4
Document manual grammar factoring
JanJakes Sep 26, 2024
f50b515
Fix "alterOrderList" that has a wrong definition in the original grammar
JanJakes Sep 26, 2024
e267f67
Fix "createUser" that was incorrectly converted from ANTLR to EBNF
JanJakes Sep 26, 2024
cd543af
Fix "castType" that was incomplete in the original grammar
JanJakes Sep 26, 2024
135f29f
Fix "SELECT ... WHERE ... INTO @var" using a negative lookahead
JanJakes Sep 26, 2024
27524dd
Fix "EXPLAIN FORMAT=..." by reordering grammar rules
JanJakes Sep 27, 2024
069342f
Fix special "WINDOW" and "OVER" cases by adjusting grammar rules
JanJakes Sep 27, 2024
9bfc977
Fix "GRANT" and "REVOKE" by adjusting grammar rules to solve conflicts
JanJakes Sep 27, 2024
ca4de77
Use ebnfutils to dump grammar conflicts
JanJakes Sep 27, 2024
cd3504d
Implement the determineFunction() placeholder function, unify SQL modes
JanJakes Sep 30, 2024
81bbde0
Fix processing NOW() synonyms in lexer
JanJakes Sep 30, 2024
71292fb
Match mysqltest commands case-insensitively
JanJakes Sep 30, 2024
1ab3723
Add a script to test lexer on all the testing queries
JanJakes Oct 1, 2024
42ffc1b
Replace lexer switch/case and function calls with lookup tables
JanJakes Oct 1, 2024
01241b8
Fix unicode handling when extracting test queries
JanJakes Oct 2, 2024
f5266f1
Fix identifier matching, improve lexer performance by ~25%
JanJakes Oct 2, 2024
a2ac60b
Unify charset matching with identifier matching, remove non-existent …
JanJakes Oct 2, 2024
90e2af6
Fix quoted text/identifier matching, improve lexer performance by ~10%
JanJakes Oct 2, 2024
dfcad3e
Remove dependency on ctype, improve lexer performance by ~4%
JanJakes Oct 3, 2024
0f23dd3
Determine token names lazily, reduce LOC to < 3000, improve lexer per…
JanJakes Oct 3, 2024
6bb8ff9
Finish manual token pass, use MySQL Workbench token IDs, add comments
JanJakes Oct 3, 2024
ec55c10
Fix wrong token type
JanJakes Oct 3, 2024
21f3f16
Inline some simple single-use methods, remove unused methods
JanJakes Oct 3, 2024
b4f0e08
Move token iteration loop outside nextToken() method
JanJakes Oct 3, 2024
f658751
Fix and simplify lookahead logic, improve lexer performance by ~6%
JanJakes Oct 3, 2024
4eee991
Fix date and time literals by reordering grammar rules
JanJakes Oct 3, 2024
10 changes: 10 additions & 0 deletions custom-parser/parser/DynamicRecursiveDescentParser.php
@@ -422,6 +422,16 @@ private function parse_recursive($rule_id) {
        $node->append_child($subnode);
    }
}

// Negative lookahead for INTO after a valid SELECT statement.
[Review comment, Collaborator / Author] Great comments!

// If we match a SELECT statement, but there is an INTO keyword after it,
// we're in the wrong branch and need to leave matching to a later rule.
// For now, it's hard-coded, but we could extract it to a lookahead table.
$la = $this->tokens[$this->position] ?? null;
if ($la && $rule_name === 'selectStatement' && $la->type === MySQLLexer::INTO_SYMBOL) {
    $branch_matches = false;
}

if ($branch_matches === true) {
    break;
}
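
The code comment above mentions that the hard-coded check could be extracted to a lookahead table. A minimal sketch of what that might look like follows. It is not part of this PR: the constants, the `NEGATIVE_LOOKAHEAD` table, and the `fails_negative_lookahead()` helper are hypothetical names invented for illustration, and the integer token values are placeholders rather than the real `MySQLLexer` IDs.

```php
<?php
// Placeholder token types (assumption). The real parser would use
// MySQLLexer::INTO_SYMBOL and friends instead of these values.
const INTO_SYMBOL = 1;
const FROM_SYMBOL = 2;

// Hypothetical table: rule name => token types that must NOT follow
// an otherwise successful match of that rule.
const NEGATIVE_LOOKAHEAD = [
    'selectStatement' => [INTO_SYMBOL],
];

/**
 * Returns true when the token type following a matched rule invalidates
 * the match. $next_type is null when the input has been fully consumed.
 */
function fails_negative_lookahead(string $rule_name, ?int $next_type): bool {
    if ($next_type === null) {
        return false; // End of input: nothing can contradict the match.
    }
    // Rules without a table entry never fail the lookahead.
    return in_array($next_type, NEGATIVE_LOOKAHEAD[$rule_name] ?? [], true);
}

// A SELECT followed by INTO is rejected; other followers are accepted.
var_dump(fails_negative_lookahead('selectStatement', INTO_SYMBOL));
var_dump(fails_negative_lookahead('selectStatement', FROM_SYMBOL));
```

With a helper like this, the branch check in `parse_recursive()` could set `$branch_matches = false` whenever the helper returns true for the current rule and the next token, so further special cases would become table entries rather than new `if` conditions.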