Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error: Indexing into an empty node #132

Closed
Marwes opened this issue Apr 24, 2020 · 7 comments · Fixed by #139
Closed

Error: Indexing into an empty node #132

Marwes opened this issue Apr 24, 2020 · 7 comments · Fixed by #139
Labels
bug Something isn't working

Comments

@Marwes
Copy link
Contributor

Marwes commented Apr 24, 2020

Attempt to skip over /* */ comments but I am getting an error which seems to be a bug. Similar to #97 I think as I were getting that error before reducing it.

#[derive(logos::Logos)]
pub enum Token {
    #[error]
    #[regex(r"/\*([^\*]*\*+[^\*/])*([^\*]*\*+|[^\*])*\*/", logos::skip)]
    Error,
}
@Marwes
Copy link
Contributor Author

Marwes commented Apr 24, 2020

#[derive(logos::Logos)]
pub enum Token2 {
    #[regex(r#"[!#$%&*+-./<=>?@\\^|~:]+"#)]
    Operator,

    #[error]
    #[regex(r"/\*([^\*]*\*+[^\*/])*([^\*]*\*+|[^\*])*\*/", logos::skip)]
    Error,
}

Gives Merging two reserved nodes!

@maciejhirsz maciejhirsz added the bug Something isn't working label Apr 25, 2020
@Evrey
Copy link

Evrey commented Apr 25, 2020

Your regex is very strange.

(?x:
    # (1) This entire block
    (   [^\*]*
        \*+
        [^\*/]
    )*
    # (1)  is almost the exact same as this block. And both are optional.
    (   [^\*]*
        \*+     # (2) Why `+ |` when you can do `*`?
    |   [^\*]
    )*
)

I think what you wanted was this? (?x: [^\*] | (\* [^/]?) ).

And in your Token2 example, your regexes clash. Logos sadly can't roll back to commit multiple tokens. So when you were to write a /* b, is that the start of a block comment now, or is it a custom operator? Logos needs to find the final */ to decide. Now imagine the full code looks like this: a /* b + c */. If the final */ didn't exist, Logos would have to go back and commit three identifier and two operator tokens.

So the »bug« for Token2 is that the error message should say: »Thine token rules are too powerful for me.«

You can solve this in two ways, an elegant and a hack-y way:

  • The elegant solution is to not have comment delimiters overlap with other syntax elements. This is of course not an option if you are parsing an already existing proglang. Edit: For example, comments start with \#\x20 for me. No custom operator can have # anywhere inside.
  • The hack-y solution is to make a #[token("/*")] MaybeBlockCommentStart, and then branch off to a sub-lexer via Lexer::morph and let it either find the end of a block comment or revert all sub-lexing and commit an operator. This, of course, is costly infinite look-ahead.
  • Edit: Well, there's a third way I forgot about. Explicitly disallow /* as an operator. For example, in my own lexer, : is a special token, so only :: may appear in an operator. ::= is a valid custom operator, := is not.

@maciejhirsz
Copy link
Owner

I should have figured out this is going to be a problem. There was an issue before when the first thing inside a grouped repeat was a maybe: (f?oo)*. I fixed that one, but haven't thought about the same issue with loops that begin with a loop: (f*oo)*. I should have a fix shortly.

@Marwes
Copy link
Contributor Author

Marwes commented Apr 25, 2020

I think I may have modified the regex from this answer https://stackoverflow.com/a/36328890/2489366 so that it didn't need non-greedy matching since that caused ambiguities with LALRPOP. Then I just copied that into logos when adding comments. There may very well be a better way to write it in logos.

The error is still poor, as noted so I didn't actually know what I were supposed to do to fix it :)

@maciejhirsz
Copy link
Owner

The error is still poor, as noted so I didn't actually know what I were supposed to do to fix it :)

Any panic in logos-derive is a bug at this point, I'm fixing it :)

@maciejhirsz
Copy link
Owner

maciejhirsz commented Apr 25, 2020

I should have a fix shortly.

Famous last words. So I fixed the issue I thought this was relatively quickly, but the example you gave me will require a bit more digging.

The operator and comment regexes overlap, and they are both looping. I've handled this case elsewhere quite gracefully, but Logos really doesn't like this example. That it panics is bad enough, but it panics after a long time, which means there is some crazy looping going on when merging the regexes.

I'll make a release with the fix I have and #131, and get back to tracking this.

Edit: shortly = almost two days ;P

@maciejhirsz
Copy link
Owner

Released a fix in 0.11.3 (logos-derive 0.11.5)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants