Empty method definition with only a comment seems to confuse syntax_suggest #177

Closed
mame opened this issue Mar 6, 2023 · 1 comment

mame commented Mar 6, 2023

class C
  def foo
    # comment
  end

  def bar
    "some literal"
  end

  def baz
  end

  def qux
  end

  def quux
  end
end
end # extra end

Expected:

$ ruby error.rb 
error.rb: --> error.rb
Unmatched `end', missing keyword (`do', `def`, `if`, etc.) ?
>  1  class C
> 18  end
> 19  end # extra end
error.rb:19: syntax error, unexpected `end' (SyntaxError)
end # extra end
^~~

Actual:

$ ruby error.rb 
error.rb: --> error.rb
Unmatched `end', missing keyword (`do', `def`, `if`, etc.) ?
   1  class C
>  4    end
> 10    def baz
> 11    end
  18  end
error.rb:19: syntax error, unexpected `end' (SyntaxError)
end # extra end
^~~

I haven't identified the exact condition that reproduces this issue, but I wonder whether an empty method definition containing only comments confuses the heuristics.

schneems commented Mar 7, 2023

Very interesting. Thanks for the report. I do some special handling for comments:

# ## Comments and whitespace
#
# Comments can throw off the way the lexer tells us that the line
# logically belongs with the next line. This is valid ruby but
# results in a different lex output than before:
#
# 1 User.
# 2 where(name: "schneems").
# 3 # Comment here
# 4 first
#
# To handle this we can replace comment lines with empty lines
# and then re-lex the source. This removal and re-lexing preserves
# line index and document size, but generates an easier to work with
# document.
#
Because of that, comment lines are actually stripped out when the document is re-lexed. This doc is equivalent to:

class C
  def foo

  end

  def bar
    "some literal"
  end

  def baz
  end

  def qux
  end

  def quux
  end
end
end # extra end
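
As a rough illustration of the replacement described above, here is a minimal sketch. It assumes a regex is good enough to spot comment-only lines, whereas the quoted code comment says the real cleanup works from lexer output. The name `blank_out_comment_lines` is hypothetical and not part of syntax_suggest.

```ruby
# Hypothetical helper (not syntax_suggest's API): replace every
# comment-only line with an empty line so line numbers stay stable.
# Simplification: a regex cannot tell a comment from a `#` that starts
# a line inside a heredoc or multi-line string; a lexer can.
def blank_out_comment_lines(source)
  source.lines.map { |line| line.match?(/\A[ \t]*#/) ? "\n" : line }.join
end

source = <<~RUBY
  class C
    def foo
      # comment
    end
  end
RUBY

cleaned = blank_out_comment_lines(source)
puts cleaned
```

The line count is unchanged, so any line number reported later still points at the same place in the original file.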

If you want to see individual steps you can run it with this flag:

$ SYNTAX_SUGGEST_DEBUG=1 ruby scratch.rb

The first few steps aren't that interesting, but step 5 is:

    Block lines: 5..9 (expand) 

   1  class C
   2    def foo
   3
   4    end
>  5
>  9
  10    def baz
  11    end
  12
  13    def qux
  14    end
  15
  16    def quux
  17    end
  18  end
  19  end # extra end

On the next expansion it pops lines 5-9 and expands them in both directions to 3-12:

    Block lines: 3..12 (expand) 

   1  class C
   2    def foo
>  3
>  4    end
> 10    def baz
> 11    end
> 12
  13    def qux
  14    end
  15
  16    def quux
  17    end
  18  end
  19  end # extra end

So it looks like the issue is that the upward expansion stops before it should.

expanded_lines = AroundBlockScan.new(code_lines: @code_lines, block: block)
  .skip(:hidden?)
  .stop_after_kw
  .scan_neighbors
  .scan_while { |line| line.empty? } # Slurp up empties
  .lines

I'll have to play with it some. Either we can change the input document (hiding comment lines for example) or we can change some logic in that scanner. Usually fixing one problem breaks an existing case, but often there's some good path forward.

schneems added a commit that referenced this issue Mar 8, 2023
Close #177
schneems added a commit that referenced this issue Mar 9, 2023
While #177 is reported as being caused by a comment, the underlying behavior is a problem due to the newline that we generated (from a comment). The prior commit fixed that problem by preserving whitespace before the comment. That guarantees that a block will form there from the frontier before it will be expanded there via a "neighbors" method. Since empty lines are valid ruby code, it will be hidden and be safe.

## Problem setup

This failure mode is not fixed by the prior commit, because the indentation is 0. To provide good results, we must make the algorithm less greedy. One signal to follow is developer-added newlines: if a developer puts a blank line between two pieces of code, they are more likely to be unrelated. For example:

```
port = rand(1000...9999)
stub_request(:any, "localhost:#{port}")

query = Cutlass::FunctionQuery.new(
  port: port
).call

expect(WebMock).to have_requested(:post, "localhost:#{port}").
  with(body: "{}")
```

This code is split into three chunks by the developer. Each is likely (but not guaranteed) to be intended to stand on its own (in terms of syntax). Stopping at these newlines is good for scanning neighbors (same indent or higher) within a method, but bad for scanning neighbors across methods.
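
To make the "stand on their own" idea concrete, here is a small sketch that splits the snippet on blank lines and asks Ripper (from Ruby's standard library) whether each chunk parses by itself. Splitting with a regex and the variable names are assumptions for illustration only, not how syntax_suggest builds blocks.

```ruby
require "ripper"

# The snippet from above, quoted so that #{port} is not interpolated here.
source = <<~'RUBY'
  port = rand(1000...9999)
  stub_request(:any, "localhost:#{port}")

  query = Cutlass::FunctionQuery.new(
    port: port
  ).call

  expect(WebMock).to have_requested(:post, "localhost:#{port}").
    with(body: "{}")
RUBY

# Developer-added blank lines act as chunk boundaries.
chunks = source.split(/\n{2,}/)

chunks.each do |chunk|
  # Ripper.sexp returns nil when the source has a syntax error.
  parses_alone = !Ripper.sexp(chunk).nil?
  puts "#{chunk.lines.size} line(s), parses alone: #{parses_alone}"
end
```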

## Problem

Code is expanded to capture all neighbors, and then the indent level is decreased, which allows the block to capture the surrounding scope (think of moving from within the method to also capturing the `def/end` definition). Once the indentation level has been decreased, we go back to scanning neighbors, but now the neighbors also contain keywords.

For example:

```
  1 def bark
  2
  3 end
  4
  5 def sit
  6 end
```

In this case, if lines 4, 5, and 6 are in a block, then when it tries to expand to its neighbors it will expand up. If it stops after line 2 or 3, it may cause problems: there's a valid kw/end pair, but the block will be checked without the keyword half of it.

TL;DR: It's good to stop scanning code after hitting a newline when you're in a method, but it causes a problem when scanning code between methods if everything inside one of the methods is an empty line.

In this case it grabs the end on line 3, and since the problem was an extra end, the remaining code now parses correctly. It incorrectly assumes that the block it captured was causing the problem.
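
A small sketch of why this misleads the search. It assumes the six-line example above is followed by an extra `end` on line 7, and it uses Ripper as a stand-in validity check. `valid_ruby?` and `without_lines` are hypothetical helper names, not syntax_suggest's API.

```ruby
require "ripper"

# Ripper.sexp returns nil when the source has a syntax error.
def valid_ruby?(source)
  !Ripper.sexp(source).nil?
end

# Blank out the 1-based line range first..last, keeping line numbers stable.
def without_lines(source, first, last)
  source.lines.each_with_index.map { |line, index|
    (first..last).cover?(index + 1) ? "\n" : line
  }.join
end

source = <<~RUBY
  def bark

  end

  def sit
  end
  end # extra end
RUBY

whole_document   = valid_ruby?(source)                      # false
hiding_the_block = valid_ruby?(without_lines(source, 3, 6)) # true
hiding_extra_end = valid_ruby?(without_lines(source, 7, 7)) # true

puts [whole_document, hiding_the_block, hiding_extra_end].inspect
```

Hiding lines 3 to 6 makes the document parse just as hiding the real culprit on line 7 does, because the extra `end` gets paired with `def bark`. A check that only asks "does the rest parse?" cannot tell the two apart.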

## Extra bit of context

One other technical detail is that after we've decided to stop scanning code for a new neighbor block expansion, we look around the block and grab any empty lines. Basically, adding empty lines before or after a code block does not affect the parsing of that block.
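
A minimal sketch of that "grab surrounding empty lines" step, working on a plain array of strings with 0-based indexes. `absorb_empty_lines` is a hypothetical name; the real code expresses this through the scanner's `scan_while { |line| line.empty? }` call on its own line objects.

```ruby
# Hypothetical helper: widen the range first..last (0-based, inclusive)
# so that it also covers directly adjacent empty lines.
def absorb_empty_lines(lines, first, last)
  first -= 1 while first > 0 && lines[first - 1].strip.empty?
  last += 1 while last < lines.size - 1 && lines[last + 1].strip.empty?
  [first, last]
end

lines = ["def bark", "", "end", "", "def sit", "end"]

# The block covering `def sit` / `end` also picks up the blank line above it.
expanded = absorb_empty_lines(lines, 4, 5)
puts expanded.inspect # => [3, 5]
```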

## The fix

Since we know that this problem only happens when there's a newline inside of a method, and we know this particular failure mode is due to having an invalid block (capturing an extra end, but not its keyword), we have all the metadata we need to detect this scenario and correct it.

We know that the next line above our block must be code or empty (since we grabbed extra newlines). The same goes for the line below it. We can count all the keywords and ends in the block. If they are balanced, it's likely (but not guaranteed) that we formed the block correctly. If they're imbalanced, we look above or below (depending on the nature of the imbalance) and check whether adding that line would balance the count.
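
The counting idea can be sketched as follows. This is an illustration under loud assumptions: keywords are detected with regexes rather than a lexer (so modifier forms and `do` inside strings are misjudged), and `balance` and `rebalance` are hypothetical names rather than syntax_suggest's API.

```ruby
# Rough keyword detection; a lexer-based count would be more accurate.
OPENER = /\A\s*(?:def|class|module|if|unless|while|until|case|begin)\b|\bdo\s*(?:\|[^|]*\|)?\s*\z/
CLOSER = /\A\s*end\b/

# Positive: more keywords than `end`s. Negative: more `end`s than keywords.
def balance(lines)
  lines.count { |line| line.match?(OPENER) } - lines.count { |line| line.match?(CLOSER) }
end

# Given a block first..last (0-based, inclusive), extend it by one line
# above or below when doing so balances the keyword/end count.
def rebalance(lines, first, last)
  case balance(lines[first..last]) <=> 0
  when -1 # extra `end`: its keyword may be on the line just above
    first -= 1 if first > 0 && balance(lines[(first - 1)..last]).zero?
  when 1 # extra keyword: its `end` may be on the line just below
    last += 1 if last < lines.size - 1 && balance(lines[first..(last + 1)]).zero?
  end
  [first, last]
end

lines = ["def bark", "", "end", "", "def sit", "end"]

# The block holds two `end`s but only one `def`, so it grows upward by one line.
fixed = rebalance(lines, 1, 5)
puts fixed.inspect # => [0, 5]
```

A block that is already balanced, such as `rebalance(lines, 4, 5)`, is returned unchanged, which matches the expectation that the check rarely alters a block and only ever grows it by one line.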

This concept of balance and "leaning" comes from work in #152 and has proven useful, but not been formally introduced into the main branch.

## Outcome

Adding this extra check introduced no regressions and fixed the test case. It might be possible there's a mirror or similar problem that we're not handling. That will come out in time. It might also be possible that this causes a worse case in some code not under test. That too would come out in time.

One other possible concern with adding logic in this area (which is a hot codepath) is performance. This extra count check will be performed for every block. In general, the two most helpful performance strategies I've found are reducing the total number of blocks (therefore reducing overall N internal iterations) and making better matches (using the parser to determine whether a code block is valid is a major bottleneck). If we can split valid code into valid blocks, then it's only evaluated by the parser once, whereas invalid code must be continuously re-checked by the parser until it becomes valid or is determined to be the cause of the core problem.

This extra logic should very rarely result in a change, but when it does it should tend to produce slightly larger blocks (by one line) and more accurate blocks.

Informally it seems to have no impact on performance:

```
This branch:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures  3.01s user 1.62s system 113% cpu 4.076 total
```

```
On main:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures  3.02s user 1.64s system 113% cpu 4.098 total
```
schneems added a commit that referenced this issue Mar 9, 2023
Originally I fixed #177 by making the process of comment removal indentation aware. The next commit is the more general fix and means we don't need to carry that additional logic/overhead.

Also: Update syntax via linter
matzbot pushed a commit to ruby/ruby that referenced this issue Apr 6, 2023
When removing comments I previously replaced them with a newline. This loses some context and may affect the order of the indent search which in turn affects the final result. By preserving whitespace in front of the comment, we preserve the "natural" indentation order of the line while also allowing the parser/lexer to see and join naturally consecutive (method chain) lines.

close ruby/syntax_suggest#177
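
The whitespace-preserving variant described in this commit message might look roughly like the sketch below. As before, it assumes a regex is enough to find comment-only lines, and `blank_comments_keep_indent` is a hypothetical name rather than the library's method.

```ruby
# Hypothetical helper: drop the comment text but keep the leading
# whitespace, so the line keeps its "natural" indentation level.
# Simplification: a `#` that starts a line inside a heredoc or
# multi-line string would be misread as a comment.
def blank_comments_keep_indent(source)
  source.lines.map { |line| line.sub(/\A([ \t]*)#.*$/) { $1 } }.join
end

source = <<~RUBY
  class C
    def foo
      # comment
    end
  end
RUBY

cleaned = blank_comments_keep_indent(source)
puts cleaned.lines[2].inspect # => "    \n"
```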
matzbot pushed a commit to ruby/ruby that referenced this issue Apr 6, 2023
Originally I fixed ruby/syntax_suggest#177 by making the process of comment removal indentation aware. The next commit is the more general fix and means we don't need to carry that additional logic/overhead.

Also: Update syntax via linter