Adjust line detection in highlight.go #20495

zeripath · 2022-07-26T21:04:51Z

The code for detection of lines in highlight.go is somewhat too complex
and doesn't take account of how chroma is actually splitting things into
lines for us.

Replace #19967

Signed-off-by: Andrew Thornton art27@cantab.net

The code for detection of lines in highlight.go is somewhat too complex and doesn't take account of how chroma is actually splitting things into lines for us. Replace go-gitea#19967 Signed-off-by: Andrew Thornton <art27@cantab.net>

zeripath · 2022-07-26T21:05:31Z

@silverwind as I don't quite understand what #19967 is supposed to be doing or fixing I hope this is doing the right thing

silverwind · 2022-07-26T21:23:23Z

Original problem was #19331, but the PR was then rewritten way beyond the original fix.

silverwind · 2022-07-26T21:26:17Z

Can confirm this fixes #19331.

silverwind

This looks great. Fixes all copy/paste issues and eliminates those useless wrappers of chroma as well while being far simpler than that other PR, I like it.

wxiaoguang · 2022-07-27T02:41:23Z

Some thoughts (but not blocker):

I do not think the numLines is fine here (although the result seems fine, but it comes from a suspicious function linesBytesCount)
I do not think the test code is clear for the code to show what are the expected results for all cases. Tests should be descriptive IMO, and they should be read and maintained easily for developers.
I do not think the append(m, "\n") is right, that's inconsistent from other "\n", see the demo "a=1\n\n", a blank line between lines doesn't have that wrapper.

func linesBytesCount(s []byte) int {
	nl := []byte{'\n'}
	n := bytes.Count(s, nl)
	if len(s) > 0 && !bytes.HasSuffix(s, nl) {
		n++
	}
	return n
}

func TestTemp(t *testing.T) {
	test := func(fileName string, content string) {
		b := []byte(content)
		str, _ := json.Marshal(content)
		fmt.Printf("for: %s\n%s\n----\n", str, strings.Join(File(linesBytesCount(b), fileName, "", b), "\n"))
	}
	test("a.py", "")
	test("a.py", "\n")
	test("a.py", "a=1")
	test("a.py", "a=1\n")
	test("a.py", "a=1\n\n")
}

for: ""

----
for: "\n"


<span class="w">
</span>
----
for: "a=1"
<span class="n">a</span><span class="o">=</span><span class="mi">1</span>
----
for: "a=1\n"
<span class="n">a</span><span class="o">=</span><span class="mi">1</span>

<span class="w">
</span>
----
for: "a=1\n\n"
<span class="n">a</span><span class="o">=</span><span class="mi">1</span>



<span class="w">
</span>

wxiaoguang · 2022-07-27T02:53:40Z

ps: about "The code for detection of lines in highlight.go is somewhat too complex and doesn't take account of how chroma is actually splitting things into lines for us."

I think that's not true. That code is not that complex, the core code is only about 30 lines for a simple tag matcher. Most changes are related to other issues, including fixing suspicious linesBytesCount function (which has been discussed in #19967 (comment) ), making tests more clear, etc. The changes are in comments: #19967 (comment) and #19967 (comment)

And the simple tag matcher already take account the Chroma's splitting result, it doesn't output inconsistent "\n", all new lines outputed by #19967 are those generated by Chroma, consistently.

If you think that the simple tag matcher is not needed, the #19967 can also be rewritten to a simple Split+Trim (updated)

zeripath · 2022-07-27T06:33:25Z

So numlines isn't actually as suspect as you might think it's just pre computing the count.

What's your worry with the function?

append(m, "\n") yup I'm not certain what this is supposed to be doing in the original code.

As I say I don't understand what the PR is trying to solve.

@silverwind said that they wanted the span class w to remain so I've left them in. If we don't want them then let's get them out but explain why.

The original code I am talking about is not the PR you wrote. I am talking about the code in that is currently there which is splitting on \n and then bodging up a fix to resplit the lines.

Chroma's formatter guarantees that code lines become ...\n

Therefore splitting on \n is the wrong thing.

@wxiaoguang I am not attacking you or your original PR - I just think that it's too complicated to write a tag parser to split this stuff and I think the 
 stuff is weird.

wxiaoguang · 2022-07-27T06:40:02Z

So numlines isn't actually as suspect as you might think it's just pre computing the count.
What's your worry with the function?

See #19967 (comment) , and I do not think it's right to pass the uncertain line numbers into the func File() or func Plain(). These functions should handle the real line numbers.

Before, linesBytesCount was just for the UI display, I wrote the comment for it:the NumLines is only used for the display on the UI: "xxx lines", so it's not necessary to be used for the line parser.

I just think that it's too complicated to write a tag parser to split this stuff and I think the stuff is weird.

Now it's not, just switched back to the Split+Trim method, and, the 
 is necessary to make everything clear. You can try to write some tests without it to see whether it's possible to define new lines clearly. If you really do not like it, it can still be switched back.

wxiaoguang · 2022-07-27T06:46:33Z

One more thing about: the old linesBytesCount and numLines:

That's why the finalNewLine existed in the old code. linesBytesCount did something wrong, then finalNewLine tried to fix it.

So #19967 removed such dirty behavior. In the future, if users want to hide the last empty line or show a block emoji (🚫) for no-newline when rendering, the new code is easier to do so #19967 (comment)

zeripath · 2022-07-27T17:49:01Z

OK, well it's clear that I don't understand the problem. So I'm going to close this PR.

silverwind · 2022-07-27T18:19:37Z

Sad to see such a quality PR go to waste. BTW we don't have all to agree to land a PR, just two approvals are enough :)

Adjust line detection in highlight.go

3ccff16

The code for detection of lines in highlight.go is somewhat too complex and doesn't take account of how chroma is actually splitting things into lines for us. Replace go-gitea#19967 Signed-off-by: Andrew Thornton <art27@cantab.net>

zeripath added the type/bug label Jul 26, 2022

zeripath added this to the 1.18.0 milestone Jul 26, 2022

GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jul 26, 2022

silverwind approved these changes Jul 26, 2022

View reviewed changes

GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jul 26, 2022

zeripath closed this Jul 27, 2022

lunny removed this from the 1.18.0 milestone Dec 20, 2022

zeripath deleted the replace-19967 branch December 29, 2022 19:19

go-gitea locked and limited conversation to collaborators May 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjust line detection in highlight.go #20495

Adjust line detection in highlight.go #20495

zeripath commented Jul 26, 2022

zeripath commented Jul 26, 2022

silverwind commented Jul 26, 2022

silverwind commented Jul 26, 2022

silverwind left a comment •

edited

Loading

wxiaoguang commented Jul 27, 2022

wxiaoguang commented Jul 27, 2022 •

edited

Loading

zeripath commented Jul 27, 2022

wxiaoguang commented Jul 27, 2022 •

edited

Loading

wxiaoguang commented Jul 27, 2022

zeripath commented Jul 27, 2022

silverwind commented Jul 27, 2022

Adjust line detection in highlight.go #20495

Adjust line detection in highlight.go #20495

Conversation

zeripath commented Jul 26, 2022

zeripath commented Jul 26, 2022

silverwind commented Jul 26, 2022

silverwind commented Jul 26, 2022

silverwind left a comment • edited Loading

Choose a reason for hiding this comment

wxiaoguang commented Jul 27, 2022

wxiaoguang commented Jul 27, 2022 • edited Loading

zeripath commented Jul 27, 2022

wxiaoguang commented Jul 27, 2022 • edited Loading

wxiaoguang commented Jul 27, 2022

zeripath commented Jul 27, 2022

silverwind commented Jul 27, 2022

silverwind left a comment •

edited

Loading

wxiaoguang commented Jul 27, 2022 •

edited

Loading

wxiaoguang commented Jul 27, 2022 •

edited

Loading