Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust line detection in highlight.go #20495

Closed
wants to merge 1 commit into from

Conversation

zeripath
Copy link
Contributor

The code for detection of lines in highlight.go is somewhat too complex
and doesn't take account of how chroma is actually splitting things into
lines for us.

Replace #19967

Signed-off-by: Andrew Thornton art27@cantab.net

The code for detection of lines in highlight.go is somewhat too complex
and doesn't take account of how chroma is actually splitting things into
lines for us.

Replace go-gitea#19967

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath added this to the 1.18.0 milestone Jul 26, 2022
@zeripath
Copy link
Contributor Author

@silverwind as I don't quite understand what #19967 is supposed to be doing or fixing I hope this is doing the right thing

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jul 26, 2022
@silverwind
Copy link
Member

Original problem was #19331, but the PR was then rewritten way beyond the original fix.

@silverwind
Copy link
Member

Can confirm this fixes #19331.

Copy link
Member

@silverwind silverwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Fixes all copy/paste issues and eliminates those useless wrappers of chroma as well while being far simpler than that other PR, I like it.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jul 26, 2022
@wxiaoguang
Copy link
Contributor

Some thoughts (but not blocker):

  1. I do not think the numLines is fine here (although the result seems fine, but it comes from a suspicious function linesBytesCount)
  2. I do not think the test code is clear for the code to show what are the expected results for all cases. Tests should be descriptive IMO, and they should be read and maintained easily for developers.
  3. I do not think the append(m, "<span class=\"w\">\n</span>") is right, that's inconsistent from other "\n", see the demo "a=1\n\n", a blank line between lines doesn't have that wrapper.
func linesBytesCount(s []byte) int {
	nl := []byte{'\n'}
	n := bytes.Count(s, nl)
	if len(s) > 0 && !bytes.HasSuffix(s, nl) {
		n++
	}
	return n
}

func TestTemp(t *testing.T) {
	test := func(fileName string, content string) {
		b := []byte(content)
		str, _ := json.Marshal(content)
		fmt.Printf("for: %s\n%s\n----\n", str, strings.Join(File(linesBytesCount(b), fileName, "", b), "\n"))
	}
	test("a.py", "")
	test("a.py", "\n")
	test("a.py", "a=1")
	test("a.py", "a=1\n")
	test("a.py", "a=1\n\n")
}
for: ""

----
for: "\n"


<span class="w">
</span>
----
for: "a=1"
<span class="n">a</span><span class="o">=</span><span class="mi">1</span>
----
for: "a=1\n"
<span class="n">a</span><span class="o">=</span><span class="mi">1</span>

<span class="w">
</span>
----
for: "a=1\n\n"
<span class="n">a</span><span class="o">=</span><span class="mi">1</span>



<span class="w">
</span>

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jul 27, 2022

ps: about "The code for detection of lines in highlight.go is somewhat too complex and doesn't take account of how chroma is actually splitting things into lines for us."

I think that's not true. That code is not that complex, the core code is only about 30 lines for a simple tag matcher. Most changes are related to other issues, including fixing suspicious linesBytesCount function (which has been discussed in #19967 (comment) ), making tests more clear, etc. The changes are in comments: #19967 (comment) and #19967 (comment)

And the simple tag matcher already take account the Chroma's splitting result, it doesn't output inconsistent "<span class=\"w\">\n</span>", all new lines outputed by #19967 are those generated by Chroma, consistently.


If you think that the simple tag matcher is not needed, the #19967 can also be rewritten to a simple Split+Trim (updated)

@zeripath
Copy link
Contributor Author

So numlines isn't actually as suspect as you might think it's just pre computing the count.

What's your worry with the function?


append(m, "<span class=\"w\">\n</span>") yup I'm not certain what this is supposed to be doing in the original code.

As I say I don't understand what the PR is trying to solve.

@silverwind said that they wanted the span class w to remain so I've left them in. If we don't want them then let's get them out but explain why.


The original code I am talking about is not the PR you wrote. I am talking about the code in that is currently there which is splitting on \n and then bodging up a fix to resplit the lines.

Chroma's formatter guarantees that code lines become <span class="line"><span class="cl">...<span class="w">\n</span></span></span>

Therefore splitting on \n is the wrong thing.


@wxiaoguang I am not attacking you or your original PR - I just think that it's too complicated to write a tag parser to split this stuff and I think the &#10; stuff is weird.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jul 27, 2022

So numlines isn't actually as suspect as you might think it's just pre computing the count.
What's your worry with the function?

See #19967 (comment) , and I do not think it's right to pass the uncertain line numbers into the func File() or func Plain(). These functions should handle the real line numbers.

Before, linesBytesCount was just for the UI display, I wrote the comment for it:the NumLines is only used for the display on the UI: "xxx lines", so it's not necessary to be used for the line parser.

I just think that it's too complicated to write a tag parser to split this stuff and I think the stuff is weird.

Now it's not, just switched back to the Split+Trim method, and, the &#10; is necessary to make everything clear. You can try to write some tests without it to see whether it's possible to define new lines clearly. If you really do not like it, it can still be switched back.

@wxiaoguang
Copy link
Contributor

One more thing about: the old linesBytesCount and numLines:

That's why the finalNewLine existed in the old code. linesBytesCount did something wrong, then finalNewLine tried to fix it.

So #19967 removed such dirty behavior. In the future, if users want to hide the last empty line or show a block emoji (🚫) for no-newline when rendering, the new code is easier to do so #19967 (comment)

@zeripath
Copy link
Contributor Author

OK, well it's clear that I don't understand the problem. So I'm going to close this PR.

@zeripath zeripath closed this Jul 27, 2022
@silverwind
Copy link
Member

Sad to see such a quality PR go to waste. BTW we don't have all to agree to land a PR, just two approvals are enough :)

@lunny lunny removed this from the 1.18.0 milestone Dec 20, 2022
@zeripath zeripath deleted the replace-19967 branch December 29, 2022 19:19
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/need 1 This PR needs approval from one additional maintainer to be merged. type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants