Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mismatched lines #352

Open
GitsHubber opened this issue Mar 29, 2023 · 19 comments
Open

Mismatched lines #352

GitsHubber opened this issue Mar 29, 2023 · 19 comments
Labels

Comments

@GitsHubber
Copy link

Lines from one document seem to be matched/compared to lines from another that they shouldn't compared to. That is to say that there are lines which are a far better/closer match. Also sometimes they are matched/compared to lines which are not much of a match, and infact have no equivalent line in the other document.

Perhaps better explained with a screenshot example.

https://images2.imgbox.com/30/7e/3tNRGzNT_o.png

@thereddestdog
Copy link

I think this is also related.. just a simple text compare:

image

@pnedev
Copy link
Owner

pnedev commented Apr 4, 2023

@GitsHubber ,

Could you please show a screenshot of the same compare but without hiding the matched lines (that is without Show Only Diffs option)? Could you also restore the default coloring just for the screenshot?
I suspect that in between the diff sections you also have matched sections and the changed lines are looked for and marked only between the corresponding diff sections. So even if there is a better match for your changed line elsewhere in the file it depends on the diff context (the positioning of the line in the file corresponding to the other one).
This screenshot should make things more clear:

changed_lines

Here line 1 is better matched to line 3 and vice versa but because of line 2 those are separated and cannot be linked. The lines order between files has also changed.

@thereddestdog ,

You example looks good to me, what do you consider wrong there?

BR

@pnedev pnedev added the question label Apr 4, 2023
@thereddestdog
Copy link

thereddestdog commented Apr 5, 2023

@pnedev I think the extra inserted lines are not correct. I think it should look like this:

image

Thank you for a great plugin!

@pnedev
Copy link
Owner

pnedev commented Apr 6, 2023

@thereddestdog ,

Thank you for the feedback. You are right - old Compare plugin and the new ComparePlus plugin use different approaches when finding changed lines. There is no one universal solution and criteria on what is the right outcome (it depends on the text itself) - it is more intuitive for us people and we also have certain expectations.

For example look at these screenshots that illustrate the different approaches in another case:

compareplus110
compare202

For your example ComparePlus gives a "right" result but it kind-of stresses out that lines 10-11 are moved before lines 8-9 (so lines 8-9 and 10-11 have changed places) and are also changed in the beginning - R1, R2 are the diffs and the rest of the lines contents is the match context (sort-of-speak).

Compare plugin also gives a "right" result claiming that no lines have changed places (lines order is the same) only the end of the lines are different (cu and op210) and the rest is the match context (sort-of-speak).

Which result is better cannot be determined with certainty - it depends on the context and on which difference matters most.

Here is an example with your files data but slightly modified:

compareplus110
compare202

BR

@Yaron10
Copy link

Yaron10 commented Apr 8, 2023

Hello Pavel,

We've discussed this issue before.
ComparePlus is absolutely brilliant, and as you wrote: "Which result is better cannot be determined with certainty - it depends on the context and on which difference matters most.".

Allow me to seize the opportunity and bring another example from my recent real-life Compare.

Could you please compare these files?
Go to case IDC_COL_INITNUM_EDIT.

Result:

תמונה

I've tried several on-line Compare sites, and most indeed display that section as CP does.
But I've come across https://onlinetextcompare.com/ whose display seems clearer in this case.

תמונה

I do understand the issue is quite complex, and ComparePlus' approach is certainly much better in many cases.
So, this is just another example of this issue. :)

Thank you very much for your ongoing work. Much appreciated. 👍

BR

@pnedev
Copy link
Owner

pnedev commented Apr 8, 2023

Hello my friend Yaron,

I appreciate your feedback as always.
Detecting the most "appealing" to us changed lines pairs is an ambiguous task indeed. And I am still not totally happy with its implementation as well.

Great example! 👍
I have tried to make ComparePlus more customizable and to be able to accept "guidance" from us as to what we would like to stress in a comparison.
ComparePlus is by default punctual and pays attention to the empty lines as part of the compare context so in your example files to achieve the same result as https://onlinetextcompare.com/ you should set the Min line resemblance to mark as changed in the plugin Settings to 20% (or less) and also to use the Ignore Empty Lines option. That way you as a user decide to exclude the empty lines as something important and remove them from the compare context so the plugin tries to "look for differences" that are more important to you in that particular compare.

Happy holidays!
BR,
Pavel

@Yaron10
Copy link

Yaron10 commented Apr 9, 2023

Hello Pavel,

I hope you had a good day.

And I am still not totally happy with its implementation as well.

I am very happy! :)
And obviously - like in every field in life, there's always room for some improvements.

I've mentioned it several times before:
just analyzing the various possible scenarios and the ultimate desired result is a major task.
Implementing it is a mammoth project. I truly appreciate and admire your work.

It's been a few years now, but I have never fully and thoroughly analyzed the best Compare results.
Please remember that when reading the following comment. :)

I set Min line resemblance to mark as changed to 20% a long time ago.
Checking Ignore Empty Lines, the case IDC_COL_INITNUM_EDIT section is indeed displayed very clearly.
The "ultimate desired result" in this case. 👍

Could you please look at two more sections?

1. case IDC_COL_DEC_RADIO.

תמונה

תמונה

2. case CBN_SELCHANGE.

תמונה

תמונה

Thank you very much and best wishes.

@pnedev
Copy link
Owner

pnedev commented Apr 10, 2023

Hello again Yaron,

Thank you.
You are right about the new examples in your last post - I noticed them as well and they seem "sub-optimal" but I need to analyze them thoroughly and see if they could be improved without compromising other compare cases. I currently don't have the time to do so. They are now expected because of the
} break;
constructs that give recognizable match compare context.

I will write back if I come with some improvement on this.

BR

@Yaron10
Copy link

Yaron10 commented Apr 10, 2023

Hello Pavel,

Whenever you find time to look into it and possibly revise some elements in the current approach, it would be highly appreciated.

Analyzing only the few examples in this thread -
The ultimate goal is letting the user grasp the diffs as quickly, easily and clearly as possible.
Inserting new (alignment) lines, and thus creating "steps" - is a major slowing/confusing factor. It seems that NOT creating "steps" whenever possible would be much better.


תמונה

vs.

תמונה

I do understand the logic and advantage in the first display, but the second one is simpler more easily "grasped".


תמונה

The logic of this smart detection and display is obvious.
But I'm still wondering if it's worth the confusing "steps".
Perhaps even here marking lines 2 & 4 as "not-equal" and lines 3 as "added/removed" should be better.


תמונה

Again, the logic of this smart detection and display is obvious.
But is it worth extra-lines/steps?


Bottom line:
It's quite complex.

Thanks again for your time and hard work. 👍

@pnedev
Copy link
Owner

pnedev commented Apr 11, 2023

Hi Yaron,

Thanks again for the feedback and your thoughts on it, I appreciate it.
I'll think about a way to improve this and make the changed detection decision even "smarter" but I won't have much free time in the near future so whenever I get to it.
I'll currently focus mainly on fixing major issues with ComparePlus although there are some really good and useful feature requests and I also have some ideas for additional functionalities.

BR

@Yaron10
Copy link

Yaron10 commented Apr 11, 2023

Hello Pavel,

Whenever you have free time it'll be appreciated.

I'll currently focus mainly on fixing major issues with ComparePlus

What "major issues"? :)
Do you mean this thread?
I don't mean to rush you whatsoever - but the more I think about it, the more important it seems.

Allow me a few more points regarding the current issue.
I know that suggesting is much easier than implementing.
Thank you for your patience.


תמונה

I've mentioned the extra-lines/steps as a potential slowing/confusing factor.
That is: the user has to analyze 6 lines displayed in "steps" instead of "straight" 4.

Another factor is the lines numbers.

  1. We don't play with shuffled cards.
    The fact that two lines, although different, are positioned in line (e.g.) 8 in both files is important, and should (IMO) be taken into account by the "smart detection".

  2. Keeping lines 8 aligned should also make the user grasp the diffs more quickly.


The different lines background colors is also a major slowing factor.
That is: the "smart detection" should prefer to flow with the same block-background-color when possible.

תמונה


although there are some really good and useful feature requests and I also have some ideas for additional functionalities.

👍
I'll be looking forward to them.
Thank you so much. I do appreciate it.

@pnedev
Copy link
Owner

pnedev commented Apr 12, 2023

Hello Yaron,

What "major issues"? :)

When such issues appear, that's what I meant ;)
#350 was one such issue - good catch 👍

I've mentioned the extra-lines/steps as a potential slowing/confusing factor.
That is: the user has to analyze 6 lines displayed in "steps" instead of "straight" 4.

Well, I think that nevertheless that's the right approach. In the specific example given by @thereddestdog both of you are right that it will be better to align the lines and have one group of four changed lines. But more generally (and algorithmically speaking) it is better to find the most appropriate changed lines candidates and that will inevitably separate and displace the corresponding blocks. That way for example if you have a large compared code section that differs and is comprised of totally new code parts (that inevitably has the same semantic language constructs) and a slightly altered existing code part the user will quickly grasp that there is an existing slightly changed part (perhaps a commented section) and a totally new functionality block. Very often the corresponding added/removed blocks pair have different added/removed portion sizes as well.

The different lines background colors is also a major slowing factor.

Are you suggesting to change the colorization so there are no yellow portions but only green/red ones with highlighted and aligned changed portions/lines (just like the Github compare and https://onlinetextcompare.com/) ?
I also think that might be a better display IDK for sure.

Thank you for thinking about how to further optimize and make existing CP functionalities even better and more intuitive.
I appreciate that 👍

BR

@Yaron10
Copy link

Yaron10 commented Apr 13, 2023

Hello Pavel,

When such issues appear, that's what I meant.

Much appreciated. :)

Well, I think that nevertheless that's the right approach.
...
But more generally (and algorithmically speaking) it is better to find the most appropriate changed lines candidates and that will inevitably separate and displace the corresponding blocks.

I trust and highly respect your judgement.

Still, please allow me to ask: are you also referring to the case IDC_COL_INITNUM_EDIT, case IDC_COL_DEC_RADIO & case CBN_SELCHANGE sections?
IMO it's quite obvious that aligning those blocks should be much clearer for the user. You also labeled them as "sub-optimal".

If changing the approach and implementation is complex, I'd certainly understand and respect that too.

Are you suggesting to change the colorization so there are no yellow portions but only green/red ones with highlighted and aligned changed portions/lines?

Not categorically.
I wanted to emphasize that different background colors can also be visually distracting, and that's another reason to align blocks like case IDC_COL_INITNUM_EDIT.

I had another case in mind (possibly more complex; please ignore if it is).
Could you compare these files?

תמונה

Even if the label="Accounts" lines should be marked as "changed", it might be better (in this case) to mark them as "added/removed" so that the we have only two background colors in the block.


You wrote you didn't have free time these days, and here we are engaging in complex discussions. :)
Please let me know if you'd rather get back to it some time in the future.

Thank you very much once again. 👍

BR

@pnedev
Copy link
Owner

pnedev commented Apr 14, 2023

Hello Yaron,

You wrote you didn't have free time these days, ....... Please let me know if you'd rather get back to it some time in the future.

Its OK, I don't mind discussing how to further improve ComparePlus, it's a pleasure :)
I just won't have much time soon to implement new, complex features or changes.

Still, please allow me to ask: are you also referring to the case IDC_COL_INITNUM_EDIT, case IDC_COL_DEC_RADIO & case CBN_SELCHANGE sections?

IDC_COL_INITNUM_EDIT is OK as it is now because if we ignore the empty lines the compare context changes and the comparison is as expected.
As to IDC_COL_DEC_RADIO and CBN_SELCHANGE - those are compared OK now (by that I mean algorithmically it is expected and accurate) but from user perspective it is not optimal for sure and there is currently no way to "guide" the compare process to the desired compare context. Changing that will be complex but I'll think about making things better (when I find the time and energy).

I wanted to emphasize that different background colors can also be visually distracting...

Yes, I agree. I am not sure though that keeping the background green/red only will be better. Maybe it is worth trying and experimenting with it.

Even if the label="Accounts" lines should be marked as "changed", it might be better (in this case) to mark them as "added/removed" so that the we have only two background colors in the block.

Unfortunately I cannot do much about that. The only possibility would be to provide a way for the user to disable changed lines matching process. Or to use green/red background for changed lines as mentioned above.

Thank you for the feedback and sharing your thoughts about change detection implementation. 👍

BR

@Yaron10
Copy link

Yaron10 commented Apr 14, 2023

Hello Pavel,

Its OK, I don't mind discussing how to further improve ComparePlus, it's a pleasure :)

Great. It's a pleasure for me as well.

but from user perspective it is not optimal for sure and there is currently no way to "guide" the compare process to the desired compare context. Changing that will be complex but I'll think about making things better (when I find the time and energy).

👍
Time, energy and mood.
I do appreciate it.

I am not sure though that keeping the background green/red only will be better.

I may not have been clear about that, but I did not suggest it categorically.
On the contrary, the "tricolor" in general is useful and helpful.

Unfortunately I cannot do much about that...

That point was not as important.

Thank you. 👍
I wish you the best. - Energy and mood. :)

@pnedev
Copy link
Owner

pnedev commented Apr 18, 2023

Hello Yaron,

I have "optimized" compare diffs in the same dev binaries pointed in #353 (comment). For convenience I have pointed to the same dev binaries here: win32, win64
Now if you ignore the empty lines you would get the same result with your example files as when using https://onlinetextcompare.com/. That was a very helpful feedback, thank you for sharing it 👍
I haven't come across such compare example before.
I probably still need to do some optimizations (in speed mainly, the need for those might not be user-noticeable at the moment) but the dev binaries are perfectly usable as they are now.

BR

@Yaron10
Copy link

Yaron10 commented Apr 18, 2023

Hello Pavel,

Thank you for this major improvement. Both the columnEditor pair and the Chrome pair look "more correct", and the diffs are much easier to grasp. 👍

I appreciate your time and great work.

If you currently don't have time (or energy) for further discussing it, would you mind keeping this issue open?

Before the recent changes, the following section was "correctly" aligned even if Ignore Empty Lines was unchecked.

תמונה

With the recent commit:

תמונה

Don't you think the first results are better regardless of Ignore Empty Lines?
It's like pulling a short blanket. :)


I probably still need to do some optimizations (in speed mainly, the need for those might not be user-noticeable at the moment) but the dev binaries are perfectly usable as they are now.

Thank you for that too and best of luck.

@pnedev
Copy link
Owner

pnedev commented Apr 18, 2023

Hello Yaron,

Thank you.
I intend to keep this thread open for some time.

Don't you think the first results are better regardless of Ignore Empty Lines?

They are better for sure, you are right. But I need to analyze the compare algorithm thoroughly to check if and how something can be further improved.

BR

@Yaron10
Copy link

Yaron10 commented Apr 19, 2023

Hello Pavel,

I know that analyzing the compare algorithm should require a lot of and work.
Whenever you find the time and energy, It would be highly appreciated. 👍

BR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants