Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify lcs_substr and remove_common_tokens #46

Merged
merged 1 commit into from
Sep 11, 2019
Merged

Simplify lcs_substr and remove_common_tokens #46

merged 1 commit into from
Sep 11, 2019

Conversation

AnthonyMikh
Copy link
Contributor

Description of changes:
The code of lcs_substr and remove_common_tokens was changed in order to make them easier to understand (at least, I hope so). No attempt was made to address performance issues.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jpeddicord
Copy link
Owner

Yikes! I missed this, sorry. Feel free to poke me if things sit too long. I think this is a great change, though do note that the lcs bits are currently disabled due to a bug uncovered in their implementation. I can't tell if this fixes it, but I'm happy to merge this in any case as it greatly simplifies things.

Nice work!

@jpeddicord jpeddicord merged commit 0c6e60e into jpeddicord:master Sep 11, 2019
@AnthonyMikh
Copy link
Contributor Author

Thanks, it's nice to hear.

I want to note that I haven't make any attempt to fix the transformation, I've just kept the behaviour as is (tests run successfuly). Nevertheless, some parts seem suspicious for me:

  1. The original code of remove_common_tokens (as well as mine) goes through consecutive pairs of lines. They do not interleave, so calculated lcs_substr doesn't take some line relations into account.
  2. When the code makes the result, it looks at the length of largest substring (or should it be rather called "largest common prefix"?) and if it is large enough, function returns only the lines that starts with largest substring, otherwise the function unconditionally returns all the lines of text.

I am not sure if these moments have something to do with #42, but they seem like the opportunity to make an error. Feel free to prove me wrong.

@AnthonyMikh AnthonyMikh deleted the AnthonyMikh/simplify_preproc branch September 17, 2019 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants