Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better False-Positive Detection in rz_scan_strings #2691

Merged
merged 3 commits into from
Jun 14, 2022

Conversation

borzacchiello
Copy link
Contributor

@borzacchiello borzacchiello commented Jun 13, 2022

Hello everyone!
I tried to improve rz_scan_strings, with the main goal of making the false-positive check more robust, and fix #1836.
I also added a global option to activate/deactivate a check in the false-positive detection function. I am not sure that I added it in the correct way, so let me know.

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've documented or updated the documentation of every function and struct this PR changes. If not so I've explained why.
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the rizin book with the relevant information (if needed)

Detailed description

This PR adds the following features:

  • Extend the false-positive check on ASCII frequencies to all UTF strings (previously it was performed only on UTF16LE strings).
  • Add a global option to activate/deactivate the above-mentioned check.
  • Improve the false-positive heuristic by adding a special case for extended-ASCII strings. This is useful for European languages.

I had to modify some regression test because some of them checked the number of detected strings that is changed due to the PR.
The removed strings should be only false positives: this zip file contains a diff for each binary of the regression tests for which the number of strings found changed.

Test plan

...

Closing issues

closes #1836

@XVilka XVilka added this to the 0.5.0 milestone Jun 13, 2022
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I will check more with non-European languages tonight and will merge if it works well. Awesome job! 💯

librz/core/cconfig.c Outdated Show resolved Hide resolved
This commit adds the following features:
- Extend the false-positive check on ASCII frequencies to all UTF strings
- Add a global option to activate/deactivate di check
- Improve the false-positive heuristic adding a special case for extended-ASCII strings
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not cause regressions on non-European languages. 👍

@XVilka XVilka merged commit 2d69ecb into rizinorg:dev Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix UTF-64LE string detection when contains \xa1 character
3 participants