Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing test for encode_utf8 #683

Merged
merged 3 commits into from
Dec 11, 2023
Merged

Conversation

CXWorks
Copy link
Contributor

@CXWorks CXWorks commented Dec 9, 2023

Hi,

Thanks for your time & patience to review this PR.

We are researchers focusing on Rust unit tests by LLM. By examine the existing code, we found a unit test can be added to improve the repo's overall unit test coverage(this project is already been well tested).
The code region to cover is:

bincode/src/enc/impls.rs

Lines 328 to 335 in 73258a7

} else {
let mut buf = [0u8; 4];
buf[0] = (code >> 18 & 0x07) as u8 | TAG_FOUR_B;
buf[1] = (code >> 12 & 0x3F) as u8 | TAG_CONT;
buf[2] = (code >> 6 & 0x3F) as u8 | TAG_CONT;
buf[3] = (code & 0x3F) as u8 | TAG_CONT;
writer.write(&buf)
}

Thanks again for reviewing.

@VictorKoenders
Copy link
Contributor

IoWriter is only enabled when feature std is enabled, please enable this test accordingly or use one of the &[u8] functions

@CXWorks
Copy link
Contributor Author

CXWorks commented Dec 10, 2023

Thanks for your time, I add the std only flag on the unit test to pass the CI pipeline.

@VictorKoenders
Copy link
Contributor

I'm not sure if this PR adds something to our code base. To a reader there is a test called test_encode_utf8 that tests a singular utf8 character.

  • What does this \u{1F600} character mean?
  • What does this test do? Why does it check for 4 bytes written?
  • What was the reason this test was added?

If your goal is to add "random input to make sure bincode doesn't crash" then I think it's better to improve our fuzzing framework, that's what it exists for.

To me it would make a lot more sense to, for example:

  • Test a list of well-known (or not so well-known) characters that trip up code bases
    • make sure they all encode to a well known length
  • Test a list of garbage utf8 data and see bincode correctly reject it
  • etc

Copy link

codecov bot commented Dec 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (73258a7) 57.30% compared to head (b166e71) 57.32%.

Additional details and impacted files
@@            Coverage Diff             @@
##            trunk     #683      +/-   ##
==========================================
+ Coverage   57.30%   57.32%   +0.01%     
==========================================
  Files          51       51              
  Lines        4335     4344       +9     
==========================================
+ Hits         2484     2490       +6     
- Misses       1851     1854       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@CXWorks
Copy link
Contributor Author

CXWorks commented Dec 11, 2023

Hi, thanks for your feedback. Sorry I did't find the correct place to in the existing unit test for the target function, I think it's better to merge this test in the following location(I will fix it shortly).

for char in "aÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö文".chars()
{
the_same(char);
}

This PR is trying to add test coverage for the region shown above in the encode_utf8 function. We found the target function is tested by the unit tests but not fully covered. Below are my answers to your questions

I'm not sure if this PR adds something to our code base. To a reader there is a test called test_encode_utf8 that tests a singular utf8 character.

  • What does this \u{1F600} character mean?

Any utf8 with 4 length can fit to cover the region. This character is generated by machine so it doesn't have special meanings. I can create more examples for this test manually,

  • What does this test do? Why does it check for 4 bytes written?

I will merge this test data into the existing testing frmework.

  • What was the reason this test was added?

As mentioned above, the target function is tested but not fully covered, by adding this test, all the regions in the function is tested. BTW, this test is fully auto generated(from detection to the executable test), we just want to help improve the test coverage. If you feel uncomfortable about it 😔, feel free to close the PR, thanks for your time & patience.

If your goal is to add "random input to make sure bincode doesn't crash" then I think it's better to improve our fuzzing framework, that's what it exists for.

I inspect the current fuzzing framework, I don't see the char as target to be tested and am happy to add this type into the fuzzing framework, do you want me to do that?

To me it would make a lot more sense to, for example:

  • Test a list of well-known (or not so well-known) characters that trip up code bases
    • make sure they all encode to a well known length
  • Test a list of garbage utf8 data and see bincode correctly reject it
  • etc

These are good suggections, I can create more examples for this test manually.

@VictorKoenders VictorKoenders merged commit dd82a9c into bincode-org:trunk Dec 11, 2023
74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants