Add missing test for encode_utf8 #683

CXWorks · 2023-12-09T04:06:12Z

Hi,

Thanks for taking the time to review this PR.

We are researchers working on LLM-generated unit tests for Rust. While examining the existing code, we found that a unit test can be added to improve the repo's overall unit test coverage (this project is already well tested).
The code region to cover is:

bincode/src/enc/impls.rs

Lines 328 to 335 in 73258a7

    
    } else {
        let mut buf = [0u8; 4];
        buf[0] = (code >> 18 & 0x07) as u8 | TAG_FOUR_B;
        buf[1] = (code >> 12 & 0x3F) as u8 | TAG_CONT;
        buf[2] = (code >> 6 & 0x3F) as u8 | TAG_CONT;
        buf[3] = (code & 0x3F) as u8 | TAG_CONT;
        writer.write(&buf)
    }
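
To make the uncovered branch concrete, here is a minimal standalone sketch of the same bit manipulation. The function name `encode_four_byte` is hypothetical, and the values of `TAG_CONT` and `TAG_FOUR_B` are assumed to be the standard UTF-8 marker bits rather than copied from bincode. The result is checked against the standard library's `char::encode_utf8`.

```rust
// Assumed values: the standard UTF-8 marker bits for a continuation byte
// (10xxxxxx) and for the lead byte of a four-byte sequence (11110xxx).
const TAG_CONT: u8 = 0b1000_0000;
const TAG_FOUR_B: u8 = 0b1111_0000;

/// Hypothetical standalone version of the four-byte branch quoted above.
/// It returns the bytes instead of writing them to a writer.
fn encode_four_byte(code: u32) -> [u8; 4] {
    let mut buf = [0u8; 4];
    buf[0] = (code >> 18 & 0x07) as u8 | TAG_FOUR_B; // top 3 bits
    buf[1] = (code >> 12 & 0x3F) as u8 | TAG_CONT; // next 6 bits
    buf[2] = (code >> 6 & 0x3F) as u8 | TAG_CONT; // next 6 bits
    buf[3] = (code & 0x3F) as u8 | TAG_CONT; // lowest 6 bits
    buf
}

fn main() {
    let c = '\u{1F600}';
    let ours = encode_four_byte(c as u32);

    // Cross-check against the standard library's encoder.
    let mut std_buf = [0u8; 4];
    let std_str: &str = c.encode_utf8(&mut std_buf);
    assert_eq!(&ours[..], std_str.as_bytes());
    assert_eq!(ours, [0xF0, 0x9F, 0x98, 0x80]);
}
```

Only code points at or above U+10000 reach this branch, which is why a test needs at least one such character to cover it.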

Thanks again for reviewing.

VictorKoenders · 2023-12-10T19:37:53Z

IoWriter is only available when the std feature is enabled. Please gate this test accordingly, or use one of the &[u8] functions.

CXWorks · 2023-12-10T20:26:06Z

Thanks for your time. I added a std-only flag to the unit test so that it passes the CI pipeline.

VictorKoenders · 2023-12-11T06:41:41Z

I'm not sure if this PR adds something to our code base. To a reader there is a test called test_encode_utf8 that tests a singular utf8 character.

What does this \u{1F600} character mean?
What does this test do? Why does it check for 4 bytes written?
What was the reason this test was added?

If your goal is to add "random input to make sure bincode doesn't crash" then I think it's better to improve our fuzzing framework, that's what it exists for.

To me it would make a lot more sense to, for example:

- Test a list of well-known (or not so well-known) characters that trip up code bases
  - make sure they all encode to a well-known length
- Test a list of garbage utf8 data and see bincode correctly reject it
- etc
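
The two kinds of test suggested above could look roughly like the following sketch. It uses only the standard library (`char::encode_utf8` and `std::str::from_utf8`) as a stand-in for bincode's encoder and decoder, and the table contents and helper names are hypothetical, chosen for illustration.

```rust
// Hypothetical sample table: characters at the boundaries of each UTF-8
// length class, paired with the byte length they should encode to.
const KNOWN_LENGTHS: &[(char, usize)] = &[
    ('a', 1),
    ('\u{7F}', 1),     // last 1-byte code point
    ('\u{80}', 2),     // first 2-byte code point
    ('\u{7FF}', 2),    // last 2-byte code point
    ('\u{800}', 3),    // first 3-byte code point
    ('\u{FFFF}', 3),   // last 3-byte code point
    ('\u{10000}', 4),  // first 4-byte code point
    ('\u{1F600}', 4),
    ('\u{10FFFF}', 4), // highest valid code point
];

// Hypothetical sample of byte sequences that are not valid UTF-8.
const GARBAGE: &[&[u8]] = &[
    &[0x80],                   // lone continuation byte
    &[0xC0, 0x80],             // overlong encoding of NUL
    &[0xE2, 0x82],             // truncated 3-byte sequence
    &[0xED, 0xA0, 0x80],       // encoded UTF-16 surrogate U+D800
    &[0xF4, 0x90, 0x80, 0x80], // above U+10FFFF
    &[0xFF],                   // byte that never appears in UTF-8
];

/// True if every character encodes to its expected length.
fn all_lengths_match() -> bool {
    KNOWN_LENGTHS.iter().all(|&(c, len)| {
        let mut buf = [0u8; 4];
        c.encode_utf8(&mut buf).len() == len
    })
}

/// True if every garbage sequence is rejected by the decoder.
fn all_garbage_rejected() -> bool {
    GARBAGE.iter().all(|&bytes| std::str::from_utf8(bytes).is_err())
}

fn main() {
    assert!(all_lengths_match());
    assert!(all_garbage_rejected());
}
```

In a bincode test the standard-library calls would be replaced by the crate's own encode and decode functions; the table-driven shape is the point of the sketch.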

codecov · 2023-12-11T06:45:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (73258a7) 57.30% compared to head (b166e71) 57.32%.

Additional details and impacted files

@@            Coverage Diff             @@
##            trunk     #683      +/-   ##
==========================================
+ Coverage   57.30%   57.32%   +0.01%     
==========================================
  Files          51       51              
  Lines        4335     4344       +9     
==========================================
+ Hits         2484     2490       +6     
- Misses       1851     1854       +3


CXWorks · 2023-12-11T07:14:15Z

Hi, thanks for your feedback. Sorry, I didn't find the right place in the existing unit tests for the target function. I think it's better to merge this test into the following location (I will fix it shortly).

bincode/tests/basic_types.rs

Lines 36 to 39 in 73258a7

    
    for char in "aÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö文".chars()
    {
        the_same(char);
    }
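
As a quick check of why the loop above leaves the four-byte branch uncovered, the following sketch collects the UTF-8 lengths of the characters in that string. The helper name `utf8_lengths` is hypothetical; only standard-library calls are used.

```rust
use std::collections::BTreeSet;

// Test string copied from the loop above.
const EXISTING: &str = "aÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö文";

/// Hypothetical helper: the set of distinct UTF-8 byte lengths in `s`.
fn utf8_lengths(s: &str) -> BTreeSet<usize> {
    s.chars().map(char::len_utf8).collect()
}

fn main() {
    // The existing characters are 1, 2, or 3 bytes long; none is 4 bytes.
    let lengths: Vec<usize> = utf8_lengths(EXISTING).into_iter().collect();
    assert_eq!(lengths, vec![1, 2, 3]);

    // A character such as U+1F600 fills the gap.
    assert_eq!('\u{1F600}'.len_utf8(), 4);
}
```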

This PR is trying to add test coverage for the region shown above in the encode_utf8 function. We found that the target function is exercised by the unit tests but not fully covered. Below are my answers to your questions.

> I'm not sure if this PR adds something to our code base. To a reader there is a test called test_encode_utf8 that tests a singular utf8 character.
>
> What does this \u{1F600} character mean?

Any 4-byte UTF-8 character would cover this region. This character was machine-generated, so it has no special meaning. I can create more examples for this test manually.

> What does this test do? Why does it check for 4 bytes written?

I will merge this test data into the existing testing framework.

> What was the reason this test was added?

As mentioned above, the target function is tested but not fully covered; with this test, all regions of the function are tested. BTW, this test is fully auto-generated (from detection to the executable test), and we just want to help improve the test coverage. If you feel uncomfortable about it 😔, feel free to close the PR. Thanks for your time & patience.

> If your goal is to add "random input to make sure bincode doesn't crash" then I think it's better to improve our fuzzing framework, that's what it exists for.

I inspected the current fuzzing framework and didn't see char among the tested targets. I'm happy to add this type to the fuzzing framework; do you want me to do that?

> To me it would make a lot more sense to, for example:
>
> - Test a list of well-known (or not so well-known) characters that trip up code bases
>   - make sure they all encode to a well known length
> - Test a list of garbage utf8 data and see bincode correctly reject it
> - etc

These are good suggestions; I can create more examples for this test manually.

Add missing test for encode_utf8

8cc6e1b

Fix CI error by only run under std

dc17ae6

Merge test into testing framework for utf8 testing

b166e71

VictorKoenders merged commit dd82a9c into bincode-org:trunk Dec 11, 2023
74 checks passed
