Skip to content

Fix minor bugs in LZ4H5 #127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Fix minor bugs in LZ4H5 #127

wants to merge 1 commit into from

Conversation

nhz2
Copy link

@nhz2 nhz2 commented Apr 9, 2025

This PR fixes a few minor bugs in lz4h5. I was unable to compile this package locally for testing, but I think this will fix the last compatibility issues I'm seeing in JuliaIO/ChunkCodecs.jl#27

These are the bugs this PR tries to fix:

  1. When encoding empty bytes, the output is currently 16 bytes, but it should be 12 bytes, 11 0x00, and one 0x01 at the end.

This PR removes the special case

        if srcsize == 0:
            write_i4be(<uint8_t*> &dst[dstpos], <uint32_t> 0)
            dstpos += 4

that adds these 4 null bytes and adds a check in the while loop that there is enough space for the block header.

            if dstsize - dstpos < 5:
                raise Lz4h5Error('output too small')
  1. Encoding more than 2 GB causes an integer overflow.
>>> imagecodecs.lz4h5_encode(bytearray(2147483647))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "imagecodecs/_lz4.pyx", line 305, in imagecodecs._lz4.lz4h5_encode
imagecodecs.Lz4h5Error: LZ4_compress_fast returned 0

The added (min(dstsize - dstpos, 2147483647)), should fix this problem.

  1. Encoding more than 1 block, but the first block fills the output fails to correctly encode but doesn't throw an error.
>>> src = bytearray(17)
>>> dst = bytearray(26)
>>> out = imagecodecs.lz4h5_encode(src, out=dst, blocksize=16)
>>> imagecodecs.lz4h5_decode(out) == src
False

This should be fixed by moving the and dstpos < dstsize out of the while loop condition and instead throwing an error if the dst space left is too small.

  1. read_i4be on line 358 can read out of bounds if there are only 3 to 1 bytes left after the first block.

This PR adds a check to fix this:

            if srcsize - srcpos < 4:
                raise Lz4h5Error('LZ4H5 data too short')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants