Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix message corruption when displaying a base64 encoded message #9290

Closed
wants to merge 3 commits into from

Conversation

SolracLeinad
Copy link

It's 2023 and I'm still experiencing this bug: #4879

The problem is roundcube tries to decode the whole chunk as one big base64 string and while some clients might encode the whole message in one big chunk, some others will do it in several chunks. Decoding multiple base64 strings as one will corrupt the message.

To ammend this, we will decode the whole message line by line because every line is a valid base64 string.

@alecpl
Copy link
Member

alecpl commented Dec 28, 2023

I don't think this is correct. Could you provide a sample message that does not work? I'm not sure that "every line is a valid base64 string" is true.

@SolracLeinad
Copy link
Author

SolracLeinad commented Dec 29, 2023

Here's the beginning of a Logwatch message, logwatch encodes the header in one "base64 message" and then the next part is another "base64 message", then there''s a third one; all these are part of the same mail.

Roundcube will try to decode the message as a whole producing a lot of garbage, meanwhile if you decode the same message line by line it will produce a proper output. Of course one can get any kind of garbage in a email, but base64 encoded emails are usually fixed width, allowing line by line decoding.

Example:

PCFET0NUWVBFIEhUTUwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMCBUcmFuc2l0aW9uYWwv L0VOIiAiaHR0cDovL3d3dy53My5vcmcvVFIvUkVDLWh0bWw0MC9sb29zZS5kdGQiPgo8aHRtbD4K PGhlYWQ+Cjx0aXRsZT5Mb2d3YXRjaCAgNy40LjMgICggMTIvMDcvMTYgKTwvdGl0bGU+CjxtZXRh IG5hbWU9ImdlbmVyYXRvciIgY29udGVudD0iTG9nd2F0Y2ggIDcuNC4zICggMTIvMDcvMTYgKSI+ CjxzdHlsZSB0eXBlPSJ0ZXh0L2NzcyI+CiAgaDEge2NvbG9yOiBncmF5OyBib3JkZXItYm90dG9t OiAzcHggZG91YmxlIHNpbHZlcjsgZm9udC1mYW1pbHk6IHNhbnMtc2VyaWY7IH0KICBoMiB7Y29s b3I6IHdoaXRlOyBib3JkZXItYm90dG9tOiAxcHggc29saWQgc2lsdmVyOyBmb250LWZhbWlseTog c2Fucy1zZXJpZjsgfQogIGgzIHtjb2xvcjogd2hpdGU7IGJvcmRlci1ib3R0b206IDFweCBzb2xp ZCBzaWx2ZXI7IGZvbnQtZmFtaWx5OiBzYW5zLXNlcmlmOyB9CiAgdGgge2JhY2tncm91bmQ6ICM2 RDg4QUQ7IHRleHQtYWxpZ246IGxlZnQ7IGZvbnQtZmFtaWx5OiBzYW5zLXNlcmlmOyB9CiAgdGQg e2JhY2tncm91bmQ6ICNFRkVGRUY7IHRleHQtYWxpZ246IGxlZnQ7IGZvbnQtZmFtaWx5OiBjb3Vy aWVyLHNlcmlmOyBmb250LXNpemU6IDEwcHg7IH0KICBsaSB7IGZvbnQtZmFtaWx5OiBzYW5zLXNl cmlmOyB9CiAgLnJlZiB7cGFkZGluZy1sZWZ0OiAxJTsgfQogIC5zZXJ2aWNlIHtwYWRkaW5nLWxl ZnQ6IDElOyB9CiAgLnJldHVybl9saW5rIHtib3JkZXItdG9wOiAxcHg7IGJvcmRlci1ib3R0b206 IDFweDsKICBwYWRkaW5nOiAxJTsgbWFyZ2luLXRvcDogMSU7IG1hcmdpbi1ib3R0b206IDElOyBm b250LWZhbWlseTogc2Fucy1zZXJpZjsgfQogIC5jb3B5cmlnaHQge2NvbG9yOiBibGFjazsgYm9y ZGVyLXRvcDogMXB4IHNvbGlkIGdyZXk7CiAgYm9yZGVyLWJvdHRvbTogMXB4IHNvbGlkIGdyZXk7 CiAgcGFkZGluZzogMSU7IG1hcmdpbi10b3A6IDElOyBtYXJnaW4tYm90dG9tOiAxJTt9Cjwvc3R5 bGU+CjwvaGVhZD4KPGJvZHkgc3R5bGU9IndpZHRoOjkwJTsgbWFyZ2luLWxlZnQ6IDUlOyBtYXJn aW4tcmlnaHQ6IDUlIiBiZ2NvbG9yPSIjRkZGRkZGIiA+Cjxocj4KPCEtLSBFbmQgaGVhZGVyLmh0 bWwgLS0+Cg== PGEgbmFtZT10b3A+PHVsPgogICA8bGk+PGEgaHJlZj0iIzEiPkxPR1dBVENIIFN1bW1hcnk8L2E+ CiAgIDxsaT48YSBocmVmPSIjMiI+Q3JvbjwvYT4KICAgPGxpPjxhIGhyZWY9IiMzIj5wYW1fdW5p eDwvYT4KICAgPGxpPjxhIGhyZWY9IiM0Ij5Db25uZWN0aW9ucyAoc2VjdXJlLWxvZyk8L2E+CiAg IDxsaT48YSBocmVmPSIjNSI+dnNmdHBkLW1lc3NhZ2VzPC9hPgogICA8bGk+PGEgaHJlZj0iIzYi PkRpc2sgU3BhY2U8L2E+CjwvdWw+PC9hPgo= PGRpdiBjbGFzcz1zZXJ2aWNlPgogICAgICAgICA8dGFibGUgYm9yZGVyPTEgd2lkdGg9MTAwJT4K ICAgICAgICAgICA8dHI+PHRoPgogICAgICAgICAgIDxoMj48YSBuYW1lPSIxIj5MT0dXQVRDSCBT dW1tYXJ5PC9hPjwvaDI+CiAgICAgICAgICAgPC90cj48L3RoPgogIDx0cj4KICAgIDx0ZCBiZ2Nv bG9yPSNkZGRkZGQ+ICZuYnNwOyAmbmJzcDsgJm5ic3A7IExvZ3dhdGNoIFZlcnNpb246IDcuNC4z ICgxMi8wNy8xNikKPC90ZD4KICA8L3RyPgogIDx0cj4KICAgIDx0ZCBiZ2NvbG9yPSNkZGRkZGQ+ ICZuYnNwOyAmbmJzcDsgJm5ic3A7IFByb2Nlc3NpbmcgSW5pdGlhdGVkOiBUaHUgRGVjIDI4IDA2 OjI1OjA1IDIwMjMKPC90ZD4KICA8L3RyPgogIDx0cj4KICAgIDx0ZCBiZ2NvbG9yPSNkZGRkZGQ+ ICZuYnNwOyAmbmJzcDsgJm5ic3A7IERhdGUgUmFuZ2UgUHJvY2Vzc2VkOiB0b2RheQo8L3RkPgog IDwvdHI+CiAgPHRyPgogICAgPHRkIGJnY29sb3I9I2RkZGRkZD4gJm5ic3A7ICZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAoIDIwMjMtRGVjLTI4ICkKPC90ZD4KICA8L3RyPgogIDx0 cj4KICAgIDx0ZCBiZ2NvbG9yPSNkZGRkZGQ+ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZu YnNwOyAmbmJzcDsgUGVyaW9kIGlzIGRheS4KPC90ZD4KICA8L3RyPgogIDx0cj4KICAgIDx0ZCBi Z2NvbG9yPSNkZGRkZGQ+ICZuYnNwOyAmbmJzcDsgJm5ic3A7IERldGFpbCBMZXZlbCBvZiBPdXRw dXQ6IDUKPC90ZD4KICA8L3RyPgogIDx0cj4KICAgIDx0ZCBiZ2NvbG9yPSNkZGRkZGQ+ICZuYnNw OyAmbmJzcDsgJm5ic3A7IFR5cGUgb2YgT3V0cHV0L0Zvcm1hdDogbWFpbCAvIGh0bWwKPC90ZD4K ICA8L3RyPgogIDx0cj4KICAgIDx0ZCBiZ2NvbG9yPSNkZGRkZGQ+ICZuYnNwOyAmbmJzcDsgJm5i c3A7IExvZ2ZpbGVzIGZvciBIb3N0OiBjaW50ZW90bAo8L3RkPgogIDwvdHI+CiAgPC90YWJsZT48 L2Rpdj4KICA8ZGl2IGNsYXNzPXJldHVybl9saW5rPjxwPjxhIGhyZWY9IiN0b3AiPkJhY2sgdG8g VG9wPC9hPjwvcD48L2Rpdj4K

@alecpl
Copy link
Member

alecpl commented Dec 29, 2023

RFC2045: All line breaks or other characters not found in Table 1 must be ignored by decoding software.

Also https://bugzilla.redhat.com/show_bug.cgi?id=1123093

Processing line by line is not correct because lines might have a length that will not allow to decode them. And I think it is more common case than yours.

We could try to detect '=' and act accordingly, I suppose.

@SolracLeinad
Copy link
Author

We could try to detect '=' and act accordingly, I suppose.

You are right, this seems to be the best way to deal with it.
This quick and dirty fix seems to fix it:

    protected static function decodeContent($chunk, $mode, $is_last = false, &$prev = '')
    {
        // BASE64
        if ($mode == 1) {
            $chunk = $prev . preg_replace('|[^a-zA-Z0-9+=/]|', '', $chunk);

            // create chunks with proper length for base64 decoding
            $length = strlen($chunk);

            if ($length % 4) {
                $length = floor($length / 4) * 4;
                $prev = substr($chunk, $length);
                $chunk = substr($chunk, 0, $length);
            }
            else {
                $prev = '';
            }

            // return base64_decode($chunk);

            // split the message by '=' and decode every part individually.
            $decoded_message = '';
            $exploded_chunk = explode('=', $chunk);
            foreach($exploded_chunk as $base64_block){
                $decoded_message .= base64_decode($base64_block);
            }
            return $decoded_message;
        }

$chunk = substr($chunk, 0, $length);
$decoded_chunk = '';
foreach(explode('=', preg_replace('~[^a-zA-Z0-9+=/]~', '', $chunk)) as $line) {
$decoded_chunk .= base64_decode($line);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the &$prev should probably be still supported

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be supported. Also the code with $length % 4 is also important. You should not decode incomplete sequences.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are basically back to the code I posted, right?

        if ($mode == 1) {
            $chunk = $prev . preg_replace('|[^a-zA-Z0-9+=/]|', '', $chunk);
            $length = strlen($chunk);

            if ($length % 4) {
                $length = floor($length / 4) * 4;
                $prev = substr($chunk, $length);
                $chunk = substr($chunk, 0, $length);
            } else {
                $prev = '';
            }
          
            $decoded_chunk = '';
            foreach(explode('=', $chunk) as $line) {
                $decoded_chunk  .= base64_decode($line);
            }
            return $decoded_chunk;
        }

@alecpl
Copy link
Member

alecpl commented Jan 4, 2024

You have a sample message here. Sometimes the delimiter is =, but sometimes ==. So, that's one thing. Then I think you have to first do explode and then handle the modulo 4 code for each element separately. Really we should write some tests for this method.

@pabzm
Copy link
Member

pabzm commented May 6, 2024

@SolracLeinad Please resolve the conflicts and add tests to this, that would help!

@alecpl
Copy link
Member

alecpl commented Jun 16, 2024

Fixed in 613629f.

@alecpl alecpl closed this Jun 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants