Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow writing to an existing Uint8Array, and add lastChunkHandling option #33

Merged
merged 2 commits into from
Jan 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 28 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,15 +30,36 @@ console.log(Uint8Array.fromHex(string));

This would add `Uint8Array.prototype.toBase64`/`Uint8Array.prototype.toHex` and `Uint8Array.fromBase64`/`Uint8Array.fromHex` methods. The latter pair would throw if given a string which is not properly encoded.

## Options
## Base64 options

An options bag argument for the base64 methods allows specifying the alphabet as either `base64` or `base64url`.
Additional options are supplied in an options bag argument:

When decoding, the options bag also allows specifying `strict: false` (the default) or `strict: true`. When using `strict: false`, whitespace is legal and padding is optional. When using `strict: true`, whitespace is forbidden and standard padding (including any overflow bits in the last character being 0) is enforced - i.e., only [canonical](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5) encodings are allowed.
- `alphabet`: Allows specifying the alphabet as either `base64` or `base64url`.

- `lastChunkHandling`: Recall that base64 decoding operates on chunks of 4 characters at a time, but the input maybe have some characters which don't fit evenly into such a chunk of 4 characters. This option determines how the final chunk of characters should be handled. The three options are `"loose"` (the default), which treats the chunk as if it had any necessary `=` padding (but throws if this is not possible, i.e. there is exactly one extra character); `"strict"`, which enforces that the chunk has exactly 4 characters (counting `=` padding) and that [overflow bits](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5) are 0; and `"stop-before-partial"`, which stops decoding before the final chunk unless the final chunk has exactly 4 characters.

The hex methods do not take any options.

## Writing to an existing Uint8Array

The `Uint8Array.fromBase64Into` method allows writing to an existing Uint8Array. Like the [TextEncoder `encodeInto` method](https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder/encodeInto), it returns a `{ read, written }` pair.

```js
let target = new Uint8Array(8);
let { read, written } = Uint8Array.fromBase64Into('Zm9vYmFy', target);
assert.deepStrictEqual([...target], [102, 111, 111, 98, 97, 114, 0, 0]);
assert.deepStrictEqual({ read, written }, { read: 8, written: 6 });
```

This method takes an optional final options bag with the same options as above.

As with `encodeInto`, there is not explicit support for writing to specified offset of the target, but you can accomplish that by creating a subarray.

`Uint8Array.fromHexInto` is the same except for hex.

## Streaming

There is no support for streaming. However, it is [relatively straightforward to do effeciently in userland](./stream.mjs) on top of this API, with support for all the same options as the underlying functions.
There is no explicit support for streaming. However, it is [relatively straightforward to do effeciently in userland](./stream.mjs) on top of this API, with support for all the same options as the underlying functions.

## FAQ

Expand Down Expand Up @@ -68,17 +89,17 @@ For hex, both lowercase and uppercase characters (including mixed within the sam

### How is `=` padding handled?

Padding is always generated. The base64 decoder does not require it to be present unless `strict: true` is specified; however, if it is present, it must be well-formed (i.e., once stripped of whitespace the length of the string must be a multiple of 4, and there can be 1 or 2 padding `=` characters).
Padding is always generated. The base64 decoder allows specifying how to handle inputs without it with the `lastChunkHandling` option.

### How are the extra padding bits handled?

If the length of your input data isn't exactly a multiple of 3 bytes, then encoding it will use either 2 or 3 base64 characters to encode the final 1 or 2 bytes. Since each base64 character is 6 bits, this means you'll be using either 12 or 18 bits to represent 8 or 16 bits, which means you have an extra 4 or 2 bits which don't encode anything.

Per [the RFC](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5), decoders MAY reject input strings where the padding bits are non-zero. Here, non-zero padding bits are silently ignored when `strict: false` (the default), and are an error when `strict: true`.
Per [the RFC](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5), decoders MAY reject input strings where the padding bits are non-zero. Here, non-zero padding bits are silently ignored unless `lastChunkHandling: "strict"` is specified.

### How is whitespace handled?

The encoders do not output whitespace. The hex decoder does not allow it as input. The base64 decoder allows [ASCII whitespace](https://infra.spec.whatwg.org/#ascii-whitespace) anywhere in the string as long as `strict: true` is not specified.
The encoders do not output whitespace. The hex decoder does not allow it as input. The base64 decoder allows [ASCII whitespace](https://infra.spec.whatwg.org/#ascii-whitespace) anywhere in the string.

### How are other characters handled?

Expand Down Expand Up @@ -107,7 +128,3 @@ That's also been the consensus when it's come up [previously](https://discourse.
### What if I just want to encode a portion of an ArrayBuffer?

Uint8Arrays can be partial views of an underlying buffer, so you can create such a view and invoke `.toBase64` on it.

### What if I want to decode a Base64 or hex string into an existing Typed Array or ArrayBuffer?

While that is a reasonable things to want, I think it need not be included in the initial version of this API. We can add it later if demand proves high. Until then, copying slices of memory (e.g. using `target.set(chunk, offset)`) is quite fast.
8 changes: 4 additions & 4 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,13 @@
"name": "proposal-arraybuffer-base64",
"scripts": {
"build-playground": "mkdir -p dist && cp playground/* dist && node scripts/static-highlight.js playground/index-raw.html > dist/index.html && rm dist/index-raw.html",
"build-spec": "mkdir -p dist/spec && ecmarkup --lint-spec --strict --load-biblio @tc39/ecma262-biblio --verbose spec.html --assets-dir dist/spec dist/spec/index.html",
"build-spec": "mkdir -p dist/spec && ecmarkup --lint-spec --strict --load-biblio @tc39/ecma262-biblio --verbose --mark-effects spec.html --assets-dir dist/spec dist/spec/index.html",
"build": "npm run build-playground && npm run build-spec",
"format": "emu-format --write spec.html",
"check-format": "emu-format --check spec.html"
},
"dependencies": {
"@tc39/ecma262-biblio": "2.1.2653",
"@tc39/ecma262-biblio": "^2.1.2663",
"ecmarkup": "^18.0.0",
"jsdom": "^21.1.1",
"prismjs": "^1.29.0"
Expand Down
39 changes: 28 additions & 11 deletions playground/index-raw.html
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
<script>
// logic for making codeblocks copyable
// svg from https://feathericons.com/
'use strict';
let copySVG = '<svg xmlns="http://www.w3.org/2000/svg" width="20" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather feather-copy"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg>';

if (navigator.clipboard) {
Expand Down Expand Up @@ -84,8 +85,8 @@ <h3>Basic usage</h3>
</code></pre>

<h3>Options</h3>
<p>The base64 methods take an optional options bag which allows specifying the alphabet as either "base64" (the default) or "base64url" (<a href="https://datatracker.ietf.org/doc/html/rfc4648#section-5">the URL-safe variant</a>).</p>
<p>When encoding, the options bag also allows specifying <code>strict: false</code> (the default) or <code>strict: true</code>. When using <code>strict: false</code>, whitespace is legal and padding is optional. When using <code>strict: true</code>, whitespace is forbidden and standard padding (including any overflow bits in the last character being 0) is enforced - i.e., only <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-3.5">canonical</a> encodings are allowed.</p>
<p>The base64 methods take an optional options bag which allows specifying the alphabet as either <code>"base64"</code> (the default) or <code>"base64url"</code> (<a href="https://datatracker.ietf.org/doc/html/rfc4648#section-5">the URL-safe variant</a>).</p>
<p>The base64 decoder also allows specifying the behavior for the final chunk with <code>lastChunkHandling</code>. Recall that base64 decoding operates on chunks of 4 characters at a time, but the input maybe have some characters which don't fit evenly into such a chunk of 4 characters. This option determines how the final chunk of characters should be handled. The three options are <code>"loose"</code> (the default), which treats the chunk as if it had any necessary <code>=</code> padding (but throws if this is not possible, i.e. there is exactly one extra character); <code>"strict"</code>, which enforces that the chunk has exactly 4 characters (counting <code>=</code> padding) and that <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-3.5">overflow bits</a> are 0; and <code>"stop-before-partial"</code>, which stops decoding before the final chunk unless the final chunk has exactly 4 characters.
<p>The hex methods do not have any options.</p>

<pre class="language-js"><code class="language-js">
Expand All @@ -99,22 +100,38 @@ <h3>Options</h3>
// works, despite whitespace, missing padding, and non-zero overflow bits

try {
Uint8Array.fromBase64('SGVsbG8g\nV29ybG Q=', { strict: true });
Uint8Array.fromBase64('SGVsbG8gV29ybGR=', { lastChunkHandling: 'strict' });
} catch {
console.log('with strict: true, whitespace is rejected');
console.log('with lastChunkHandling: "strict", overflow bits are rejected');
}
try {
Uint8Array.fromBase64('SGVsbG8gV29ybGQ', { strict: true });
Uint8Array.fromBase64('SGVsbG8gV29ybGQ', { lastChunkHandling: 'strict' });
} catch {
console.log('with strict: true, padding is required');
}
try {
Uint8Array.fromBase64('SGVsbG8gV29ybGR=', { strict: true });
} catch {
console.log('with strict: true, non-zero overflow bits are rejected');
console.log('with lastChunkHandling: "strict", overflow bits are rejected');
}
</code></pre>

<h3>Writing to an existing Uint8Array</h3>
<p>The <code>Uint8Array.fromBase64Into</code> method allows writing to an existing Uint8Array. Like the <a href="https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder/encodeInto">TextEncoder <code>encodeInto</code> method</a>, it returns a <code>{ read, written }</code> pair.</p>

<p>This method takes an optional final options bag with the same options as above.</p>

<pre class="language-js"><code class="language-js">
let target = new Uint8Array(7);
let { read, written } = Uint8Array.fromBase64Into('Zm9vYmFy', target);
console.log({ target, read, written });
// { target: Uint8Array([102, 111, 111, 98, 97, 114, 0]), read: 8, written: 6 }
</code></pre>

<p><code>Uint8Array.fromHexInto</code> is the same except for hex.</p>

<pre class="language-js"><code class="language-js">
let target = new Uint8Array(6);
let { read, written } = Uint8Array.fromHexInto('deadbeef', target);
console.log({ target, read, written });
// { target: Uint8Array([222, 173, 190, 239, 0, 0]), read: 8, written: 4 }
</code></pre>

<h3>Streaming</h3>
<p>There is no support for streaming. However, <a href="https://github.com/tc39/proposal-arraybuffer-base64/blob/main/stream.mjs">it can be implemented in userland</a>.</p>

Expand Down
Loading