write to existing buffers, and lastChunkHandling
bakkot committed Dec 13, 2023
1 parent 2be36d3 commit 8dc1667
Showing 9 changed files with 705 additions and 244 deletions.
37 changes: 26 additions & 11 deletions README.md
@@ -30,15 +30,34 @@ console.log(Uint8Array.fromHex(string));

This would add `Uint8Array.prototype.toBase64`/`Uint8Array.prototype.toHex` and `Uint8Array.fromBase64`/`Uint8Array.fromHex` methods. The latter pair would throw if given a string which is not properly encoded.

## Options
## Base64 options

An options bag argument for the base64 methods allows specifying the alphabet as either `base64` or `base64url`.
Additional options are supplied in an options bag argument:

When decoding, the options bag also allows specifying `strict: false` (the default) or `strict: true`. When using `strict: false`, whitespace is legal and padding is optional. When using `strict: true`, whitespace is forbidden and standard padding (including any overflow bits in the last character being 0) is enforced - i.e., only [canonical](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5) encodings are allowed.
- `alphabet`: Allows specifying the alphabet as either `base64` or `base64url`.

- `lastChunkHandling`: Recall that base64 decoding operates on chunks of 4 characters at a time, but the input may have some characters which don't fit evenly into such a chunk of 4 characters. This option determines how the final chunk of characters should be handled. The three options are `"loose"` (the default), which treats the chunk as if it had any necessary `=` padding (but throws if this is not possible, i.e. there is exactly one extra character); `"strict"`, which enforces that the chunk has exactly 4 characters (counting `=` padding) and that [overflow bits](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5) are 0; and `"stop-before-partial"`, which stops decoding before the final chunk unless the final chunk has exactly 4 characters.
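
The three modes can be made concrete with a small userland decoder. This is a sketch under stated assumptions, not the proposal's specification algorithm: the names `decodeBase64Sketch` and `decodeChars` and the `{ bytes, read }` result are hypothetical, only the standard alphabet is handled, whitespace is not accepted, a partial final chunk may not contain `=`, and a complete padded final chunk under `"stop-before-partial"` is assumed to be decoded as in `"loose"` mode.

```js
// Hypothetical sketch, not the proposal's spec algorithm. Standard alphabet only;
// base64url would use '-' and '_' in place of '+' and '/'.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decodes 2-4 significant characters into 1-3 bytes, appending them to `out`.
// If `checkOverflow` is true, the unused low bits must all be zero.
function decodeChars(chars, out, checkOverflow) {
  let bits = 0;
  for (const ch of chars) {
    const value = ALPHABET.indexOf(ch);
    if (value === -1) throw new SyntaxError(`invalid character: ${ch}`);
    bits = (bits << 6) | value;
  }
  const byteCount = Math.floor((chars.length * 6) / 8); // 2 -> 1, 3 -> 2, 4 -> 3
  const extraBits = chars.length * 6 - byteCount * 8; // 4, 2 or 0 unused bits
  if (checkOverflow && (bits & ((1 << extraBits) - 1)) !== 0) {
    throw new SyntaxError('non-zero overflow bits');
  }
  bits >>= extraBits;
  for (let k = byteCount - 1; k >= 0; k--) out.push((bits >> (8 * k)) & 0xff);
}

function decodeBase64Sketch(input, { lastChunkHandling = 'loose' } = {}) {
  const out = [];
  const remainder = input.length % 4;
  const fullLength = input.length - remainder;

  // Complete 4-character chunks; only the very last one may carry '=' padding.
  for (let i = 0; i < fullLength; i += 4) {
    const chunk = input.slice(i, i + 4);
    const padding = chunk.endsWith('==') ? 2 : chunk.endsWith('=') ? 1 : 0;
    const chars = chunk.slice(0, 4 - padding);
    if (chars.includes('=')) throw new SyntaxError('misplaced padding');
    if (padding > 0 && i + 4 !== input.length) throw new SyntaxError('padding before end');
    // Overflow bits are only checked in "strict" mode.
    decodeChars(chars, out, lastChunkHandling === 'strict');
  }

  // A final chunk of 1-3 characters (with no padding) needs special handling.
  if (remainder > 0) {
    if (lastChunkHandling === 'stop-before-partial') {
      // Leave the partial chunk unread so the caller can supply more input later.
      return { bytes: new Uint8Array(out), read: fullLength };
    }
    if (lastChunkHandling === 'strict') throw new SyntaxError('missing padding');
    // "loose": decode as if padded, which is impossible for a single character.
    if (remainder === 1) throw new SyntaxError('incomplete chunk');
    decodeChars(input.slice(fullLength), out, false);
  }
  return { bytes: new Uint8Array(out), read: input.length };
}

const result = decodeBase64Sketch('Zm9vYmE');
console.log(result.bytes.join(','), result.read); // 102,111,111,98,97 7
```

With `lastChunkHandling: "stop-before-partial"` the same input yields only the bytes of `"foo"` and `read: 4`, leaving `YmE` for a later call; with `"strict"` it throws.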

The hex methods do not take any options.

## Writing to an existing Uint8Array

The `Uint8Array.fromBase64Into` method allows writing to an existing Uint8Array. Like the [TextEncoder `encodeInto` method](https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder/encodeInto), it returns a `{ read, written }` pair.

```js
let target = new Uint8Array(8);
let { read, written } = Uint8Array.fromBase64Into('Zm9vYmFy', target);
assert.deepStrictEqual([...target], [102, 111, 111, 98, 97, 114, 0, 0]);
assert.deepStrictEqual({ read, written }, { read: 8, written: 6 });
```

This method takes an optional final options bag with the same options as above, plus an `outputOffset` option which allows specifying a position in the target array to write to without needing to create a subarray.

`Uint8Array.fromHexInto` is the same except for hex.
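
The `{ read, written }` contract and `outputOffset` can be illustrated with a userland sketch of hex decoding into an existing array. This is not the proposal's algorithm: `hexIntoSketch` is a hypothetical name, and the behaviour when the target is too small (decode as many whole bytes as fit, then report progress, as `TextEncoder.encodeInto` does) is an assumption made for illustration.

```js
// Hypothetical sketch of decoding hex into an existing Uint8Array, returning
// { read, written } in the style of TextEncoder.prototype.encodeInto.
function hexIntoSketch(string, target, { outputOffset = 0 } = {}) {
  if (string.length % 2 !== 0) throw new SyntaxError('odd-length hex string');
  if (!/^[0-9a-fA-F]*$/.test(string)) throw new SyntaxError('invalid hex character');
  if (!Number.isInteger(outputOffset) || outputOffset < 0 || outputOffset > target.length) {
    throw new RangeError('outputOffset out of range');
  }
  // Assumption: decode only as many whole bytes as fit after outputOffset.
  const written = Math.min(string.length / 2, target.length - outputOffset);
  for (let i = 0; i < written; i++) {
    target[outputOffset + i] = parseInt(string.slice(2 * i, 2 * i + 2), 16);
  }
  // Each written byte consumed exactly two hex characters.
  return { read: written * 2, written };
}

const target = new Uint8Array(6);
const { read, written } = hexIntoSketch('deadbeef', target, { outputOffset: 2 });
console.log(target.join(','), read, written); // 0,0,222,173,190,239 8 4
```

Writing at an offset avoids allocating `target.subarray(2)`; under the assumed semantics, a 3-byte target would receive `222,173,190` with `read: 6, written: 3`.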

## Streaming

There is no support for streaming. However, it is [relatively straightforward to do efficiently in userland](./stream.mjs) on top of this API, with support for all the same options as the underlying functions.
There is no explicit support for streaming. However, it is [relatively straightforward to do efficiently in userland](./stream.mjs) on top of this API, with support for all the same options as the underlying functions.
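
As a rough illustration of why streaming is feasible in userland, the sketch below decodes base64 arriving in arbitrary pieces by holding back any characters that do not yet form a complete 4-character chunk. It is not the repository's `stream.mjs`: the class name `Base64StreamDecoderSketch` and helper `toBytes` are hypothetical, it uses the existing global `atob` (available in browsers and Node.js 16+) in place of the proposed methods, and it supports only the standard alphabet with loose handling of the final chunk.

```js
// Hypothetical streaming decoder: buffers incomplete 4-character chunks
// between calls and decodes complete ones with the existing global atob.
class Base64StreamDecoderSketch {
  #pending = ''; // characters received but not yet decoded

  // Decodes every complete 4-character chunk seen so far.
  push(piece) {
    // ASCII whitespace is stripped up front so it cannot split a chunk.
    this.#pending += piece.replace(/[\t\n\f\r ]/g, '');
    const usable = this.#pending.length - (this.#pending.length % 4);
    const chunk = this.#pending.slice(0, usable);
    this.#pending = this.#pending.slice(usable);
    return toBytes(chunk);
  }

  // Decodes whatever remains; atob accepts a final chunk without '=' padding
  // and throws if exactly one character is left over.
  finish() {
    const rest = this.#pending;
    this.#pending = '';
    return toBytes(rest);
  }
}

// Converts atob's "binary string" (one char per byte) into a Uint8Array.
function toBytes(base64) {
  const binary = atob(base64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const decoder = new Base64StreamDecoderSketch();
const pieces = ['Zm9', 'vYm', 'E'].map((piece) => decoder.push(piece));
pieces.push(decoder.finish());
console.log(String.fromCharCode(...pieces.flatMap((p) => [...p]))); // fooba
```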

## FAQ

@@ -68,17 +87,17 @@ For hex, both lowercase and uppercase characters (including mixed within the sam

### How is `=` padding handled?

Padding is always generated. The base64 decoder does not require it to be present unless `strict: true` is specified; however, if it is present, it must be well-formed (i.e., once stripped of whitespace the length of the string must be a multiple of 4, and there can be 1 or 2 padding `=` characters).
Padding is always generated. The base64 decoder allows specifying how to handle inputs without it with the `lastChunkHandling` option.

### How are the extra padding bits handled?

If the length of your input data isn't exactly a multiple of 3 bytes, then encoding it will use either 2 or 3 base64 characters to encode the final 1 or 2 bytes. Since each base64 character is 6 bits, this means you'll be using either 12 or 18 bits to represent 8 or 16 bits, which means you have an extra 4 or 2 bits which don't encode anything.

Per [the RFC](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5), decoders MAY reject input strings where the padding bits are non-zero. Here, non-zero padding bits are silently ignored when `strict: false` (the default), and are an error when `strict: true`.
Per [the RFC](https://datatracker.ietf.org/doc/html/rfc4648#section-3.5), decoders MAY reject input strings where the padding bits are non-zero. Here, non-zero padding bits are silently ignored unless `lastChunkHandling: "strict"` is specified.
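
A worked example of the overflow bits: the single byte `0x66` (`"f"`) is 8 bits, but two base64 characters carry 12, so the final 4 bits are unused. The sketch below extracts those unused bits from a final chunk; the function name `overflowBits` is hypothetical and only the standard alphabet is handled.

```js
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Returns the unused low bits of a final base64 chunk (e.g. 'Zg==' or 'bGR=').
// 2 characters carry 12 bits for 1 byte (4 spare); 3 characters carry
// 18 bits for 2 bytes (2 spare); a full 4-character chunk has none.
function overflowBits(finalChunk) {
  const chars = finalChunk.replace(/=+$/, '');
  const spare = (chars.length * 6) % 8; // 4, 2 or 0
  const lastValue = ALPHABET.indexOf(chars[chars.length - 1]);
  return lastValue & ((1 << spare) - 1);
}

console.log(overflowBits('Zg==')); // 0: canonical encoding of "f"
console.log(overflowBits('Zh==')); // 1: same byte "f" if the spare bits are ignored
console.log(overflowBits('bGQ=')); // 0: canonical ending of "Hello World"
console.log(overflowBits('bGR=')); // 1: rejected with lastChunkHandling: "strict"
```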

### How is whitespace handled?

The encoders do not output whitespace. The hex decoder does not allow it as input. The base64 decoder allows [ASCII whitespace](https://infra.spec.whatwg.org/#ascii-whitespace) anywhere in the string as long as `strict: true` is not specified.
The encoders do not output whitespace. The hex decoder does not allow it as input. The base64 decoder allows [ASCII whitespace](https://infra.spec.whatwg.org/#ascii-whitespace) anywhere in the string.
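
The WHATWG Infra standard defines ASCII whitespace as exactly five code points: U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, and U+0020 SPACE. A decoder that accepts whitespace anywhere could remove those before processing chunks, as in this hypothetical helper (a sketch, not the proposal's specified algorithm).

```js
// Removes exactly the five ASCII whitespace code points defined by WHATWG Infra:
// TAB, LF, FF, CR and SPACE. Other whitespace (e.g. U+00A0) is left in place,
// so a decoder would still reject it as an invalid character.
function stripAsciiWhitespace(input) {
  return input.replace(/[\t\n\f\r ]/g, '');
}

console.log(stripAsciiWhitespace('SGVsbG8g\nV29ybG Q=')); // SGVsbG8gV29ybGQ=
```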

### How are other characters handled?

@@ -107,7 +126,3 @@ That's also been the consensus when it's come up [previously](https://discourse.
### What if I just want to encode a portion of an ArrayBuffer?

Uint8Arrays can be partial views of an underlying buffer, so you can create such a view and invoke `.toBase64` on it.
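
For example, a view over part of a buffer can be created with the `Uint8Array(buffer, byteOffset, length)` constructor without copying any data. The sketch uses a hand-written hex encoder (`toHexSketch`, hypothetical) as a stand-in for the proposed `toHex`/`toBase64`, so it runs in engines that do not implement the proposal.

```js
const buffer = new Uint8Array([0xde, 0xad, 0xbe, 0xef, 0x00, 0x11]).buffer;

// A view of 3 bytes starting at byte offset 2; no data is copied.
const view = new Uint8Array(buffer, 2, 3);

// Hypothetical stand-in for the proposed view.toHex().
function toHexSketch(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

console.log(toHexSketch(view)); // beef00
```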

### What if I want to decode a Base64 or hex string into an existing Typed Array or ArrayBuffer?

While that is a reasonable thing to want, I think it need not be included in the initial version of this API. We can add it later if demand proves high. Until then, copying slices of memory (e.g. using `target.set(chunk, offset)`) is quite fast.
8 changes: 4 additions & 4 deletions package-lock.json


4 changes: 2 additions & 2 deletions package.json
@@ -3,13 +3,13 @@
"name": "proposal-arraybuffer-base64",
"scripts": {
"build-playground": "mkdir -p dist && cp playground/* dist && node scripts/static-highlight.js playground/index-raw.html > dist/index.html && rm dist/index-raw.html",
"build-spec": "mkdir -p dist/spec && ecmarkup --lint-spec --strict --load-biblio @tc39/ecma262-biblio --verbose spec.html --assets-dir dist/spec dist/spec/index.html",
"build-spec": "mkdir -p dist/spec && ecmarkup --lint-spec --strict --load-biblio @tc39/ecma262-biblio --verbose --mark-effects spec.html --assets-dir dist/spec dist/spec/index.html",
"build": "npm run build-playground && npm run build-spec",
"format": "emu-format --write spec.html",
"check-format": "emu-format --check spec.html"
},
"dependencies": {
"@tc39/ecma262-biblio": "2.1.2653",
"@tc39/ecma262-biblio": "^2.1.2663",
"ecmarkup": "^18.0.0",
"jsdom": "^21.1.1",
"prismjs": "^1.29.0"
39 changes: 28 additions & 11 deletions playground/index-raw.html
@@ -10,6 +10,7 @@
<script>
// logic for making codeblocks copyable
// svg from https://feathericons.com/
'use strict';
let copySVG = '<svg xmlns="http://www.w3.org/2000/svg" width="20" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather feather-copy"><rect x="9" y="9" width="13" height="13" rx="2" ry="2"></rect><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"></path></svg>';

if (navigator.clipboard) {
@@ -84,8 +85,8 @@ <h3>Basic usage</h3>
</code></pre>

<h3>Options</h3>
<p>The base64 methods take an optional options bag which allows specifying the alphabet as either "base64" (the default) or "base64url" (<a href="https://datatracker.ietf.org/doc/html/rfc4648#section-5">the URL-safe variant</a>).</p>
<p>When decoding, the options bag also allows specifying <code>strict: false</code> (the default) or <code>strict: true</code>. When using <code>strict: false</code>, whitespace is legal and padding is optional. When using <code>strict: true</code>, whitespace is forbidden and standard padding (including any overflow bits in the last character being 0) is enforced - i.e., only <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-3.5">canonical</a> encodings are allowed.</p>
<p>The base64 methods take an optional options bag which allows specifying the alphabet as either <code>"base64"</code> (the default) or <code>"base64url"</code> (<a href="https://datatracker.ietf.org/doc/html/rfc4648#section-5">the URL-safe variant</a>).</p>
<p>The base64 decoder also allows specifying the behavior for the final chunk with <code>lastChunkHandling</code>. Recall that base64 decoding operates on chunks of 4 characters at a time, but the input may have some characters which don't fit evenly into such a chunk of 4 characters. This option determines how the final chunk of characters should be handled. The three options are <code>"loose"</code> (the default), which treats the chunk as if it had any necessary <code>=</code> padding (but throws if this is not possible, i.e. there is exactly one extra character); <code>"strict"</code>, which enforces that the chunk has exactly 4 characters (counting <code>=</code> padding) and that <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-3.5">overflow bits</a> are 0; and <code>"stop-before-partial"</code>, which stops decoding before the final chunk unless the final chunk has exactly 4 characters.</p>
<p>The hex methods do not have any options.</p>

<pre class="language-js"><code class="language-js">
@@ -99,22 +100,38 @@ <h3>Options</h3>
// works, despite whitespace, missing padding, and non-zero overflow bits

try {
Uint8Array.fromBase64('SGVsbG8g\nV29ybG Q=', { strict: true });
Uint8Array.fromBase64('SGVsbG8gV29ybGR=', { lastChunkHandling: 'strict' });
} catch {
console.log('with strict: true, whitespace is rejected');
console.log('with lastChunkHandling: "strict", overflow bits are rejected');
}
try {
Uint8Array.fromBase64('SGVsbG8gV29ybGQ', { strict: true });
Uint8Array.fromBase64('SGVsbG8gV29ybGQ', { lastChunkHandling: 'strict' });
} catch {
console.log('with strict: true, padding is required');
}
try {
Uint8Array.fromBase64('SGVsbG8gV29ybGR=', { strict: true });
} catch {
console.log('with strict: true, non-zero overflow bits are rejected');
console.log('with lastChunkHandling: "strict", padding is required');
}
</code></pre>

<h3>Writing to an existing Uint8Array</h3>
<p>The <code>Uint8Array.fromBase64Into</code> method allows writing to an existing Uint8Array. Like the <a href="https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder/encodeInto">TextEncoder <code>encodeInto</code> method</a>, it returns a <code>{ read, written }</code> pair.</p>

<p>This method takes an optional final options bag with the same options as above, plus an <code>outputOffset</code> option which allows specifying a position in the target array to write to without needing to create a subarray.</p>

<pre class="language-js"><code class="language-js">
let target = new Uint8Array(7);
let { read, written } = Uint8Array.fromBase64Into('Zm9vYmFy', target);
console.log({ target, read, written });
// { target: Uint8Array([102, 111, 111, 98, 97, 114, 0]), read: 8, written: 6 }
</code></pre>

<p><code>Uint8Array.fromHexInto</code> is the same except for hex.</p>

<pre class="language-js"><code class="language-js">
let target = new Uint8Array(6);
let { read, written } = Uint8Array.fromHexInto('deadbeef', target);
console.log({ target, read, written });
// { target: Uint8Array([222, 173, 190, 239, 0, 0]), read: 8, written: 4 }
</code></pre>

<h3>Streaming</h3>
<p>There is no support for streaming. However, <a href="https://github.com/tc39/proposal-arraybuffer-base64/blob/main/stream.mjs">it can be implemented in userland</a>.</p>
