Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Cross-Platform (Node.js & Browsers) Binary #2259

Closed
mikeal opened this issue Jul 15, 2019 · 19 comments
Closed

Cross-Platform (Node.js & Browsers) Binary #2259

mikeal opened this issue Jul 15, 2019 · 19 comments
Labels
exp/wizard Extensive knowledge (implications, ramifications) required exploration P2 Medium: Good to have, but can wait until someone steps up

Comments

@mikeal
Copy link
Contributor

mikeal commented Jul 15, 2019

I’d like to document a problem we have across the js-ipfs code base and in most of the dependencies, including IPLD, and start a discussion about what we might do to resolve it.

Node.js has its own binary API, Buffer, which is not in the Browser. This means that, when we hand people Buffer instances in the browser they can’t use them with any Browser APIs. When you use the Buffer API in your code, bundlers will inject a sizable polyfill.

The Browser is kind of a mess when it comes to Binary. There’s a couple dozen types and interfaces just for representing binary, then a few more APIs that aren’t in Node.js at all for converting to/from strings (btoa, atob, TextEncoder and TextDecoder). Not to mention that there’s no out-of-the-box comparison or sorting tools.

It’s sort of an open question right now in modern JS what the “right thing” to do is. Ideally, a JS module that works with binary would:

  • Accept any binary type.
  • Return the “native” type of the platform (Uint8Array in browsers, Buffer in Node.js).
  • Avoid causing the injection of the Buffer polyfill when run through a bundler.

This is quite difficult to do by hand and there aren’t any libraries that do a decent job at solving this without writing another binary API which itself contributes to the bundle size.

I’ve come up with a new approach and a library to try and help called bytesish. The basic idea is, we can get a DataView for all Binary types without a memcopy. So the library only helps you to create, compare, and convert various binary types to DataView and, potentially, back to “native.” So the usage is something like:

const api = binary => {
  let dataView = bytesish(binary)
  /* insert code that works with binary using the DataView instance */
  return bytesish.native(dataView)
}

Using separate entry points for Node.js and the Browser it avoids adding a Buffer polyfill and has zero deps so the size it ads to a bundle is quite small.

I’d like to know what people think of this approach. If it seems like the right way forward I’ll take the time to update the IPLD libraries to follow this approach.

@mikeal
Copy link
Contributor Author

mikeal commented Jul 16, 2019

Working with @Gozala I wrote up a better approach using DataView in order to avoid ever causing a memcopy. I’ve updated the Issue summary to reflect the change.

@alanshaw alanshaw added exp/wizard Extensive knowledge (implications, ramifications) required exploration P2 Medium: Good to have, but can wait until someone steps up labels Jul 16, 2019
@alanshaw
Copy link
Member

I really love this idea. I wonder how difficult it would be to integrate it with IPFS! My gut feeling is that it would be a lot of refactoring and even then I do not know if there are modules we depend on that we wouldn't be able to convert which would mean we'd end up with a Buffer polyfill in the bundle anyway.

We always return Buffer instances in IPFS (even in the browser). I'd be up for switching to returning "native" buffer.

In the Filecoin JS client we return "native" buffer (example) so I'll definitely use it there.

@mikeal
Copy link
Contributor Author

mikeal commented Jul 16, 2019

My gut feeling is that it would be a lot of refactoring

It’s definitely a lot of refactoring, but I wonder how much would actually be in IPFS rather than the IPLD codecs. There shouldn’t be that many places where IPFS actually does something other than pass around binary, it mostly works with the deserialized form of the data.

Another question that I don’t know the answer to is, how does libp2p handle binary? Because that would also need a refactor if it doesn’t accept Uint8Array’s and/or uses the Buffer API for anything.

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

I would like to point out that at large Uint8Array and Buffer are mostly interchangeable (In fact browser Buffer polyfill is subclass of Uint8Array with extra methods). Only times it does break is if node module attempts to call a Buffer method not available in the Uint8Array, which in my experience had only being for to / from string conversation (multi-format stuff does that a lot).

I personally think it would be most effective to just switch to Uint8Array and patch third party dependencies (assuming there are some) to remove Buffer assumption and substitute method calls to with corresponding function calls.

@mikeal
Copy link
Contributor Author

mikeal commented Jul 16, 2019

Here as some problems that I can see with normalizing to Uint8Array and avoiding Buffer altogether (getting rid of bytes.native()).

  • If you standardize on Uint8Array and people pass it around without calling bytes.native() on it then you can easily pass it to other Node.js libraries that will work sometimes and not other times. This is because, as you note, Buffer is a subclass of Uint8Array so most of the API works, but mostly working means sometimes broken :)

  • Node.js libraries, many of which are already written and won’t change, use Buffer API methods that aren’t on Uint8Array quite often. There’s not that many of them, but they are heavily used, .toString() being the biggest one. Since libraries can function differently when bundled and when not bundled I find the best approach here to always return the “native” type.

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

I could be very naive in thinking that if there is a paths towards switching from Buffer to Unit8Array and we work with community we can make it happen, but that is what I hoped for.

At the end of the day you could still call toBuffer(bytes) before passing to libs that insist on Buffer, but that at least pushes towards the reduction of such cases. But yeah knowing when it’s safe to do and when it isn’t is not ideal

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

What about other idea. What if instead of polyfilling browsers we polyfill node instead. Polyfill would just load in node and attach missing methods to Unit8Array so it’s API compatible with buffer.

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

I guess problem with Buffer polyfill for node is that slice method is conflicting with subtle difference that would be hard to catch :(

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

@mikeal I think best solution would be if bytesish had no canonical representation. And instead exposed Bytes constructor that is just alias to Uint8Array in browser and and Buffer constructor (with Uint8Array compatible signature) in node. That way construction across browser / node could be compatible with Uint8Array But browser would get Uin8Array and node would get Buffer. Code that will use bytesish functions will work across node / browser out of the box. While third party node code that assumes Buffer will also continue to work. And best thing is there will be no need to convert between representations.

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

It is also worth considering that if some library within the dependencies depends on Buffer you will still wind up with it in the bundle. Which is to say unless we don’t try to get rid of Buffer universally we won’t really reduce bundle size but on the contrary.

@mikeal
Copy link
Contributor Author

mikeal commented Jul 16, 2019

I guess problem with Buffer polyfill for node is that slice method is conflicting with subtle difference that would be hard to catch :(

Yup. Buffer.slice() doesn’t cost anything because it returns you another view of the data. As a result, people use it a lot because there is no memcopy. But TypedArray slice does a memcopy and has a very different performance profile.

TBH, I feel like we’ve spent the last 5 years in the JS ecosystem trying to make Uint8Array and Buffer co-exist or be swappable and it hasn’t turned out all that well. Working with DataView is quite nice and supported in all of these platforms without any dependencies or polyfills. I also don’t think that developers have a great sense of when they are working with memory allocations or views of existing allocations, this hasn’t been messaged well to developers at all, and DataView feels a bit more explicit about what it is and can hopefully do a better job of this.

It is also worth considering that if some library within the dependencies depends on Buffer you will still wind up with it in the bundle.

Sure, but you have to start somewhere.

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

Yup. Buffer.slice() doesn’t cost anything because it returns you another view of the data. As a result, people use it a lot because there is no memcopy. But TypedArray slice does a memcopy and has a very different performance profile.

Uint8Array has .subarray which does what Buffer slice does

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

TBH, I feel like we’ve spent the last 5 years in the JS ecosystem trying to make Uint8Array and Buffer co-exist or be swappable and it hasn’t turned out all that well. Working with DataView is quite nice and supported in all of these platforms without any dependencies or polyfills.

Is this in response to my proposed solution #2259 (comment) ?

I honestly also think that attempts had being pretty fruitful. Buffers work out of the box in browsers and Uin8Array’s work as replacements in node unless library does transcoding of bytes & all is left to have an alternative for Buffer methods available so there’s an option to be Buffer free.

Problem with DataView is it’s less likely to become a drop-in replacement and requires more back and forth as most browser APIs return either Uint8Array’s or ArrayBuffer’s and node assumes Buffer’s

@Gozala
Copy link
Contributor

Gozala commented Jul 16, 2019

To be clear I’m not against DataView I just think Uin8Array has a higher chance to succeed

@vmx
Copy link
Member

vmx commented Jul 17, 2019

Please read this as a simple comment, I don't have strong opinions about it and other folks have a way better understanding of the JavaScript ecosystem to know which things are actual practical and which are not. I trust others on this decision more than myself.

Update: I wrote this without refreshing my Browser, so all I wrote was before @Gozala chimed in who has a similar view on things as I have.


I'm unsure if I want to have Buffers on Node,js and TypedArrays in the Browser. This sounds like it would introduce a new class of possible bugs. Whenever I work on js-ipld/ipfs I always hope that if it works on Node.js that it also works in the Browser automatically. If it doesn't, it's a huge pain.

Hence I'd be in favour of going full modern JS and not using Node.js Buffers. I see IPFS/IPLD JS implementations sweet spot in Browsers, not on the server-side. That's what we have the Go implementation for. Therefore I'd really like to do a Browser-first approach and also make sure that all our libraries work on the Browser (and tested there).

For users of our libraries that work with Node.js Buffers, we could have recommended libraries (or our own) to convert from Buffers to/from Uint8Arrays.

BTW, TextEncoded/Decoder is part of the Node.js API.

@mikeal
Copy link
Contributor Author

mikeal commented Jul 17, 2019

BTW, TextEncoded/Decoder is part of the Node.js API.

Ya, but it’s not global like it is in the Browser, so most browser code that uses it still fails :(

@vmx
Copy link
Member

vmx commented Jul 17, 2019

Ya, but it’s not global like it is in the Browser,

It is in Node.js 12:

$ node
Welcome to Node.js v12.6.0.
Type ".help" for more information.
> new TextEncoder()
TextEncoder { encoding: 'utf-8' }

@mikeal
Copy link
Contributor Author

mikeal commented Jul 17, 2019

It is in Node.js 12:

Finally! Once 12 goes LTS this will be a big help.

@achingbrain
Copy link
Member

We removed node Buffers from our APIs with #3220 (though some underlying modules still return them) and only support Node 12+.

I'm going to close this because the approach we've settled on is to use Uint8Arrays everywhere (where the code is under our control).

Please reopen if you think this needs further discussion.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
exp/wizard Extensive knowledge (implications, ramifications) required exploration P2 Medium: Good to have, but can wait until someone steps up
Projects
None yet
Development

No branches or pull requests

5 participants