feat(gateway): Block and CAR response formats #8758

Merged
merged 17 commits into from
Mar 17, 2022


lidel
Member

@lidel lidel commented Mar 2, 2022

Summary

This PR aims to add support for requesting alternative response format via:

  • ?format= URL parameter
  • Accept: application/vnd.ipld.{format} HTTP header

This MVP supports two formats:

  • raw – fetching single block
  • car – fetching entire DAG behind a CID as a CARv1 stream

TLDR Demo

Downloading a Block

$ curl  -H 'Accept: application/vnd.ipld.raw' "http://127.0.0.1:8080/ipfs/QmZULkCELmmk5XNfCgTnCyFgAVxBRBXyDHGGMVoLFLiXEN" --output block.bin
$ cat block.bin | ipfs block put 
$ ipfs cat QmZULkCELmmk5XNfCgTnCyFgAVxBRBXyDHGGMVoLFLiXEN
hello

Note: we return Content-Type: application/vnd.ipld.raw – see ipfs/in-web-browsers#191

Downloading a CAR

$ ipfs resolve -r  /ipns/webui.ipfs.io
$ curl  -H 'Accept: application/vnd.ipld.car' "http://127.0.0.1:8080/ipfs/bafybeiednzu62vskme5wpoj4bjjikeg3xovfpp4t7vxk5ty2jxdi4mv4bu" --output webui.car
$ ipfs dag import webui.car


Rationale

Why we need both

  • Verifiable HTTP Gateway Responses (ipfs/in-web-browsers#128)
    • for mobile web browsers (content integrity without battery drain caused by full p2p)
    • for IoT devices and other thin clients (defaulting to HTTP, using p2p in LAN and as a fallback)
    • for hardened supply chain: we want to use CARs on CI (Github Actions), and as additional fallback logic for fetching fs-repo-migrations and similar updates
  • unlock userland processing of IPLD data
    • users will be able to fetch arbitrary blocks, even ones in formats not supported by go-ipfs

References

TODO

  • move file-specific logic to serveFile
    • Last-Modified only when Cache-Control is missing
    • Cache-Control for immutable content paths
    • add TODO about Last-Modified when UnixFS 1.5 is supported
    • run all tests on CI and fix any regressions
  • implement router for ?as= (aka ?format=)
  • implement serveBlock
  • implement serveCar
    • content-length: not set. TBD whether it is deterministic (blocks may arrive in a different order, but returning the full DAG should always yield the same total number of bytes); considered too risky for now
    • content-disposition (as attachment + implicit filename of {cid}.ipfs.car)
    • etag
      • car streams for bigger dags won't be byte-for-byte identical, so went with "weak" Etag here
    • cache-control: no-cache, no-transform for now
    • TBD: support selector=multibase(cbor selector)
    • content-type: application/vnd.* – need decision in ipld/go-car#238 ("Specify mime type for car") + always include version in response + handle when processing request
    • tests - t0118-gateway-car.sh
  • create more useful metrics, deprecate unixfsGetMetric

TODO (future PRs)

After this PR lands, we can add key IPLD formats (serveDagCbor, serveDagJson) – ipfs/in-web-browsers#182

one must imagine Sisyphus happy
This is an MVP which reuses the HTTP header logic from serveFile, plus a custom
content-disposition to ensure browsers don't render garbage
This is a PoC implementation that returns CAR as a chunked stream.
It sets neither cache-control nor content-length.

TBD if we want/can have these things.
Member Author

@lidel lidel left a comment


Work in progress, dropping some notes so I don't forget.

@lidel lidel self-assigned this Mar 3, 2022
@BigLep BigLep linked an issue Mar 3, 2022 that may be closed by this pull request
3 tasks
@BigLep
Contributor

BigLep commented Mar 3, 2022

2022-03-03 discussion: transporting HTTP CARs around: currently sourcing how to handle errors.

@Jorropo
Contributor

Jorropo commented Mar 6, 2022

Personal notes for a future feature request:

GET parameters are needlessly hard to deal with in many languages. Even though most of them have escaping helpers, you sometimes have to concatenate parameters manually, with the first separator being ? and not &.

TL;DR: I want an alternative: an Accept: application-x/car header.

That also saves resending the header if you do multiple CAR requests in a row over HTTP/2.

@Jorropo
Contributor

Jorropo commented Mar 6, 2022

TBD (is content-length deterministic? blocks may have different order, but returning full DAG should always return same total of bytes)

FYI, getting that info requires a full traversal from the root (you can actually skip raw leaves, but that's beside the point).

First, CAR intricacies:

In most cases, the info we trivially have is the DagSize stored in the root.
However, this is not enough to know the size of the final CAR.
Assuming equal DagSize, a CAR made of lots of small blocks is bigger than one made of a few big blocks. Because CARv1 has a fixed varuint + CID overhead per block, more blocks => more overhead even at equal DAG sizes.

If we had BlockCount + DagSize we could get a really good approximation.

I do say approximation because:

  • Inlined blocks
  • CID length
  • Varuint size variability

These are sources of variability you cannot account for without a traversal. Hopefully they are small and don't change much. (Is it even OK to send wrong but close content lengths?)
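
To illustrate the BlockCount + DagSize approximation, here is a rough sketch. It assumes every block has the same CID length and the average data size, and it ignores the CAR header; `estimateCarPayload` and `uvarintLen` are hypothetical names, and `dagSize` is taken to mean the total bytes of block data.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen returns how many bytes an unsigned varint needs to encode n.
func uvarintLen(n uint64) uint64 {
	var buf [binary.MaxVarintLen64]byte
	return uint64(binary.PutUvarint(buf[:], n))
}

// estimateCarPayload approximates the bytes taken by the block sections of
// a CARv1 stream. Each section is varint(len(CID)+len(data)) | CID | data,
// so the per-block overhead is the varint plus the CID.
func estimateCarPayload(blockCount, dagSize, cidLen uint64) uint64 {
	if blockCount == 0 {
		return 0
	}
	avgBlock := dagSize / blockCount
	perBlockOverhead := uvarintLen(cidLen+avgBlock) + cidLen
	return dagSize + blockCount*perBlockOverhead
}

func main() {
	const dagSize = 1 << 20 // 1 MiB of block data
	const cidLen = 36       // assumed: CIDv1 with a sha2-256 multihash

	fmt.Println("4 big blocks:     ", estimateCarPayload(4, dagSize, cidLen))
	fmt.Println("1024 small blocks:", estimateCarPayload(1024, dagSize, cidLen))
}
```

With equal DAG sizes, the many-small-blocks case comes out larger, which is the effect described above. Inlined blocks, mixed CID lengths, and varint size changes are exactly what this estimate cannot see.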

Secondly, duped blocks

If we want to be smart, we don't send duplicate blocks multiple times. However, you don't know how much of your DAG size is duplicated, and that can have a massive impact on the real content size.
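
A small sketch of the deduplication idea: remember which CIDs were already written and skip repeats. `dedupe` is a hypothetical helper operating on CID strings; a real exporter would track this while walking the DAG.

```go
package main

import "fmt"

// dedupe returns the CIDs in first-seen order with repeats removed, which
// is the order in which blocks would be written to the CAR stream once.
func dedupe(cids []string) []string {
	seen := make(map[string]struct{}, len(cids))
	out := make([]string, 0, len(cids))
	for _, c := range cids {
		if _, dup := seen[c]; dup {
			continue // block already sent, skip it
		}
		seen[c] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	// Placeholder identifiers, not real CIDs.
	traversal := []string{"cidA", "cidB", "cidA", "cidC", "cidB"}
	fmt.Println(dedupe(traversal))
}
```

The number of skipped blocks is only known after the traversal, which is why the total size cannot be derived from DagSize alone.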

@lidel
Member Author

lidel commented Mar 7, 2022

GET parameters are needlessly hard to deal with in many languages.
I want an alternative: an Accept: application/car header.

FWIW I've heard the same "uselessly hard" feedback about Accept header :)
Arguments being: not possible to create such request via browser's address bar, unable to create <a href= links, often being ignored by HTTP caches etc. etc.

This is to say, there is no silver bullet, too many use cases. We need to support both (query param, and Accept header). I've seen no consensus around mime-type for Block and CAR so skipped it in the initial PoC, waiting for ipld/go-car#238 to be resolved first, but I think you are right, we should support both from the start:

application/vnd.ipfs.block
application/vnd.ipfs.car
application/vnd.ipfs.car; version=1
application/vnd.ipfs.car; version=2

or to make things less ambiguous, reuse existing IPLD concepts:

application/vnd.ipld.raw
application/vnd.ipld.car
application/vnd.ipld.car; version=1
application/vnd.ipld.car; version=2
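
If the version parameter is adopted, the standard library can parse it. This sketch assumes that a missing `version` defaults to 1 and that only CARv1 is served; `carVersion` is a hypothetical helper and expects a single media type, not a full comma-separated Accept list.

```go
package main

import (
	"fmt"
	"mime"
)

// carVersion extracts the CAR version from a single media type such as
// "application/vnd.ipld.car; version=1".
func carVersion(mediaTypeWithParams string) (string, error) {
	mediaType, params, err := mime.ParseMediaType(mediaTypeWithParams)
	if err != nil {
		return "", err
	}
	if mediaType != "application/vnd.ipld.car" {
		return "", fmt.Errorf("not a CAR media type: %s", mediaType)
	}
	version, ok := params["version"]
	if !ok {
		return "1", nil // assumed default when no version is given
	}
	if version != "1" {
		return "", fmt.Errorf("unsupported CAR version %q", version)
	}
	return version, nil
}

func main() {
	for _, v := range []string{
		"application/vnd.ipld.car",
		"application/vnd.ipld.car; version=1",
		"application/vnd.ipld.car; version=2",
	} {
		version, err := carVersion(v)
		fmt.Printf("%-40s version=%q err=%v\n", v, version, err)
	}
}
```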

Is it even ok to send wrong but close content lengths?

I'd rather not risk this type of hackery – it would provide better UX via progress bars etc., but some overly smart HTTP clients may close the connection right after they receive the expected number of bytes, which would truncate the CAR stream.

@Jorropo
Contributor

Jorropo commented Mar 7, 2022

Arguments being: not possible to create such request via browser's address bar, unable to create <a href= links, often being ignored by HTTP caches etc. etc.

I know, that's why I think we should have both. :)

- extracted file-like content type responses to separate .go files
- Accept HTTP header with support for application/vnd.ipld.* types
  (TBD, we did not register them yet, so for illustration purpose only)
Include block and car in unixfs_get_latency_seconds for now,
so we keep basic visibility into gateway behavior until better metrics
are added by #8441
@lidel lidel force-pushed the feat/fetch-as-block-or-car branch from fa78402 to ee7b0ae Compare March 8, 2022 20:34
@lidel lidel force-pushed the feat/fetch-as-block-or-car branch from 6a51127 to 43dc5bf Compare March 9, 2022 16:31
.raw may be handled by something, depending on the OS, and .bin
seems to be universally "binary file" across all systems:
https://en.wikipedia.org/wiki/List_of_filename_extensions_(A%E2%80%93E)
This test uses official CARv1 fixture from
https://ipld.io/specs/transport/car/fixture/carv1-basic/

The CAR has two dag-cbor roots, and we use one of them, which represents
a nice DAG with both dag-cbor, dag-pb and raw blocks
Contributor

@Jorropo Jorropo left a comment


LGTM

CodeQL is unhappy, but this is safe because the user-controlled input is passed through fmt's %q formatter, which leaves us with fairly safe UTF-8.

@lidel
Member Author

lidel commented Mar 16, 2022

Related histogram metrics are added in a separate PR: #8443

@lidel lidel merged commit 4cabdfe into master Mar 17, 2022
@lidel lidel deleted the feat/fetch-as-block-or-car branch March 17, 2022 16:15
@hacdias hacdias mentioned this pull request Oct 6, 2022
6 tasks
Successfully merging this pull request may close these issues.

Gateway support for /ipfs/{cid}?format=car|raw|...
3 participants