Storage / Retrieval Deals With Partial Content #7227

hannahhoward · 2021-08-30T23:28:31Z

Checklist

This is not a new feature or an enhancement to the Filecoin protocol. If it is, please open an FIP issue.
This is not brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on the lotus forum and select the category as Ideas.
I have a specific, actionable, and well motivated feature request to propose.

Lotus component

What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

Let's say I want to store a large existing IPLD dataset larger than a sector on Filecoin. Currently, we face several obstacles:

Right now, from a storage standpoint, the only way to store anything but a whole DAG is an offline deal.
From a retrieval standpoint, we can retrieve a partial DAG by expressing a selector other than "give me the entire DAG". But there are various problems here for our large dataset:
1. We can't currently do this at the CLI level because we lack a command-line syntax for selectors.
2. Even if we could, selector syntax is currently limited: we lack a "give me the whole DAG except the part below this CID, because I know it's in another piece" selector.
3. Even if we had more powerful selectors, selectors require the retrieval client to know a priori which selector gets the part of the DAG contained in a single sector.

Let's consider what we'd like to be possible:

The person storing should be able to break up their very large DAG in arbitrary ways into a set of partial DAGs.
The person retrieving should be able to just start at the root, make a retrieval, see what they get back, and then plan further retrievals from there.

We also already have alternate storage clients like Estuary whose proposed deals are failing because they try to send partial DAG data to miners.

Describe the solution you'd like

Fortunately, our underlying transport protocol for data transfer, Graphsync, can serve requests where the peer sending the data has only part of the DAG expressed by the requested CID+Selector. The Graphsync responder knows how to tell the requestor what it served and what it didn't, and the requestor knows how to process this information and still verify the response.

Currently, the go-data-transfer library fails every transfer in which the entire request root + IPLD selector is not served.

I propose that we allow data transfers that serve only a partial response to complete successfully.

My proposal for how this bubbles up to Lotus is as follows:
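The partial-response behaviour described above can be modelled in a few lines. This is a toy sketch, not Graphsync's API. It assumes a hypothetical `Block` type, a `toyCID` that stands in for real multihash CIDs by hex-encoding SHA-256, and hypothetical `respond`/`verify` helpers. The responder walks the DAG, sends the blocks it holds and reports the links it lacks instead of failing. The requestor then re-walks the response and accepts it only if every received block hashes to its CID and every link was either received or declared missing. The declared-missing CIDs then become natural roots for follow-up retrievals.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// Block is a toy IPLD block: an opaque payload plus links to child blocks.
type Block struct {
	Data  string
	Links []string
}

// toyCID stands in for a real CID (assumption): hex SHA-256 over data and links.
func toyCID(b Block) string {
	sum := sha256.Sum256([]byte(b.Data + "|" + strings.Join(b.Links, ",")))
	return hex.EncodeToString(sum[:])
}

// respond walks the DAG depth-first from root, sending every block the
// responder holds and recording (rather than failing on) links it lacks.
func respond(store map[string]Block, root string) (map[string]Block, []string) {
	sent := map[string]Block{}
	var missing []string
	visited := map[string]bool{}
	var walk func(c string)
	walk = func(c string) {
		if visited[c] {
			return
		}
		visited[c] = true
		b, ok := store[c]
		if !ok {
			// Skip this subtree and report it instead of failing the transfer.
			missing = append(missing, c)
			return
		}
		sent[c] = b
		for _, l := range b.Links {
			walk(l)
		}
	}
	walk(root)
	return sent, missing
}

// verify re-walks the response from root. Each received block must hash to
// its CID, and every link must be either received or declared missing.
func verify(root string, sent map[string]Block, missing []string) bool {
	declared := map[string]bool{}
	for _, c := range missing {
		declared[c] = true
	}
	visited := map[string]bool{}
	var walk func(c string) bool
	walk = func(c string) bool {
		if visited[c] {
			return true
		}
		visited[c] = true
		b, ok := sent[c]
		if !ok {
			return declared[c]
		}
		if toyCID(b) != c {
			return false
		}
		for _, l := range b.Links {
			if !walk(l) {
				return false
			}
		}
		return true
	}
	return walk(root)
}
```

A response that leaves out a subtree without declaring it fails verification. An honestly partial response passes.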

go-data-transfer should emit an event on both sides to notify the calling library of each CID that was not served and was skipped.
go-data-transfer should have a new final status, PartiallyCompleted, for when a transfer is done sending/receiving but the entire DAG was not served (plus possibly some additional events that put it in this state).
go-fil-markets storage client will fire ClientEventDataTransferComplete when go-data-transfer ends in PartiallyCompleted (the same event emitted when data transfer ends in Completed) and will otherwise be unchanged.
go-fil-markets storage provider will fire ProviderEventDataTransferCompleted when go-data-transfer ends in PartiallyCompleted (the same event emitted when data transfer ends in Completed) and will otherwise be unchanged. The CommP calculation will be run on the received CAR file for the partial DAG, and as long as it matches the Storage Proposal, the deal will continue as planned.
go-fil-markets retrieval client will fire ClientEventPartiallyComplete when data transfer ends with the PartiallyCompleted status. This will trigger analogous "Partial" states for DealStatusCheckComplete and DealStatusFinalizingBlockstore, which will transition to DealStatusPartiallyComplete as the retrieval client's final status.
go-fil-markets retrieval provider will fire ProviderEventPartiallyComplete when a data transfer ends with the PartiallyCompleted status. This will move the deal to DealStatusPartiallyCompleting and then to DealStatusPartiallyCompleted when CleanupDeal is finished.
At the Lotus API level, ClientRetrieve is unchanged; it just returns statuses from the retrieval client.
At the CLI level, ClientRetrieve will output all retrieval statuses and a final message indicating that only a partial transfer was completed.
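The mapping proposed above can be summarised as a small dispatch table. This is a hypothetical sketch, not go-data-transfer or go-fil-markets code. The `TransferStatus` and `Component` types and the `finalStatus`, `partialEvent` and `finalDealStatus` helpers are invented for illustration. Only the event and status names are copied from the list above. The point of the design is that the storage side reuses its existing completion events, while only the retrieval side gains new "partial" events and states.

```go
package main

// TransferStatus is a toy stand-in for go-data-transfer terminal statuses.
// PartiallyCompleted is the status proposed in this issue (hypothetical).
type TransferStatus int

const (
	Completed TransferStatus = iota
	PartiallyCompleted
)

// finalStatus picks the terminal status once sending/receiving has finished,
// based on the CIDs the responder reported as skipped.
func finalStatus(skipped []string) TransferStatus {
	if len(skipped) == 0 {
		return Completed
	}
	return PartiallyCompleted
}

// Component identifies which go-fil-markets state machine observes the transfer.
type Component int

const (
	StorageClient Component = iota
	StorageProvider
	RetrievalClient
	RetrievalProvider
)

// partialEvent returns the event each component fires on PartiallyCompleted
// under the proposal (names taken from the issue text).
func partialEvent(c Component) string {
	switch c {
	case StorageClient:
		// Same event as a full completion: the storage deal flow is unchanged.
		return "ClientEventDataTransferComplete"
	case StorageProvider:
		// Same event as a full completion; the CommP check still gates the deal.
		return "ProviderEventDataTransferCompleted"
	case RetrievalClient:
		return "ClientEventPartiallyComplete"
	case RetrievalProvider:
		return "ProviderEventPartiallyComplete"
	}
	return ""
}

// finalDealStatus returns the terminal retrieval deal status after a partial
// transfer; storage deals have no partial-specific status in the proposal.
func finalDealStatus(c Component) string {
	switch c {
	case RetrievalClient:
		return "DealStatusPartiallyComplete"
	case RetrievalProvider:
		// Reached via DealStatusPartiallyCompleting once CleanupDeal finishes.
		return "DealStatusPartiallyCompleted"
	}
	return ""
}
```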

Describe alternatives you've considered

See above. While selectors are potentially a path forward, they have several limitations, and the path to a desirable result through them is long.

Additional context

I am specifically suggesting leaving the Lotus import process unchanged for now; we are not trying to solve importing partial DAGs into Lotus at the moment.
Rather, the client that already needs this functionality is Estuary, so what matters most is for Lotus to support this on the miner side and on the retrieval client side.


jennijuju · 2021-08-30T23:42:47Z

Cc @whyrusleeping @dirkmc @aarshkshah1992 @raulk for review

hannahhoward · 2021-08-31T15:56:45Z

I want to point out why you want this AS WELL AS very good selectors.

We already have a StopAt selector in latest versions of go-ipld-prime: ipld/go-ipld-prime#214

However, particularly in the retrieval case, the client may not know how to assemble this selector ahead of time. If I make a deal for a complex DAG with several missing pieces, a client that wants to retrieve it with a selector needs to know ahead of time which pieces are missing. This is tricky to communicate, or it adds overhead to discovery mechanisms.

It seems ideal to still be able to serve a "not quite complete" retrieval as a fallback.
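To make the a-priori-knowledge problem concrete, here is a toy model of "stop at" semantics. It is not the go-ipld-prime selector API. The `dag` type and the `selectStopAt`/`servable` helpers are hypothetical. The selection excludes the subtrees rooted at the given CIDs. Under the current all-or-nothing rule, a provider can satisfy the request only if it holds every selected block. The client therefore has to know the provider's missing CIDs before building the selector.

```go
package main

// dag maps a toy CID to the CIDs it links to.
type dag map[string][]string

// selectStopAt models a "whole DAG except below these CIDs" selection: every
// CID reachable from root in depth-first order, without including or
// descending into any CID in stopAt.
func selectStopAt(d dag, root string, stopAt map[string]bool) []string {
	var out []string
	seen := map[string]bool{}
	var walk func(c string)
	walk = func(c string) {
		if seen[c] || stopAt[c] {
			return
		}
		seen[c] = true
		out = append(out, c)
		for _, child := range d[c] {
			walk(child)
		}
	}
	walk(root)
	return out
}

// servable reports whether a provider holding `have` can serve the whole
// selection, which is what a transfer currently requires to succeed.
func servable(selection []string, have map[string]bool) bool {
	for _, c := range selection {
		if !have[c] {
			return false
		}
	}
	return true
}
```

A client that guesses the wrong stop set gets a failed transfer. Under the proposal, the same request would instead end as a partial response.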

jsign · 2021-09-13T20:10:34Z

We're interested in this feature. It would make packing bigger-than-a-sector DAGs into sector-sized deals much simpler, since we wouldn't have to deal with "complete sub-DAG" constraints. We could just pack the maximum number of blocks possible and let the retriever know that they should retrieve X deals to get the complete thing.

If partial retrievals make sense for the client, let that be an "application" constraint to consider while packing things into deals, not a mandatory one.
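The packing strategy described above can be sketched as a greedy fill over blocks in traversal order. This is a sketch under simplifying assumptions, not Estuary's or Lotus's packer. `BlockRef` and `packDeals` are hypothetical, and `capacity` is a plain byte budget that ignores CAR framing and Fr32 padding overhead. No deal needs to hold a complete sub-DAG. The number of resulting deals is the X that the retriever needs to fetch.

```go
package main

// BlockRef is a toy block reference: a CID plus its serialized size in bytes.
type BlockRef struct {
	CID  string
	Size uint64
}

// packDeals greedily fills deals in the given block order (for example a
// DAG traversal order), starting a new deal whenever the next block would
// exceed capacity. A single block larger than capacity ends up alone in an
// over-full deal; a real packer would need to reject or split it.
func packDeals(blocks []BlockRef, capacity uint64) [][]string {
	var deals [][]string
	var current []string
	var used uint64
	for _, b := range blocks {
		if used+b.Size > capacity && len(current) > 0 {
			deals = append(deals, current)
			current, used = nil, 0
		}
		current = append(current, b.CID)
		used += b.Size
	}
	if len(current) > 0 {
		deals = append(deals, current)
	}
	return deals
}
```

Packing in traversal order keeps the root in the first deal. A retriever can then start there and discover the remaining pieces through the partial-response mechanism proposed in this issue.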

hannahhoward added kind/feature Kind: Feature need/triage labels Aug 30, 2021

jennijuju added need/team-input Hint: Needs Team Input and removed need/triage labels Aug 30, 2021

hannahhoward self-assigned this Sep 1, 2021

This was referenced Sep 28, 2021

graphsync client should handle missing blocks on responder side ipfs/go-graphsync#180 (Closed)

Give missing blocks a named error ipfs/go-graphsync#227 (Merged)

hannahhoward removed their assignment Nov 13, 2021

rjan90 added the LM-tech-debt label Mar 23, 2023

rjan90 added this to the LM-Tech-Debt milestone Mar 31, 2023

rjan90 added the team/curio label Jul 19, 2024
