This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Get all chunk references for a given file #1185

Closed · wants to merge 5 commits
Conversation

holisticode (Contributor):

This PR adds an endpoint to FileStore that returns the list of chunk hashes for a given file.


// testRuns[i] holds a data size; expectedLens[i] holds the expected number of references
testRuns := []int{1024, 8192, 16000, 30000, 1000000}
expectedLens := []int{1, 3, 5, 9, 248}
@holisticode (Contributor, Author), Feb 4, 2019:

For the test run with a 1000000 data size, I sometimes get 247 and sometimes 248 references....

WHY IS THAT??? @zelig @nolash ?

@nolash (Contributor), Feb 4, 2019:

I am only aware of one bug in the pyramid chunker, which occurs - as far as I remember, although this was a while ago - when all batches in a tree are filled plus one extra chunk. Neither 247 nor 248 matches the chunk count of such a configuration.

Meanwhile, for 1000000 bytes 248 should be correct:

ceil(1000000/4096) = 245
ceil(245/128) = 2

245 data chunks + 2 pointer chunks + 1 root chunk = 248
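The arithmetic above can be checked against all of the test sizes in the PR with a small sketch; `totalChunks` is an illustrative helper, not a function from the codebase, assuming a 4096-byte chunk size and a branching factor of 128:

```go
package main

import "fmt"

// totalChunks returns the total number of chunks (data + intermediate +
// root) a tree chunker should produce for a file of dataSize bytes.
// Illustrative sketch; not taken from the swarm codebase.
func totalChunks(dataSize, chunkSize, branches int) int {
	if dataSize <= chunkSize {
		return 1 // a single chunk is its own root
	}
	// number of data chunks at the leaf level: ceil(dataSize/chunkSize)
	level := (dataSize + chunkSize - 1) / chunkSize
	total := level
	// walk up the tree until a single root chunk remains
	for level > 1 {
		level = (level + branches - 1) / branches
		total += level
	}
	return total
}

func main() {
	for _, size := range []int{1024, 8192, 16000, 30000, 1000000} {
		fmt.Println(size, totalChunks(size, 4096, 128))
	}
}
```

For 1000000 bytes this yields 245 + 2 + 1 = 248, matching the calculation above and the larger of the two observed values, which suggests the 247 runs are losing a chunk rather than the expectation being off by one.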

If you can reproduce this anomaly using the same data (randomized but from a fixed seed), then perhaps we could discover which chunk goes missing, and if it's the same one.

Contributor:

If this is indeed flaky, alternating between 247 and 248, we must find what is going on...

@nolash (Contributor) left a review comment:

I'm not sure if I understand this PR. What is the use-case for it?

return nil, err
}
// collect all references
for _, ref := range putter.References {
Contributor:

Is it intentional that the references are returned in arbitrary order?

Contributor:

@nolash is right - we should sort them before returning.

Contributor:

And what sort order should this be? By value? By hierarchy? If the latter, how do we achieve that when the putter receives the hashes in no specific order?

Contributor:

Alphabetical - so that when we have a tool to query N nodes whether they have a list of chunks, they are all sorted in the same order and we can quickly merge and check N such lists.

Contributor Author:

OK, I will add alphabetical ascending sort order

Member:

Yes, alphabetical order would be the best.
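The agreed-on ordering can be sketched as follows. `Reference` and `sortReferences` are simplified stand-ins for the real storage types, assuming a reference is the raw hash bytes; sorting lexicographically by bytes gives the same order as sorting the hex-encoded hashes alphabetically, so lists from N nodes can be merged and compared in a single pass:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Reference stands in for the chunk reference (hash) type.
type Reference []byte

// sortReferences orders references lexicographically by their raw bytes,
// which is equivalent to alphabetical order of their hex encodings.
func sortReferences(refs []Reference) {
	sort.Slice(refs, func(i, j int) bool {
		return bytes.Compare(refs[i], refs[j]) < 0
	})
}

func main() {
	refs := []Reference{
		{0xde, 0xad}, {0xab, 0xc1}, {0x00, 0x42},
	}
	sortReferences(refs)
	for _, r := range refs {
		fmt.Printf("%x\n", r) // 0042, abc1, dead
	}
}
```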

// HashExplorer's Put will add just the chunk hashes to its `References`
func (he *HashExplorer) Put(ctx context.Context, chunkData ChunkData) (Reference, error) {
// Need to do the actual Put, which returns the references
ref, err := he.hasherStore.Put(ctx, chunkData)
Contributor:

Oh my, it's pretty clumsy having to hash everything twice... I wonder why the pyramidsplitter only returns the data, not the reference.

Contributor Author:

@nolash not everything is hashed twice - and the two operations happen at different points in time.

We use Store when we actually store a data structure on swarm. That path uses the "conventional" PyramidSplit, where having all references for the data structure is (currently) not needed.

We use GetAllReferences for debugging, when we ask a node "do you actually (still?) have a chunk with hash abc123 in your store?". The data is then hashed again (but only by the checking node) to collect all references for the given data structure - so it happens at a different point in time, and only for user-selected data structures.

Is the use case clear now?

@holisticode (Contributor, Author):

Reopened upstream as ethereum/go-ethereum#19002
