Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Filestore Implementation #2634

Closed
wants to merge 209 commits into from

Conversation

kevina
Copy link
Contributor

@kevina kevina commented May 6, 2016

Closes Issue #875, avoid duplicating files added to ipfs

NOT READY FOR MERGE

Rebased #2600 on master.

Quicklinks: Code, README,

TODO to get this merged:

Note The filestore is very basic right now, but it is functional. I will likely continue to improve and submit new pull requests for the enhanced functionally but right now I fell it is important a basic implementation gets in so that it will get used, it can be labeled as an experimental feature and disabled by default, but available for those that want to use it. I consider the code production ready.

Resolves #875

@@ -19,48 +20,61 @@ type Block interface {
Loggable() map[string]interface{}
}

// Block is a singular block of data in ipfs
type RawBlock struct {
type BasicBlock struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aww, renaming this from RawBlock to BasicBlock makes the diff so noisy... I was hoping to avoid that by getting that change merged separately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that change as it was in a separate commit. Let me know if you know if you want me to do anything about it, like submitting this change separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping, this rename is only in one file, it is exported because I might eventually refactor my Filestore specific changes to blocks/blocks.go into another file.

@kevina
Copy link
Contributor Author

kevina commented May 10, 2016

I imagine that this is going to take a while before it gets merged. So for now I think it will be best to maintain this as a separate fork while the goal of eventually merging everything in.

@whyrusleeping I will still like to work with you on API issues and work to get major API changes merged first to avoid stagnation.

I have created a README for the new filestore available here https://github.com/ipfs-filestore/go-ipfs/blob/kevina/filestore/filestore/README.md some notes on my fork are available here https://github.com/ipfs-filestore/go-ipfs/wiki.

@jefft0 if you are interested I could use some serious testing now. Others all of course welcome to test it out.

}

func NewReaderFile(filename, path string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {
return &ReaderFile{filename, path, reader, stat}
func NewReaderFile(filename, path, abspath string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can probably be changed so that it only accepts the absolute path instead of a filename, path and absolute path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and a similar change can be made throughout this part of the codebase)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping I added absolute path, because I was unsure how FullPath was used in the code base. Is it acceptable for full path to be an absolute path? I am unsure how FullPath and "add -r" interact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, i'm not entirely sure either. But that said, our tests in that area are quite thorough, If you change something, go into test/sharness, run make deps and then run ./t0040-add.sh, and maybe some of the other t004* tests too to make sure nothing weird happened

Copy link
Contributor Author

@kevina kevina May 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right I will start by trying to change FullPath to an absolute path add see what happens when used with "add -r". I will make sure the contents of the directory objects are correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns our that when adding directories the directory is part of the filename. For example after adding some debug output here is the result of one of the test:

expecting success: 
        mkdir mountdir/planets &&
        echo "Hello Mars!" >mountdir/planets/mars.txt &&
        echo "Hello Venus!" >mountdir/planets/venus.txt &&
        ipfs add -r mountdir/planets >actual

Filename: planets/mars.txt
Path: mountdir/planets/mars.txt
Filename: planets/venus.txt
Path: mountdir/planets/venus.txt
ok 42 - 'ipfs add -r' succeeds

Thus we still need to pass in both a filename and a path. It seams okay to just make the path absolute, so I will do that I remove the change that adds an absolute path to the various parts in the files package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could extract the filename from the passed in path though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that and it is not going to work. This test will fail. The filename needs to be "planets/mars.txt" if you try to extract it it will be just "mars.txt". You will get this:

expecting success: 
        mkdir mountdir/planets &&
        echo "Hello Mars!" >mountdir/planets/mars.txt &&
        echo "Hello Venus!" >mountdir/planets/venus.txt &&
        ipfs add -r mountdir/planets >actual

ok 42 - 'ipfs add -r' succeeds

expecting success: 
        PLANETS="QmWSgS32xQEcXMeqd3YPJLrNBLSdsfYCep2U7CFkyrjXwY" &&
        MARS="QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN" &&
        VENUS="QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4" &&
        echo "added $MARS planets/mars.txt" >expected &&
        echo "added $VENUS planets/venus.txt" >>expected &&
        echo "added $PLANETS planets" >>expected &&
        test_cmp expected actual

> diff -u expected actual
--- expected    2016-05-11 03:20:12.742743658 +0000
+++ actual      2016-05-11 03:20:12.466738156 +0000
@@ -1,3 +1,2 @@
-added QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN planets/mars.txt
-added QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4 planets/venus.txt
-added QmWSgS32xQEcXMeqd3YPJLrNBLSdsfYCep2U7CFkyrjXwY planets
+added QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN mars.txt
+added QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4 venus.txt

not ok 43 - 'ipfs add -r' output looks good
#
#               PLANETS="QmWSgS32xQEcXMeqd3YPJLrNBLSdsfYCep2U7CFkyrjXwY" &&
#               MARS="QmPrrHqJzto9m7SyiRzarwkqPcCSsKR2EB1AyqJfe8L8tN" &&
#               VENUS="QmU5kp3BH3B8tnWUU2Pikdb2maksBNkb92FHRr56hyghh4" &&
#               echo "added $MARS planets/mars.txt" >expected &&
#               echo "added $VENUS planets/venus.txt" >>expected &&
#               echo "added $PLANETS planets" >>expected &&
#               test_cmp expected actual
#

Here is a quick patch to try, it might not apply cleanly for you though.

diff --git a/commands/files/readerfile.go b/commands/files/readerfile.go
index 2508fe1..d7f5443 100644
--- a/commands/files/readerfile.go
+++ b/commands/files/readerfile.go
@@ -4,6 +4,7 @@ import (
        "errors"
        "io"
        "os"
+       gopath "path"
 )

 // ReaderFile is a implementation of File created from an `io.Reader`.
@@ -17,7 +18,11 @@ type ReaderFile struct {
        baseInfo ExtraInfo
 }

-func NewReaderFile(filename, path string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {
+func NewReaderFile(_, path string, reader io.ReadCloser, stat os.FileInfo) *ReaderFile {
+       filename := path 
+       if path != "" {
+               filename = gopath.Base(path)
+       }
        return &ReaderFile{filename, path, reader, stat, 0, PosInfo{0, path}}
 }

@whyrusleeping
Copy link
Member

@kevina Yeah, theres a LOT here, so its going to take some time. I don't disagree with you maintaining a fork in the meantime. That said, as I find time my plan is to pick out parts of the code that can be merged up easily. I think the next candidate for that is the commands/files code refactor to use absolute paths.

@whyrusleeping whyrusleeping added kind/enhancement A net-new feature or improvement to an existing feature status/in-progress In progress labels May 13, 2016
@kevina kevina force-pushed the kevina/filestore branch 2 times, most recently from e8c4a1b to 2d09e9f Compare May 18, 2016 02:56
@kevina
Copy link
Contributor Author

kevina commented May 19, 2016

@whyrusleeping, One idea I had which might avoid some special casing is to replicate what is done with add --only-hash

                if hash {
                        nilnode, err := core.NewNode(n.Context(), &core.BuildCfg{
                                //TODO: need this to be true or all files
                                // hashed will be stored in memory!
                                NilRepo: true,
                        })
                        ...
                        n = nilnode
                }

That is create a special "node" for adding files to the filestore. This could help avoid special case code in at least a few places. It will at minimal avoid the special case code in the blockstore package as I can use my own implementation.

Do you think this is an idea worth pursuing? I image the special node will share many of the data structures from the real node to avoid any data race or related issues.

@whyrusleeping
Copy link
Member

@kevina I think that might be a good idea, my only concern is that in order to read the blocks out of the filestore would you still need that custom created node?

@kevina
Copy link
Contributor Author

kevina commented May 20, 2016

@whyrusleeping, no the custom node will be for adding only. We will still need some sort of multi-datastore for reading blocks. See my comments for #2747.

@kevina
Copy link
Contributor Author

kevina commented May 23, 2016

@whyrusleeping there is a lot going on in an Ipfs Node. What I need is not a mock node like the code that --only-hash uses but rather create a new Node that acts like a view into the original node except that add's are handled differently. I was not confident I could pull this if with a full Ipfs Node.

With that in mind, I refactored the adder code a little to bundle all the needed services into the DataServices struct see commit c41a47e. A proper interface that exports just methods needed by the adder would be nice, but I think this is a good start. This new interface could be used to simply a lot of code elsewhere, including the code for just computing the hashes, but I don't fully understand what is going on there so I left the code for add --hash-only alone.

I use this new interface to to create a DataServices view in commit e6bfbfa that has uses an alternative Blockstore and leaves the existing Blockstore and caching Blockstore wrapper alone. In commit b54f2f4 I factor out the filestore specific code from the DagServices and instead install small hooks.

In commit f9a51b1 I completely eliminated the AdvReader code you didn't like and instead computed the offset's in the DAG builder code (the Balanced builder to be specific, the Trickle builder doesn't support the filestore just yet).

All in all I cleaned up a lot of the special case code.

I you are okay with the changes in c41a47e I can separate that out into it's own pull request.

@whyrusleeping
Copy link
Member

@kevina from irc, we can move towards making the NewAdder signature be: NewAddr(context.Context, bs bstore.Blockstore, pin pinning.Pinner, ds dag.DAGService) which eliminates the need for the partial initialization stuff

@kevina
Copy link
Contributor Author

kevina commented May 31, 2016

@whyrusleeping f5fb3c9 basically does what you ask for.

None of the other methods in the measure package return this error,
instead they only call RecordValue() when the value is []byte.  This
change makes batch Put consistent with the other methods and allows
non []byte data to be passed though the measure datastore.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
The datastore has an optional "advanced" datastore that handles Put
requests for non []byte values, a "normal" datastore that handles all
other put requests, and then any number of other datastore, some of
them that can be designated read-only.  Delete requests are passed on
to all datastore not designed read-only.

For now, querying will only work on a "normal" datastore.

Note: Only tested in the case of just a "normal" datastore and the
case of an "advanced" and "normal" datastore.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
Create a new AdvReader interface that returns an ExtraInfo object that
contains extra information about the the filehandle.  This includes
the current offset and the absolute path of a file.

Also modify chunk.Splitter to return a Bytes struct in the NextBytes()
method that in addition to the raw bytes returns the ExtraInfo object
from the AdvReader that contains the current offset in the file
(amoung other information).

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
…port.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
Some Filestore clean operations are broken.  Needs more tests.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
Add --full-key option to "filestore ls" command.

Add "ls" format option to "filestore verify".

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
Also fix bug discovered in "verify".

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
This reverts commit 241d10d.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
@rht
Copy link
Contributor

rht commented Jan 30, 2017

This sure is a feat, pulled singlehandedly on filestore, 👏 👏, @kevina! If this PR is carefully splitted into 10 PR's consisting of 10 commits each (if 50% have already been merged), this should be doable within a week (or two the longest).

@ipfs ipfs locked and limited conversation to collaborators Mar 4, 2017
@whyrusleeping
Copy link
Member

🙌 🎆 #3629 has been merged, basic filestore functionality is in master. Go try it out :)

@whyrusleeping whyrusleeping removed the status/in-progress In progress label Mar 6, 2017
@ipfs ipfs deleted a comment from kevina Jan 28, 2019
@ipfs ipfs deleted a comment from jefft0 Jan 28, 2019
@ipfs ipfs deleted a comment from whoizit Jan 28, 2019
@ipfs ipfs deleted a comment from jbenet Jan 28, 2019
@ipfs ipfs deleted a comment from kevina Jan 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/enhancement A net-new feature or improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants