fsck "light" #5
Comments
Are you looking for |
For that I need a local copy of the data. What I was thinking is that fsck should be possible purely by comparing the hash information in git annex against the hash information in the Glacier inventory.
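A minimal sketch of the comparison suggested here, not glacier-cli's actual code. It assumes the inventory has already been retrieved and parsed from JSON (Glacier inventories list each archive under `ArchiveList` with `ArchiveDescription` and `SHA256TreeHash` fields), that the archive description holds the file's name, and that an expected tree hash was recorded at upload time. The function name `light_fsck` and the shape of `expected` are hypothetical. Note that a Glacier tree hash only equals a plain SHA-256 of the file for archives of at most 1 MiB, so a git-annex key hash cannot be compared directly for larger files.

```python
def light_fsck(expected, inventory):
    """Compare expected tree hashes against a parsed Glacier inventory.

    expected:  dict mapping archive name -> expected SHA-256 tree hash (hex).
               Hypothetical shape; assumes the hash was recorded at upload.
    inventory: dict parsed from the inventory JSON.
    Returns a dict mapping each expected name to "ok", "mismatch" or "missing".
    """
    # Several archives may share one description, so collect every hash seen.
    upstream = {}
    for archive in inventory.get("ArchiveList", []):
        name = archive["ArchiveDescription"]
        upstream.setdefault(name, set()).add(archive["SHA256TreeHash"])

    report = {}
    for name, want in expected.items():
        if name not in upstream:
            report[name] = "missing"
        elif want in upstream[name]:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

No archive data is retrieved here; only the inventory is read, which is what would make such a check "light".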
According to the |
Hmm - seems to have been a temporary problem yesterday. When I ran this the following happened:
Today it works just fine and runs the fsck against the inventory. Great stuff!
I've done a bit more digging on this one and have figured out why this failed for me. You are currently using the InventoryDate to update the last_seen_upstream value in the cache database. Because you expect the inventory to be regenerated every 24 hours, you assume that InventoryDate will never be older than T-24 hours. The AWS documentation is not clear on this, and InventoryDate in fact only updates if there was a change to the vault (see: https://forums.aws.amazon.com/thread.jspa?threadID=106541). I have created a fix which I think will remedy this problem (please review, as I am not 100% sure about all the places where this timestamp is used). I would also recommend adding the --wait parameter when configuring the glacier-checkpresent-hook, as the hook will otherwise fail if that vault has not yet been synced to the local machine. That failure results in the file being removed from the remote, because it is assumed to be no longer available on Glacier (see my original error above).
Thanks for your research! I didn't know the inventory would not necessarily be regenerated regularly, and I appreciate the time and effort you've spent figuring this out. last_seen_upstream is supposed to be the most recent date on which we know for sure that the archive still existed in the vault. I'm not sure about setting it to I think if inventories aren't always generated regularly and this is normal behaviour, we have no option but to assume that the most recent inventory is always current. However, we also know that it takes time for a new archive to appear in the inventory. So perhaps we can still assume that an archive that hasn't appeared in an inventory after some period of time no longer exists (this is important for reconciliation if, for example, the archive is deleted from outside glacier-cli or from a different cache), but just not treat old inventories as unusable. This would need last_seen_upstream to continue to be set to the inventory date, but some of the logic around checkpresent would have to change. I think you ended up with a file being removed from the remote because glacier-cli definitively reported that the file doesn't exist, since the inventory was too old and thus considered invalid. AIUI, checkpresent can return a tri-state (yes, no, unknown) via an error exit and git-annex will interpret this correctly. I think I just need to revise the checkpresent logic to make this work right, without forcing I think I will need to think some more about the details here. Thanks for your work so far and I will get back to this!
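The tri-state idea in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the logic that was eventually merged: the names `Presence` and `check_present` are hypothetical, and the 48-hour grace period is an arbitrary placeholder for "some period of time".

```python
from datetime import datetime, timedelta
from enum import Enum


class Presence(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


def check_present(in_inventory, uploaded_at, inventory_date,
                  grace=timedelta(hours=48)):
    """Decide whether an archive exists, given the most recent inventory.

    in_inventory:   True if the archive is listed in that inventory.
    uploaded_at:    when the archive was uploaded (datetime).
    inventory_date: the inventory's date, or None if none was ever fetched.
    grace:          assumed time a new archive may need to show up (placeholder).
    """
    if in_inventory:
        return Presence.YES
    if inventory_date is None:
        # Nothing to compare against, so no definitive answer.
        return Presence.UNKNOWN
    if inventory_date - uploaded_at >= grace:
        # The inventory is comfortably newer than the upload and still
        # does not list the archive: treat it as gone.
        return Presence.NO
    # The archive may simply not have reached an inventory yet.
    return Presence.UNKNOWN
```

The age of the inventory itself never leads to a "no" here: an old inventory is still treated as current, and only the gap between upload and inventory decides between "no" and "unknown". A caller would map `UNKNOWN` to an error exit so that git-annex does not conclude the file is missing.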
Yes, I thought I think using the job completion date will accurately reflect the information you want here. This is the date at which Glacier tells us an archive has been "last seen". I've updated the implementation in the pull request to this now, please review and see if this fits. I've tested it locally and it works but I don't know if it breaks with other logic. |
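As a rough illustration of taking the timestamp from the job rather than from the inventory, the sketch below reads the completion date from a job description. It assumes a dict shaped like Glacier's DescribeJob response (with `Completed` and `CompletionDate` fields); the helper names `parse_glacier_time` and `last_seen_from_job` are hypothetical and this is not the code from the pull request.

```python
from datetime import datetime, timezone


def parse_glacier_time(value):
    """Parse an ISO 8601 UTC timestamp, with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % (value,))


def last_seen_from_job(job):
    """Return the job's completion time, or None if it has not completed.

    job: dict assumed to look like a DescribeJob response.
    """
    if not job.get("Completed"):
        return None
    return parse_glacier_time(job["CompletionDate"])
```

Unlike InventoryDate, which per the forum thread linked above only changes when the vault changes, the completion date moves forward each time an inventory job finishes.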
Please could I have your feedback on 0543042? I think this resolves this issue in the most robust way. I've explained the details in a comment. |
Hi, This looks good, until Amazon supplies a more robust datapoint to go by. Regards, On Mon, Nov 26, 2012 at 5:54 AM, basak notifications@github.com wrote:
Thanks Wolfgang! Merged into master in 0543042. |
Running git annex fsck against the Glacier remote requests files from Glacier. Would it be possible to have a "light" fsck which just compares the Glacier inventory hash with what is expected? This would be a really nice way to do the occasional check without having to retrieve all of that data.