Add hadoopDistCache - support all Hadoop filesystems #115
Conversation
pls do not merge yet - i need to test
logger.debug(s"Add '$path' to dist cache")

val _conf = Option(conf).getOrElse(new Configuration())
val target = new Path(self.options.getStagingLocation, path.split("/").last)
Maybe "$stagingLocation/distcache/$file" for better organization?
Also the filename part should include a random hash to avoid collision?
Good idea about extra dir. Initially I was thinking we could just overwrite files, but hash is probably better.
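The suggestion above could be sketched as follows. This is not the PR's code, just an illustration of the proposed naming scheme: stage files under a distcache/ sub-directory and prefix the basename with a hash of the full source path, so distinct sources that share a basename do not collide. `distCacheTarget` is a hypothetical name, and `java.security.MessageDigest` stands in for Guava's `Hashing.sha1()`.

```scala
import java.security.MessageDigest

// Build "$stagingLocation/distcache/$hash-$file" for a source path.
// The SHA-1 of the full source path disambiguates files that share a
// basename but come from different locations.
def distCacheTarget(stagingLocation: String, path: String): String = {
  val sha1 = MessageDigest.getInstance("SHA-1")
    .digest(path.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString
  val file = path.split("/").last
  s"$stagingLocation/distcache/$sha1-$file"
}
```

Hashing the source path (rather than using a random hash) has the added benefit of being deterministic, which makes reuse of already-uploaded artifacts possible.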
val _conf = Option(conf).getOrElse(new Configuration())

//TODO: should we add checksums on both src and GCS to reuse uploaded artifacts?
val path_hash = Hashing.sha1().hashString(path, Charsets.UTF_8)
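The TODO above could look something like this. A hedged sketch only (the helper name is hypothetical, and this is not the PR's code): skip re-uploading when the staged copy already carries the same checksum as the source. Note that `getFileChecksum` may return null (e.g. on the local filesystem) and that checksum algorithms differ across filesystems, so equality is a best-effort signal.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Returns true only when the target exists and both filesystems report
// comparable, equal checksums; otherwise the caller should (re-)upload.
def alreadyStaged(srcFs: FileSystem, src: Path, dstFs: FileSystem, dst: Path): Boolean = {
  if (!dstFs.exists(dst)) return false
  val srcSum = srcFs.getFileChecksum(src)
  val dstSum = dstFs.getFileChecksum(dst)
  srcSum != null && srcSum == dstSum
}
```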
camelCase, i.e. pathHash, targetHash, targetDistCache, etc.
Hadoop DistCache will first upload artifacts from the source to GCS and then use those via the standard GCS-based cache mechanism.
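The mechanism in the commit message could be sketched as below. This is an illustration, not the PR's exact code (`uploadToStaging` and `stagingLocation` are assumed names): the source is resolved through the generic Hadoop FileSystem API, so any supported scheme (hdfs://, s3://, file://, ...) works, and the artifact is copied to the GCS staging location where the existing GCS-based cache logic picks it up.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileUtil, Path}

// Copy one artifact from an arbitrary Hadoop filesystem to the staging
// location, returning the staged path for the downstream cache mechanism.
def uploadToStaging(path: String, stagingLocation: String, conf: Configuration): Path = {
  val src = new Path(path)
  val dst = new Path(stagingLocation, src.getName)
  val srcFs = src.getFileSystem(conf) // resolves the scheme-specific filesystem
  val dstFs = dst.getFileSystem(conf)
  FileUtil.copy(srcFs, src, dstFs, dst, false, conf) // deleteSource = false
  dst
}
```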
Force-pushed from 7237102 to 5915d35.
👍 Can we merge?