Add hadoopDistCache - support all Hadoop filesystems #115

Merged: 2 commits, May 10, 2016

Conversation

ravwojdyla
Contributor

hadoopDistCache first uploads artifacts from the source filesystem to GCS,
then uses them via the standard GCS-based cache mechanism.

@ravwojdyla
Contributor Author

Please do not merge yet; I still need to test local mode. Do comment if there is anything to change.

logger.debug(s"Add '$path' to dist cache")

val _conf = Option(conf).getOrElse(new Configuration())
val target = new Path(self.options.getStagingLocation, path.split("/").last)
Contributor

Maybe "$stagingLocation/distcache/$file" for better organization?
Also the filename part should include a random hash to avoid collision?

Contributor Author

Good idea about the extra dir. Initially I was thinking we could just overwrite files, but a hash is probably better.
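
For illustration, the layout discussed in this thread could be sketched as below. This is a stdlib-only sketch, not the code from the PR: `DistCacheTarget`, `sha1Hex` and `targetFor` are hypothetical names, the `<hash>-<file>` naming is an assumption, and `MessageDigest` stands in for a hashing library. It hashes the source path, so it is deterministic rather than random.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object DistCacheTarget {
  // Hex-encoded SHA-1 of a string, using only the JDK.
  def sha1Hex(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Hypothetical helper: builds "<staging>/distcache/<hash>-<file>".
  // The hash is taken over the full source path, so two files with the
  // same name in different directories get different targets.
  def targetFor(stagingLocation: String, path: String): String = {
    val fileName = path.split("/").last
    val staging = stagingLocation.stripSuffix("/")
    s"$staging/distcache/${sha1Hex(path)}-$fileName"
  }
}
```

With a path-derived hash, uploading the same source path twice lands on the same target and overwrites it; a random hash would instead produce a fresh target on every run.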

val _conf = Option(conf).getOrElse(new Configuration())

//TODO: should we add checksums on both src and GCS to reuse uploaded artifacts?
val path_hash = Hashing.sha1().hashString(path, Charsets.UTF_8)
Contributor

camelCase, i.e. pathHash, targetHash, targetDistCache, etc.
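
On the TODO in the hunk above (reusing already-uploaded artifacts via checksums), one possible shape is sketched here, with camelCase names. It is only a sketch under assumptions: local files stand in for both the source filesystem and GCS, and `DistCacheReuse`, `contentSha1` and `needsUpload` are hypothetical names that do not appear in the PR.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object DistCacheReuse {
  // Hex-encoded SHA-1 of a file's contents (reads the whole file into memory,
  // which is acceptable for a sketch but not for very large artifacts).
  def contentSha1(file: Path): String =
    MessageDigest.getInstance("SHA-1")
      .digest(Files.readAllBytes(file))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Upload is needed when the target is missing or its contents differ.
  def needsUpload(src: Path, target: Path): Boolean =
    !Files.exists(target) || contentSha1(src) != contentSha1(target)
}
```

A real implementation would more likely compare checksums reported by each filesystem than re-read both copies.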

Hadoop DistCache will first upload artifacts from source to GCS and
use those via standard GCS based cache mechanism.
@ravwojdyla force-pushed the hdfs_cache branch 2 times, most recently from 7237102 to 5915d35, April 28, 2016 22:17
@andrewsmartin
Contributor

👍 Can we merge?

@nevillelyh nevillelyh merged commit 0c90f00 into master May 10, 2016
@nevillelyh nevillelyh deleted the hdfs_cache branch May 10, 2016 19:19
3 participants