
Support TensorFlow TFRecord format #382

Closed
nevillelyh opened this issue Jan 3, 2017 · 2 comments
Labels: enhancement (New feature or request)

Comments

@nevillelyh (Contributor)

TFRecord files are just sequences of length-prefixed binary strings. Specs here.
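As I understand the TensorFlow spec, each record is framed as an 8-byte little-endian length, a 4-byte masked CRC32C of those length bytes, the payload, and then a 4-byte masked CRC32C of the payload. Here is a minimal sketch of encoding one record that way. `TfRecordCodec` is a hypothetical helper (not part of Scio or Beam), and it assumes Java 9+ for `java.util.zip.CRC32C`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32C;

// Hypothetical helper (not Scio or Beam API); assumes Java 9+ for CRC32C.
class TfRecordCodec {
  // TFRecord uses a "masked" CRC32C: rotate right by 15 bits, then add a constant.
  static int maskedCrc32c(byte[] data, int off, int len) {
    CRC32C crc = new CRC32C();
    crc.update(data, off, len);
    int c = (int) crc.getValue();
    return ((c >>> 15) | (c << 17)) + 0xa282ead8;
  }

  // Frames one record as:
  //   uint64 length | uint32 masked CRC of length | payload | uint32 masked CRC of payload
  // with every integer stored little-endian.
  static byte[] encode(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(8 + 4 + payload.length + 4)
        .order(ByteOrder.LITTLE_ENDIAN);
    buf.putLong(payload.length);
    // The length CRC covers the 8 length bytes just written.
    buf.putInt(maskedCrc32c(buf.array(), 0, 8));
    buf.put(payload);
    buf.putInt(maskedCrc32c(payload, 0, payload.length));
    return buf.array();
  }
}
```

A file is then just these frames written back to back, which is why a sink only needs to append `encode(...)` output per element.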

We need a custom FileBasedSink and FileBasedSource, plus Scala hooks.
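On the read side, a custom FileBasedSource reader would essentially loop over records like this. A minimal sketch, assuming Java 9+ for `java.util.zip.CRC32C`; `TfRecordReader` and its `next()` method are hypothetical names, not Beam's reader API, and the Beam plumbing (splits, offsets, file patterns) is left out.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32C;

// Hypothetical reader (not Beam's FileBasedSource API); assumes Java 9+ for CRC32C.
class TfRecordReader {
  private final DataInputStream in;

  TfRecordReader(InputStream in) {
    this.in = new DataInputStream(in);
  }

  // Masked CRC32C as used by TFRecord: rotate right 15 bits, add a constant.
  static int maskedCrc32c(byte[] data) {
    CRC32C crc = new CRC32C();
    crc.update(data, 0, data.length);
    int c = (int) crc.getValue();
    return ((c >>> 15) | (c << 17)) + 0xa282ead8;
  }

  // DataInputStream is big-endian, so swap bytes to read a little-endian uint32.
  private int readIntLE() throws IOException {
    return Integer.reverseBytes(in.readInt());
  }

  // Returns the next payload, or null at a clean end of stream.
  // A truncated record surfaces as EOFException; a CRC mismatch as IOException.
  byte[] next() throws IOException {
    int first = in.read();
    if (first == -1) {
      return null;
    }
    byte[] lenBytes = new byte[8];
    lenBytes[0] = (byte) first;
    in.readFully(lenBytes, 1, 7);
    if (readIntLE() != maskedCrc32c(lenBytes)) {
      throw new IOException("Length CRC mismatch");
    }
    long len = ByteBuffer.wrap(lenBytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    if (len < 0 || len > Integer.MAX_VALUE) {
      throw new IOException("Unsupported record length: " + len);
    }
    byte[] data = new byte[(int) len];
    in.readFully(data);
    if (readIntLE() != maskedCrc32c(data)) {
      throw new IOException("Data CRC mismatch");
    }
    return data;
  }
}
```

Returning `null` only when the stream ends exactly on a record boundary keeps "no more data" distinct from a truncated or corrupt file.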

nevillelyh added the enhancement label on Jan 3, 2017 (help wanted was added and then removed)
@nevillelyh (Contributor, Author)

I have a working branch here: https://github.com/spotify/scio/tree/neville/tf

It includes both a source and a sink and supports uncompressed, zlib and gzip formats, although it's not fully tested yet.
I had to use wrappers from Apache Commons because of some bugs in the JDK zip library.
It still doesn't read zlib files properly, though; something related to a JVM bug 😞
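Choosing the stream wrapper per compression type can be sketched with the JDK's own `java.util.zip` classes. Note that this swaps in `InflaterInputStream`/`GZIPInputStream` rather than the Apache Commons wrappers the branch uses, so it may run into the same JDK issues described above; `TfCompression` is a hypothetical enum, not Scio's API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical enum using JDK java.util.zip (not the Apache Commons wrappers).
enum TfCompression {
  NONE, ZLIB, GZIP;

  InputStream wrapInput(InputStream in) throws IOException {
    switch (this) {
      // Default Inflater expects a zlib (RFC 1950) header around the deflate data.
      case ZLIB: return new InflaterInputStream(in);
      case GZIP: return new GZIPInputStream(in);
      default: return in;
    }
  }

  OutputStream wrapOutput(OutputStream out) throws IOException {
    switch (this) {
      // Default Deflater writes a zlib header and Adler-32 trailer.
      case ZLIB: return new DeflaterOutputStream(out);
      case GZIP: return new GZIPOutputStream(out);
      default: return out;
    }
  }
}
```

Wrapping at the stream level keeps the record framing code compression-agnostic: the reader or writer only ever sees the decompressed byte stream.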

nevillelyh self-assigned this on Jan 4, 2017
