Generalize the reader and input interfaces #1533

Closed
yingsu00 opened this issue May 4, 2022 · 5 comments

@yingsu00
Collaborator

yingsu00 commented May 4, 2022

Velox will support multiple file formats, such as Parquet, ORC, and Alpha, in the future. Like the DWRF reader, these readers will share common components such as the InputStreams and the decoding and decompression utility functions. Furthermore, they may directly inherit from the ColumnReaders that were originally written for DWRF. To prepare for the upcoming native Parquet reader, we propose the following refactoring:

1. Move SeekableInputStream, BufferedInput, and the (de)compressor-related classes to velox::dwio::common::io

2. Move some DWRF utility headers to dwio::common #1619

3. Add compatibility support for the DuckDB Parquet reader

4. Generalize the ColumnReader and SelectiveColumnReader classes #1620

5. Tests

The tests for the classes moved to dwio::common will also be moved to the corresponding namespace and folders.
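To make the intended layering of items 1, 2, and 4 concrete, here is a minimal C++ sketch, assuming a format-agnostic base ColumnReader in a common namespace that owns a shared decoding helper, with DWRF-style and Parquet-style readers that only differ in how they locate the encoded values. All namespaces, class names, the decodeInts() helper, and the toy "header-length byte" page layout are hypothetical illustrations, not Velox's actual API or the real Parquet page format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch of the proposed layering; the namespaces, class names,
// and page layout below are illustrative assumptions, not Velox's API.
namespace sketch::common {

// Format-agnostic base class: subclasses only decide where the encoded
// values start, and the shared helper does the decoding.
class ColumnReader {
 public:
  virtual ~ColumnReader() = default;
  virtual std::vector<int32_t> readInts() = 0;

 protected:
  // Shared utility: decodes little-endian int32 values from `begin` onward.
  static std::vector<int32_t> decodeInts(
      const std::vector<uint8_t>& bytes, std::size_t begin) {
    std::vector<int32_t> out;
    for (std::size_t i = begin; i + 4 <= bytes.size(); i += 4) {
      uint32_t v = static_cast<uint32_t>(bytes[i]) |
          static_cast<uint32_t>(bytes[i + 1]) << 8 |
          static_cast<uint32_t>(bytes[i + 2]) << 16 |
          static_cast<uint32_t>(bytes[i + 3]) << 24;
      out.push_back(static_cast<int32_t>(v));
    }
    return out;
  }
};

}  // namespace sketch::common

namespace sketch::dwrf {

// DWRF-style reader: the stream holds only the encoded values.
class IntColumnReader : public common::ColumnReader {
 public:
  explicit IntColumnReader(std::vector<uint8_t> stream)
      : stream_(std::move(stream)) {}
  std::vector<int32_t> readInts() override { return decodeInts(stream_, 0); }

 private:
  std::vector<uint8_t> stream_;
};

}  // namespace sketch::dwrf

namespace sketch::parquet {

// Parquet-style reader over a toy page layout (assumed): one byte giving the
// header length, the header bytes, then the encoded values.
class IntColumnReader : public common::ColumnReader {
 public:
  explicit IntColumnReader(std::vector<uint8_t> page)
      : page_(std::move(page)) {}
  std::vector<int32_t> readInts() override {
    assert(!page_.empty() && "page must start with a header-length byte");
    return decodeInts(page_, 1 + static_cast<std::size_t>(page_[0]));
  }

 private:
  std::vector<uint8_t> page_;
};

}  // namespace sketch::parquet
```

The point of the design is that decoding logic lives once in the common layer and each format reader only adds its own framing, so a new format reader can be used through a `std::unique_ptr<sketch::common::ColumnReader>` without the caller knowing which format it came from.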

@yingsu00 yingsu00 self-assigned this May 4, 2022
@mbasmanova
Contributor

@frankobe
Contributor

frankobe commented May 4, 2022

cc @liushengxuan

@yingsu00
Collaborator Author

yingsu00 commented May 5, 2022

cc @majetideepak @aditi-pandit

@yingsu00
Collaborator Author

yingsu00 commented May 9, 2022

@yingsu00 yingsu00 reopened this May 14, 2022
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this issue Jun 21, 2022
…ookincubator#1526)

Summary:
SeekableInputStream will be used by the column readers for different
file formats in the near future. This PR decouples the ORC RowGroup
concept from the SeekableInputStream by renaming seekToRowGroup() to
seekToPosition(), so it can be generalized for different file formats that do
not support ORC RowGroups.

This is the first PR to resolve facebookincubator#1533

Pull Request resolved: facebookincubator#1526

Reviewed By: zzhao0

Differential Revision: D36348871

Pulled By: oerling

fbshipit-source-id: ba0baf16c7951f86a5da6f51471800eed6ba3134
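The rename from seekToRowGroup() to seekToPosition() described in the commit summary above can be illustrated with a small C++ sketch. It assumes a hypothetical PositionProvider that hands out opaque, format-specific seek coordinates, so the stream no longer needs to know about ORC row groups. PositionProvider, MemorySeekableStream, and readByte() are assumptions made for illustration; only the seekToPosition() name comes from the commit, and this is not Velox's actual class definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (assumed for illustration): hands out opaque,
// format-specific seek coordinates one at a time. An ORC/DWRF reader could
// fill it from row-group index entries; a Parquet reader from page offsets.
class PositionProvider {
 public:
  explicit PositionProvider(std::vector<uint64_t> positions)
      : positions_(std::move(positions)) {}

  uint64_t next() {
    assert(index_ < positions_.size() && "no more positions");
    return positions_[index_++];
  }

 private:
  std::vector<uint64_t> positions_;
  std::size_t index_ = 0;
};

// Format-agnostic stream interface: callers seek with opaque positions
// instead of an ORC-specific row-group number.
class SeekableInputStream {
 public:
  virtual ~SeekableInputStream() = default;
  virtual void seekToPosition(PositionProvider& position) = 0;
  // Reads one byte; returns false at end of stream.
  virtual bool readByte(uint8_t& out) = 0;
};

// Minimal in-memory stream whose position is a single byte offset.
class MemorySeekableStream : public SeekableInputStream {
 public:
  explicit MemorySeekableStream(std::vector<uint8_t> data)
      : data_(std::move(data)) {}

  void seekToPosition(PositionProvider& position) override {
    // This stream consumes exactly one coordinate; a compressed stream
    // might consume two (compressed block offset, offset inside block).
    offset_ = position.next();
  }

  bool readByte(uint8_t& out) override {
    if (offset_ >= data_.size()) {
      return false;
    }
    out = data_[offset_++];
    return true;
  }

 private:
  std::vector<uint8_t> data_;
  uint64_t offset_ = 0;
};
```

Because each stream pulls only as many coordinates as it needs from the provider, the same seekToPosition() call can serve formats with different notions of "position" without the interface mentioning row groups at all.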
shiyu-bytedance pushed a commit to shiyu-bytedance/velox-1 that referenced this issue Aug 18, 2022
@stale

stale bot commented Sep 16, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Sep 16, 2022
@stale stale bot closed this as completed Sep 30, 2022

4 participants