re-export parquet-format #261
Wouldn't it be better to only expose the bits that dependencies may require? Exposing everything significantly increases the "public" surface of this crate, thereby increasing the risk of backward incompatibility if/when parquet-format changes.
This would also enable a more controlled approach to exposing public APIs.
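A minimal sketch of the narrower option suggested here. It assumes a stand-in `parquet_format` module in place of the real external crate, and the item names (`FileMetaData`, `RowGroup`, `ThriftInternal`) are illustrative only. A `pub use` with an explicit list makes only those items part of the public surface. Everything else stays private to the crate.

```rust
// Stand-in for the external `parquet_format` crate (hypothetical contents).
mod parquet_format {
    #[derive(Debug, Clone, PartialEq)]
    pub struct FileMetaData {
        pub version: i32,
        pub num_rows: i64,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct RowGroup {
        pub num_rows: i64,
    }

    // Something downstream users should not come to depend on.
    #[allow(dead_code)]
    pub struct ThriftInternal;
}

pub mod format {
    // Only the listed items become reachable as `format::FileMetaData` and
    // `format::RowGroup`; `ThriftInternal` is not re-exported.
    pub use crate::parquet_format::{FileMetaData, RowGroup};
}

fn main() {
    let md = format::FileMetaData { version: 1, num_rows: 42 };
    let rg = format::RowGroup { num_rows: md.num_rows };
    // let _ = format::ThriftInternal; // does not compile: not found in `format`
    println!("{} {}", md.version, rg.num_rows);
}
```

The trade-off is the one raised in the comment: every name in the list becomes a public commitment, so a shorter list means fewer breaking changes to absorb if the upstream crate changes.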
I think this PR is fine, FWIW
> Wouldn't it be better to only expose the bits that dependencies may require? Exposing everything significantly increases the "public" surface of this crate, thereby increasing the risk of backward incompatibility if/when parquet-format changes.
If we know which parts of parquet_format need to be re-exported and the list is small, then exporting just those parts may make sense. However, unless we think there is some chance that the parquet crate will someday stop relying on parquet_format, I don't see this as adding much in the way of a public API maintenance burden.
I don't see it as a risk, because we actually want users to use the exact parquet-format version that the crate uses. We lag on the format implementation, so one could also argue that advanced users might still need some of the structs that are exposed from the format. We're effectively making the compiled
@jorgecarleitao is your preference still to expose only what we need as
Let me try to give an example: Say we expose all interfaces
Some time later,
If we had only exposed what our users do require to interact with our crate
This is what I was trying to say with the "public surface": my opinion is that if we need ABI compatibility over some dependency, because we consume that ABI, then we should definitely export it, as this PR proposes, so that consumers do not have to depend on the same version of
Does this make sense?
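The compatibility concern comes from Rust treating the same struct from two semver-incompatible releases of a crate as unrelated types, even when their fields are identical. A minimal sketch, assuming two modules that stand in for hypothetical `parquet_format` 2.x and 3.x releases, and a hypothetical `row_count` function playing the role of a `parquet` API compiled against 2.x:

```rust
// Stand-ins for two semver-incompatible releases of the same crate.
mod parquet_format_v2 {
    pub struct FileMetaData {
        pub num_rows: i64,
    }
}

mod parquet_format_v3 {
    pub struct FileMetaData {
        pub num_rows: i64,
    }
}

// Hypothetical `parquet` API that was compiled against the 2.x release.
fn row_count(md: &parquet_format_v2::FileMetaData) -> i64 {
    md.num_rows
}

fn main() {
    let from_v3 = parquet_format_v3::FileMetaData { num_rows: 10 };
    // row_count(&from_v3); // does not compile: mismatched types, despite identical fields

    // Without a re-export, the caller must convert by hand
    // or pin exactly the version that `parquet` uses.
    let as_v2 = parquet_format_v2::FileMetaData { num_rows: from_v3.num_rows };
    println!("{}", row_count(&as_v2));
}
```

Whichever items `parquet` re-exports are exactly the ones whose upstream changes become breaking changes for `parquet` itself, which is why the size of the re-exported surface matters.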
It makes sense, thanks. In this case, I want to expose
I paused when I got to something like this:

```rust
// lib.rs
pub mod format;

// format.rs
// Make every parquet_format item usable inside this crate only.
pub(crate) use parquet_format::*;

// Publicly re-export only this subset; explicit imports take precedence
// over the crate-private glob import above.
/// Re-export parquet_format as `parquet::format`.
///
/// Users are encouraged to use this, to avoid format mismatches.
pub use parquet_format::{
    BsonType, ColumnOrder, DateType, DecimalType, EnumType, FileMetaData, IntType,
    JsonType, ListType, MapType, NullType, RowGroup, SchemaElement, StringType, TimeType,
    TimeUnit, TimestampType, UUIDType, TypeDefinedOrder,
};
```

I was trying to avoid forcing a user to do this (https://github.com/delta-io/kafka-delta-ingest/blob/587a7ca5429f985876d3f6c4492519341f141e97/Cargo.toml#L19) in order to get the
We could return
Maybe a solution there is to
What options can you suggest going forward? I see:
Thanks a lot for the investigation, @nevi-me. I agree that it is not trivial :) Would it be an option to export everything on the list that you presented? E.g. create a
This way we only need to export what is needed, we do not require users to pin the exact version we use, and we do not need to create conversion methods. If this is too complex, let's merge this as is :) I just think that we are making it harder for ourselves in the long run, and exposing public stuff is usually a one-way street, so I was trying to reduce how far we go down that road.
Yea, I understand this, and agree with you. May we please keep this on hold for now? I'll think of a solution based on our discussion and your suggestions. If we weren't constrained by not being able to implement traits for external types without a newtype approach, I would have preferred to replace the Parquet equivalent structs with what's in the parquet-format crate. Defining our own structs has previously made it a bit difficult for a less experienced implementer like me to see what new functionality is enabled in a new Parquet version. I prefer the "this broke because of a new field" kind of thing. A good example is
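The constraint mentioned here is Rust's orphan rule: a crate may not implement a trait it does not own for a type it does not own, so the usual workaround is a local newtype wrapper. A minimal sketch, using `std::time::Duration` and `std::fmt::Display` as stand-ins for a parquet-format struct and a foreign trait. The wrapper name `Millis` is hypothetical.

```rust
use std::fmt;
use std::ops::Deref;
use std::time::Duration;

// `impl fmt::Display for Duration` would be rejected here: both the trait
// (`Display`) and the type (`Duration`) are defined outside this crate.
// Wrapping the foreign type in a local newtype makes the impl legal.
pub struct Millis(pub Duration);

impl fmt::Display for Millis {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} ms", self.0.as_millis())
    }
}

// Deref lets callers keep using the wrapped type's own methods.
impl Deref for Millis {
    type Target = Duration;

    fn deref(&self) -> &Duration {
        &self.0
    }
}

fn main() {
    let m = Millis(Duration::from_millis(1500));
    println!("{}", m); // prints "1500 ms"
    println!("{}", m.as_secs()); // prints "1", via Deref to Duration
}
```

The cost is that every wrapped struct needs its own wrapper and plumbing, which is part of why the crate keeps its own equivalent structs instead.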
Which issue does this PR close?
Closes #237.
Rationale for this change
We might need to expose more `parquet-format` internals in the future, as we already expose the `FileMetaData` struct. Users who wish to use these structs sometimes also need to depend on the `parquet-format` crate, and there is a risk that they end up importing an incompatible version.
Re-exporting this crate would make things simpler for users.
What changes are included in this PR?
- Re-export `parquet_format` as `parquet::format`
- Use `parquet::format` wherever we internally use `parquet_format`
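A minimal sketch of the layout described above. It uses a stand-in module in place of the real `parquet_format` crate, and a hypothetical `read_metadata` function stands in for any `parquet` API that returns a format type. Because downstream code names the type through `parquet::format`, it always matches the version the crate was compiled against.

```rust
// Stand-in for the external `parquet_format` crate (hypothetical contents).
mod parquet_format {
    #[derive(Debug, Clone, PartialEq)]
    pub struct FileMetaData {
        pub version: i32,
        pub num_rows: i64,
    }
}

mod parquet {
    // Re-export the whole dependency under a stable path.
    pub mod format {
        pub use crate::parquet_format::*;
    }

    // Internal code refers to `format::...` rather than to `parquet_format` directly.
    use self::format::FileMetaData;

    // Hypothetical API returning a format type.
    pub fn read_metadata() -> FileMetaData {
        FileMetaData { version: 1, num_rows: 100 }
    }
}

// Downstream code: no separate `parquet_format` dependency to keep in sync.
use parquet::format::FileMetaData;

fn main() {
    let md: FileMetaData = parquet::read_metadata();
    println!("version {} with {} rows", md.version, md.num_rows);
}
```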
Are there any user-facing changes?
We are exposing a new submodule, `parquet::format`. This is not a breaking change.