Currently, BucketMetadata is written to JSON including the HashFunction enum value of its HashType. However, this requires every SMB reader to have the writer's HashFunction type available on its classpath in order to deserialize metadata.json back into a BucketMetadata. As a result, if we ever evolve the available HashFunctions and an SMB writer wants to use a new one, all of its consumers will have to upgrade to the latest Scio before they can deserialize its BucketMetadata.
The reader only needs to know about HashFunction in order to assert that it's the same across all partitions/sources -- and we could use a String for that purpose. Should we future-proof the SMB API by serializing HashFunction explicitly as a String?
i.e.
public abstract class BucketMetadata<K1, K2, V> implements Serializable, HasDisplayData {
- @JsonProperty private final HashType hashType;
+ @JsonProperty private final String hashType;
+ // Used by writer code
public HashType getHashType() {
- return hashType;
+ return HashType.valueOf(hashType);
}
+ // Works mostly as-is
boolean isCompatibleWith(BucketMetadata other) {
return other != null
// version 1 is backwards compatible with version 0
&& (this.version <= 1 && other.version <= 1)
- && this.hashType == other.hashType
+ && this.hashType.equals(other.hashType)
// This check should be redundant since power of two is checked in BucketMetadata
// constructor, but it's cheap to double-check.
&& (Math.max(numBuckets, other.numBuckets) % Math.min(numBuckets, other.numBuckets) == 0);
}
}
Since HashType is already written using its String representation, there should be no cross-version compatibility issues associated with this change: old metadata.json files would still be readable.
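To make the failure mode and the proposed behavior concrete, here is a minimal, self-contained sketch. It is not the actual Scio code: `HashTypeCompatSketch`, its nested `Metadata` class, the `canResolve` helper, and the `NEW_HASH` name are all hypothetical stand-ins, and the enum constants are assumed for illustration. It shows that an older reader can still compare hash types by name even when it cannot resolve the name to an enum constant.

```java
import java.util.Objects;

public class HashTypeCompatSketch {
  /** Hypothetical enum as known to an older reader; a newer writer may know more constants. */
  public enum HashType { MURMUR3_32, MURMUR3_128 }

  /** Hypothetical, trimmed-down stand-in for BucketMetadata that keeps the raw String. */
  public static final class Metadata {
    private final String hashType;
    private final int numBuckets;

    public Metadata(String hashType, int numBuckets) {
      this.hashType = Objects.requireNonNull(hashType, "hashType");
      this.numBuckets = numBuckets;
    }

    /** Writer-side accessor: resolves the enum lazily and throws if the name is unknown. */
    public HashType getHashType() {
      return HashType.valueOf(hashType);
    }

    /** Reader-side check: compares names only, so unknown hash types never need resolving. */
    public boolean isCompatibleWith(Metadata other) {
      return other != null
          && hashType.equals(other.hashType)
          && Math.max(numBuckets, other.numBuckets) % Math.min(numBuckets, other.numBuckets) == 0;
    }
  }

  /** Returns true if this (older) reader can resolve the stored name to an enum constant. */
  public static boolean canResolve(Metadata m) {
    try {
      m.getHashType();
      return true;
    } catch (IllegalArgumentException e) {
      // Enum.valueOf throws IllegalArgumentException for a name it does not know.
      return false;
    }
  }

  public static void main(String[] args) {
    // "NEW_HASH" is an assumed future hash type that this reader's enum lacks.
    Metadata lhs = new Metadata("NEW_HASH", 4);
    Metadata rhs = new Metadata("NEW_HASH", 8);
    Metadata old = new Metadata("MURMUR3_32", 8);

    System.out.println("same new hash compatible: " + lhs.isCompatibleWith(rhs));
    System.out.println("new vs old compatible:    " + lhs.isCompatibleWith(old));
    System.out.println("reader can resolve enum:  " + canResolve(lhs));
  }
}
```

The trade-off in this sketch is that an unknown hash type only surfaces as an error when `getHashType()` is called, i.e. on code paths that actually need to hash keys, rather than at deserialization time.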