SNOW-1374896 unify structured types string representation #1882
Add helper ArrowStringRepresentationBuilders that take care of converting recursive toString results into valid JSON, taking the logical type into account. Extract fetching the logical type from field metadata into a separate static function, change boolean string representations to lowercase, add tests.
It might be worth adding a custom toString for vectors, in case vectors ever accept types other than int and float.
Seems I haven't changed all occurrences of upper-case booleans in the tests; will fix in the next commit.
Move prefix and suffix configuration to the constructor of the base builder, remove unnecessary comments, extract the shouldQuote check to a super method, make valueType a constructor parameter for the Array toString builder, fix tests failing due to the lowercase booleans
@@ -21,6 +24,25 @@ public Object toObject(int index) throws SFException {

  @Override
  public String toString(int index) throws SFException {
    return vector.getObject(index).toString();
    FieldVector vectorUnpacked = vector.getChildrenFromFields().get(0);
Are we sure that there must be at least one child inside? Is get(0) safe? Is it checked somewhere before?
It's a given for any ListVector
    FieldVector vectorUnpacked = vector.getChildrenFromFields().get(0);

    FieldVector keys = vectorUnpacked.getChildrenFromFields().get(0);
    FieldVector values = vectorUnpacked.getChildrenFromFields().get(1);
Should we verify here that the children set contains key-children and value-children?
I believe this should always work for a map vector, but I'll verify for an empty one.
Ok, so there might exist an object of MapVector class that does not have these children, but it seems to be a very weird case. We could either try and verify that it won't happen here (which is probably the case), or simply add a check just to be extra safe.
After verification, this shouldn't be empty if used properly, so we are good
According to the MapVector docs, it seems that we're good, as we're checking isSet and the docs state: "The MapVector is nullable, but if a map is set at a given index, there must be an entry."
    }

    for (int i = vector.getElementStartIndex(index); i < vector.getElementEndIndex(index); i++) {
      builder.appendKeyValue(
What about keyLogicalType? I know that it can currently only be String, but that will have to change in the future because the database could also return Integer keys.
It's used only to make the decision on whether the value should be quoted or not. The output string is JSON-like so the key is always quoted even for Integers
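The quoting rule described in this reply can be sketched roughly as follows. This is only an illustration, not the PR's code: the class name MapStringSketch, the string-based logical type names, the set of quoted types, and the separators are all assumptions, and escaping of special characters is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: names and the set of quoted types are illustrative assumptions,
// not the builder classes introduced by this PR.
final class MapStringSketch {
  // Assumption: value logical types that are rendered as quoted JSON strings.
  private static final Set<String> QUOTED_TYPES =
      new HashSet<>(Arrays.asList("TEXT", "DATE", "TIME", "BINARY"));

  private final StringBuilder out = new StringBuilder("{");
  private final String valueLogicalType;
  private boolean first = true;

  MapStringSketch(String valueLogicalType) {
    this.valueLogicalType = valueLogicalType;
  }

  MapStringSketch appendKeyValue(String key, String value) {
    if (!first) {
      out.append(',');
    }
    first = false;
    // The key is always quoted, whatever its logical type.
    out.append('"').append(key).append('"').append(':');
    // Only the value's logical type decides whether the value is quoted.
    if (value != null && QUOTED_TYPES.contains(valueLogicalType)) {
      out.append('"').append(value).append('"');
    } else {
      out.append(value); // appending a null String yields the text "null"
    }
    return this;
  }

  String build() {
    return out.toString() + "}";
  }
}
```

With this sketch, integer-like keys such as "1" still come out quoted, while boolean values stay unquoted because BOOLEAN is not in the assumed quoted set.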
  }

  public ArrowStringRepresentationBuilderBase appendValue(String value) {
    addCommaIfNeeded();
Did you weigh the pros and cons of using StringJoiner?
I only considered the two extremes of building the string manually (chosen) and using some abstraction like JSONObject (rejected), but didn't consider StringJoiner, which is an option in between, so I'll take a look at it as well.
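For reference, a minimal sketch of what the StringJoiner option might look like for an array representation. The class and method names are hypothetical, and the comma separator without spaces is an assumption; java.util.StringJoiner handles the prefix, suffix, and separators, so no manual comma tracking is needed.

```java
import java.util.StringJoiner;

// Hypothetical sketch of the StringJoiner alternative raised in the review;
// not the implementation chosen in this PR.
final class JoinerSketch {
  // Joins already-rendered element strings into a JSON-like array.
  static String joinArray(String... renderedElements) {
    StringJoiner joiner = new StringJoiner(",", "[", "]");
    for (String element : renderedElements) {
      // Render missing elements as the literal null.
      joiner.add(element == null ? "null" : element);
    }
    return joiner.toString();
  }
}
```

A joiner with no elements returns the prefix followed by the suffix, so an empty array needs no special case.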
import org.junit.AfterClass;
import org.junit.BeforeClass;

public abstract class BaseWiremockTest {
This class is out of scope, true?
Yes, it appeared here after a rebase for some reason, but I believe it's a change that should already be merged.
@RunWith(Parameterized.class)
@Category(TestCategoryResultSet.class)
public class StructuredTypesGetStringArrowJsonCompatibilityIT
Can we add a test for the AllTypesClass structure?
Sure, will add
Overview
SNOW-1374896
Build string representations of Snowflake structured types recursively, to reuse the existing converter design for specific logical types (e.g. timestamps/binary).
The code replaces the existing structured type converter implementation, which ran the native getObject method, with a solution that reads the field vectors within the structured type and runs the proper converter on each nested type. Changes are made to the Array, Map and Struct converters; helper methods are added to the ArrowVectorConverter interface, along with new ArrowStringRepresentationBuilder classes that abstract away the logic of actually building a string out of the Arrow structured type.
Follow ups:
null, while for JSON there's undefined, which is also some kind of divergence but not necessarily something to fix, as ARROW's null sounds more reasonable. Example for SELECT [12, 10, 5, NULL]::ARRAY(DOUBLE)
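The recursive idea from the overview can be illustrated with plain Java collections standing in for Arrow vectors. This is a sketch under stated assumptions: RecursiveRenderSketch and render are hypothetical names, the separators and number formatting are those of this sketch rather than of the driver's converters, and escaping is omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: plain Java lists and maps stand in for Arrow vectors,
// and each nested value is rendered by recursing into it.
final class RecursiveRenderSketch {
  static String render(Object value) {
    if (value == null) {
      return "null"; // missing elements are rendered as null
    }
    if (value instanceof List) {
      StringJoiner array = new StringJoiner(",", "[", "]");
      for (Object element : (List<?>) value) {
        array.add(render(element)); // recurse into each element
      }
      return array.toString();
    }
    if (value instanceof Map) {
      StringJoiner object = new StringJoiner(",", "{", "}");
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        // Keys are always quoted; values are rendered recursively.
        object.add("\"" + entry.getKey() + "\":" + render(entry.getValue()));
      }
      return object.toString();
    }
    if (value instanceof String) {
      return "\"" + value + "\"";
    }
    return String.valueOf(value); // numbers and lowercase booleans stay unquoted
  }
}
```

In this sketch, a list holding 12.0, 10.0, 5.0 and a null element renders with the literal null in the last position, which mirrors the ARROW behaviour mentioned in the follow-up above.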
Pre-review self checklist
master branch
mvn -P check-style validate
mvn verify and inspect target/japicmp/japicmp.html
SNOW-XXXX: