Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust encoding of parametric types #3938

Closed
martint opened this issue Nov 10, 2015 · 8 comments
Closed

Adjust encoding of parametric types #3938

martint opened this issue Nov 10, 2015 · 8 comments

Comments

@martint
Copy link
Contributor

martint commented Nov 10, 2015

The introduction of integer-kinded type parameters to support types such as VARCHAR(n) or DECIMAL(p, s), requires using parenthesis for encoding type parameters, which conflict with the current notation based on angle brackets.

Thus, the encoding for some common types would be:

VARCHAR(10)
DECIMAL(10, 5)
ARRAY(BIGINT)
ARRAY(VARCHAR(10))
MAP(VARCHAR(10), BIGINT)

For some hypothetical type that takes both type-kinded and integer-kinded parameters, it would look like this:

SOMETYPE(BIGINT, 10)

The implementation should be fairly straightforward, but we need to come up with a plan to deprecate the legacy syntax.

@cberner
Copy link
Contributor

cberner commented Nov 10, 2015

How would row types be encoded in this format? Current we use ROW<BIGINT, VARCHAR>("bigint_field_name", "varchar_field_name")

@martint
Copy link
Contributor Author

martint commented Nov 11, 2015

Row types are different, and we should probably special-case them (they are "record" types from a type theory point of view, not parametric types), and the encoding should mimic SQL's way of doing it:

ROW(foo VARCHAR(10), bar BIGINT)

martint referenced this issue in Teradata/presto Nov 11, 2015
Round brackets in literal calculation were not
supported properly in type signature parsing.
Round bracket count was added in parsing method that
fixes the issue.
@losipiuk
Copy link
Contributor

We are happy to work on that.

Initial work would be to make it possible to use angle brackets and parentheses interchangeably for type parameters.

And when this is in rebase decimal patch on top of that.
This should probably include unifying literal and type parameters in TypeSignature. Replacing

    private final List<TypeSignature> parameters;
    private final List<Object> literalParameters;

with single list of some variant type (TypeParameter?) which can be either LITERAL or TYPE.

cc: @sopel39, @ilfrin

@martint
Copy link
Contributor Author

martint commented Nov 12, 2015

@losipiuk, yeah, that's more or less how I was thinking about it.

BTW, I'd constrain the uses of angle brackets to support only TYPE parameter types. I.e., this should not be allowed: VARCHAR<10>.

literalParameters are currently used to encode field names for row types, so initially, I wouldn't touch those. Instead, I'd expand parameters to support TYPE and BIGINT.

@pnowojski
Copy link
Member

I have started working on this issue

@pnowojski
Copy link
Member

Do we want to maintain backward compatibility in ClientTypeSignature class? We could rename existing field typeArguments to typeParamters and for the time being serialize typeArguments using getTypeParametersAsTypeSignatures method from my pull request. Or am I missing something? What do you think? Do we even care about this?

@pnowojski
Copy link
Member

During rebasing with @dain's patch, we have realized some problems with ambiguity of TypeSignature parsing. How should we parse following signature: foo(T, BAR, VARCHAR(X))? VARCHAR is obviously supposed to be the standard type named VARCHAR, but what about T, BAR and X? All of them could be:

  • user defined type
  • template type parameter
  • template literal parameter

and without context we can not determine which. Luckily during parsing signature of functions types/arguments, we have such context - template type parameters are explicitly stated, like this:

    public ArrayContains()
    {
        super(FUNCTION_NAME, ImmutableList.of(comparableTypeParameter("T")), StandardTypes.BOOLEAN, ImmutableList.of("array<T>", "T"));
    }

By introducing some bigintParameter (similar to comparableTypeParameter) we could parse
foo(T, BAR, VARCHAR(20)) for function signature using such construct:

    public SomeFunction()
    {
        super(FUNCTION_NAME, ImmutableList.of(comparableTypeParameter("T"), bigintParameter("BAR"), StandardTypes.BOOLEAN, ImmutableList.of("foo(T, BAR, VARCHAR(20))"));
    }

We could use this context information about T and BAR and pass it as an argument to TypeSignature::parseSignature. However that doesn't solve parsing problem for JSON serialization using jackson. We could solve this by either

  1. drop usage of parseSignature in jackson deserialization of TypeSignature and create some native JSON constructor, where every field of TypeSignature is serialized separately with some meta informations that would distinguish type parameter, template type parameter and template literal parameter.
  2. assume that no template type/literal parameters are ever going to be serialized to/from JSON. It seems like there is no need to pass to workers/clients information about incomplete type. Coordinator could resolve concrete type and replace T, BAR and X with concrete values before passing it over the network.

CC @losipiuk

@martint
Copy link
Contributor Author

martint commented Mar 16, 2016

This is done

@martint martint closed this as completed Mar 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants