Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"None" in collection is translated to an empty value instead of null value #201

Closed
nyh opened this issue Jan 3, 2023 · 6 comments · Fixed by #241
Closed

"None" in collection is translated to an empty value instead of null value #201

nyh opened this issue Jan 3, 2023 · 6 comments · Fixed by #241
Assignees

Comments

@nyh
Copy link

nyh commented Jan 3, 2023

This issue exists also in the Datastax driver, so it should probably be fixed in both but I'm reporting it here for now.

Normally, passing the Python value None as a bind value for a prepared statement passes the CQL null value - i.e., a value with length=-1.

However, it seems that when we bind a collection type, and it includes a None as one of the elements, e.g., the list ['hello', None], the driver mistakenly converts the None not to null but to an empty value (value with length=0) which is probably not what we want.

For example, consider that we have a table with schema p int primary key, v list<text>, and do:

stmt = cql.prepare(f"INSERT INTO {table} (p,v) VALUES ({p}, ?)")
cql.execute(stmt, [['hello',None]])

What we'd expect this to do is to attempt to to insert null into the list, basically the same thing as the unprepared statement INSERT INTO {table} (p,v) VALUES ({p}, ['hi', null]). This attempt should be refused by the server, because nulls elements not allowed in collections. But what the driver does instead is to send the empty value for this None. In the case of the string this is nothing else than an empty string - so basically Scylla is told to write an empty string... Instead of the write failing, it succeeds and if we now do cql.execute(f"SELECT * FROM {table} WHERE p={p}")) the result is the silly [(p, ['hello',''])].

scylladb/scylladb#12425 contains two tests for this in Scylla's test framework which uses the Python driver to make requests. It demonstrates the situation described above for the string type (causing an empty string to be written instead of reporting an error) and also for integer type (in this case, Cassandra gets confused by the empty integer and returns some strange internal error).

@nyh
Copy link
Author

nyh commented May 30, 2023

Opened upstream (Datastax python driver) issue - https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues/PYTHON-1355

@fruch
Copy link

fruch commented Jul 3, 2023

@Lorak-mmk @avelanarius, have you seen this issue ?

@avelanarius
Copy link

For reference, Java Driver 3.x has a client-side check that disallows NULL values in collections. Rust Driver allows users to send either NULL (by passing empty Option) or empty (via CqlValue::Empty).

I agree we should change it in Python Driver. Java Driver 3.x approach seems reasonable. However, changing it could potentially break some existing user code, but I guess this is a very rare usecase.

This will be the place we'll add the additional check:

for item in items:
itembytes = subtype.to_binary(item, inner_proto)
buf.write(pack(len(itembytes)))
buf.write(itembytes)
return buf.getvalue()

@avelanarius
Copy link

avelanarius commented Jul 5, 2023

...although now I've been made aware (in https://github.com/scylladb/scylla-dtest/commit/e2cc40b4835f6699afd35de9d237a8083b4a5532) that there's one usecase where a NULL in list/tuple is allowed:

UPDATE table SET value=5 WHERE k=2 IF value in ?

In this case it's OK to send (3, NULL, 5) as bind marker in IF. So for this usecase Java Driver 3.x is actually too strict (it would disallow it).

Lorak-mmk added a commit to Lorak-mmk/python-driver that referenced this issue Jul 13, 2023
Fixes scylladb#201

When using parepared statements, None values in collections
were serialized as empty values (values with length == 0). This is unexpected and
inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:
 - Statement arguments, both for simple and prepared statements
 - Collection elements in simple statement

This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.
Lorak-mmk added a commit to Lorak-mmk/python-driver that referenced this issue Jul 14, 2023
Fixes scylladb#201

When using parepared statements, None values in collections
were serialized as empty values (values with length == 0). This is unexpected and
inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:
 - Statement arguments, both for simple and prepared statements
 - Collection elements in simple statement

This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.
It also adds an integration test that checks new behavior.
Lorak-mmk added a commit to Lorak-mmk/python-driver that referenced this issue Jul 18, 2023
Fixes scylladb#201

When using parepared statements, None values in collections
were serialized as empty values (values with length == 0). This is unexpected and
inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:
 - Statement arguments, both for simple and prepared statements
 - Collection elements in simple statement

This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.
It also adds an integration test that checks new behavior.
fruch pushed a commit that referenced this issue Jul 18, 2023
Fixes #201

When using parepared statements, None values in collections
were serialized as empty values (values with length == 0). This is unexpected and
inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:
 - Statement arguments, both for simple and prepared statements
 - Collection elements in simple statement

This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.
It also adds an integration test that checks new behavior.
@nyh
Copy link
Author

nyh commented Jul 19, 2023

Mentioned in the upstream issue, https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues/PYTHON-1355, that this was solved in the ScyllaDB fork. I think it should be done upstream - there's nothing Scylla-specific about this patch.

Lorak-mmk added a commit to Lorak-mmk/python-driver that referenced this issue Jul 25, 2023
Fixes scylladb#201

When using parepared statements, None values in collections
were serialized as empty values (values with length == 0). This is unexpected and
inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:
 - Statement arguments, both for simple and prepared statements
 - Collection elements in simple statement

This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.
It also adds an integration test that checks new behavior.
@Lorak-mmk
Copy link

I opened this PR in upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants