sql: NaN = NaN (in Postgres) #18860

petermattis · 2017-09-28T13:52:46Z

In Postgres, NaN = NaN. This is a violation of IEEE754:

pmattis=# select 'NaN'::float = 'NaN'::float;
 ?column?
----------
 t
(1 row)

Cockroach provides the expected result

root@:26257/> select 'NaN'::float = 'Nan'::float;
+-----------------------------+
| 'NaN'::FLOAT = 'Nan'::FLOAT |
+-----------------------------+
| false                       |
+-----------------------------+
(1 row)

I'm not sure if there is anything to be done here as changing our behavior at this point would be backwards incompatible. And even if we could change it, I think the Postgres behavior is wrong. In Cockroach, NaNs can be detected using:

root@:26257/> SELECT 'NaN'::FLOAT IS NAN;
+---------------------+
| isnan('NaN'::FLOAT) |
+---------------------+
| true                |
+---------------------+
(1 row)
root@:26257/> SELECT isnan('NaN'::FLOAT);
+---------------------+
| isnan('NaN'::FLOAT) |
+---------------------+
| true                |
+---------------------+
(1 row)

The rationale for the Postgres behavior is:

Note: In most implementations of the "not-a-number" concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.

In Cockroach, we allow indexing NaNs and using the index to exclude NaNs from scans just as we index NULLs, but we maintain the IEEE754 behavior of NaN != NaN.

The text was updated successfully, but these errors were encountered:

nvanbenschoten · 2017-10-05T17:54:27Z

There's some inconsistency here. Even though we define NaN != NaN, our index encoding is not aware of this like it is for NULL values. This leads to unexpected errors like the following:

root@:26257/test> INSERT INTO b VALUES ('NaN');
pq: duplicate key value (a)=(NaN) violates unique constraint "b_a_key"

We should either fix this to make it work like it does for NULL or we should follow Postgres' lead of making NaN = NaN. I'm torn about what the right choice is, but at this point, things are broken either way so I'm not sure the backward compatibility argument is very strong.

petermattis · 2017-10-05T18:28:50Z

I think the natural fix would be to treat NaN in indexes the same way we treat NULL. Allowing NaN = NaN fields too broken, but perhaps the Postgres compatibility argument wins here.

Cc @bdarnell who tends to feel more strongly about these things (both correctness and compatibility)

bdarnell · 2017-10-05T18:44:20Z

I don't feel strongly about this one. I agree with @nvanbenschoten that maintaining compatibility with past versions of cockroach is less important here since we're already kind of broken. I think ideally we'd make NaN work like NULL in an index, but I don't know how much work that would be. I'd be OK with following the postgres example and making NaN==NaN too

robert-s-lee · 2017-10-06T16:01:54Z

vote +1 to follow the Postgres convention of NaN==NaN. Please see note below on ordering behavior as well

From https://www.postgresql.org/docs/9.1/static/datatype-numeric.html

Note: IEEE754 specifies that NaN should not compare equal to any other floating-point value (including NaN). In order to allow floating-point values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.

bdarnell · 2017-10-06T17:02:05Z

For comparison, mysql and sql server do not allow storage of NaN values (2010 source). Oracle implements NaN the same way postgres does, with NaN == NaN.

I don't see any DB supporting NaN in the way mandated by IEEE754 (which kind of makes sense, because this behavior is available by using NULL in place of NaN, and making a second kind of value that is unequal to itself can be complex). So now I'm leaning towards following the postgres/oracle example and deviating from IEEE754.

petermattis · 2017-10-06T18:50:12Z

NULL isn't quite a replacement for NaN because the latter can be generated as the result of computations. That said, given that we already have an inconsistency that needs to be fixed, I'm inclined to make NaN == NaN. Somewhat unfortunate that this complicates every floating point equality comparison.

While fixing this, we should make sure that NaN can be utilized in indexes appropriately. It is possible we need to transform v IS NAN to v = 'NaN'. I haven't looked at the index selection code in a long time.

We'll need to call this out in release notes when this lands.

nvanbenschoten · 2017-10-06T20:45:42Z

If both Oracle and Postgres treat NaN == NaN then I regrettably agree that we should too. I'll switch over the sql logic to support the change. Right now we transform v IS NaN -> isnan(v) at the parser level. Switching this over to a standard v IS NaN -> ComparisonExpr(IS,v,'NaN') transformation should be all we need to do to allow index selection to handle it correctly. If not, I think there may be some missing logic in index selection for IS expressions in general that we'll want to address.

Yes, this needs to be pretty loud in our release notes.

petermattis · 2017-10-06T21:12:31Z

I'm pretty sure we don't handle ComparisonExpr{Is,v,'Nan'}. See index_selection.go:722. I think we would handle ComparisonExpr{Eq,v,'NaN'}

This change adds new normalization rules: ``` ComparisonExpr{Is,v,d} where d != DNull -> AndExpr{ ComparisonExpr{EQ,v,d} ComparisonExpr{IsNot,v,DNull} } ComparisonExpr{IsNot,v,d} where d != DNull -> OrExpr{ ComparisonExpr{NE,v,d} ComparisonExpr{Is,v,DNull} } ``` This allows us to take advantage of obvious index constraints for queries like `SELECT * FROM t WHERE b IS TRUE`. It will also avoid complications being discussed in cockroachdb#18860 by addressing the root issue. There may be an argument to make this a simplification rule instead.

nvanbenschoten · 2017-10-09T22:40:14Z

To complicate this story even more, Postgres actually has NaN act like a value that is greater than every other value:

postgres=# SELECT 'NaN'::real > '+Inf'::real;
 ?column?
----------
 t
(1 row)

postgres=# SELECT 'NaN'::real < '+Inf'::real;
 ?column?
----------
 f
(1 row)

Our current behavior is (like in Go) to return false from all comparisons with NaN. Even if we did want to switch this and do something like Postgres, our encoding enforces that NaN is smaller than every other value. I think the best approach is to ignore this and have all comparisons with NaN continue to return false unless the comparison is NaN = NaN.

petermattis · 2017-10-10T00:18:09Z

Sounds good to me. Crossing my fingers we don't have to address this in the future.

nvanbenschoten · 2017-10-10T01:12:31Z

Yikes..

root@:26257/test> select f > 'NaN' from p where f > 'NaN';
+-----------+
| f > 'NaN' |
+-----------+
| false     |
| false     |
| false     |
| false     |
| false     |
+-----------+
(5 rows)

It certainly would be easier to adopt their behavior and just define NaN as less than all other values.

petermattis · 2017-10-10T01:40:35Z

That's an excellent point. The comparison of NaN in expressions needs to be consistent with the ordering of of NaN in indexes, or madness results.

nvanbenschoten · 2017-10-11T20:10:53Z

So I think I'm just going to pull this all the way back and treat NaN like Postgres does (like a normal value w.r.t comparisons). The only challenge is that we order NaN before all other values instead of after them. This can't be easily changed since it relies on our encoding scheme. I highly doubt that is something people actually rely on though, so I don't think it's worth worrying about.

Fixes cockroachdb#18860. This change switches `NaN` equality semantics from defining `NaN` as not-equal to itself to defining it as equal to itself. This is a very sad violation of IEEE754 in the name of Postgres compatibility.

robert-s-lee · 2017-10-12T20:32:23Z

below is the order Cockroach follows based on discussion with @nvanbenschoten

ASC         DESC
----        ----
NULL        +inf
NaN         3
-inf        0
-4          -4
0           -inf
3           NaN
+inf        NULL

http://troels.arvin.dk/db/rdbms/#select-order_by show Postgres, DB2 and Oracle put NULL higher than any other values and MSSQL and MySQL going the other way.

There isn't specific customers to cite at the moment raising this as an issue, but customers would expect Postgres API database should follow Postgres behavior. If we also assume CockroachDB customers are coming from Postgres, DB2 and Oracle and these then following Postgres behavior would make more sense.

Fixes cockroachdb#18860. This change switches `NaN` equality semantics from defining `NaN` as not-equal to itself to defining it as equal to itself. This is a very sad violation of IEEE754 in the name of Postgres compatibility.

petermattis added the docs-todo label Sep 28, 2017

petermattis added this to the 1.2 milestone Sep 28, 2017

nvanbenschoten self-assigned this Oct 5, 2017

nvanbenschoten mentioned this issue Oct 7, 2017

sql: improve index constraints for boolean columns #19106

Merged

nvanbenschoten mentioned this issue Oct 10, 2017

sql: redefine NaN = NaN -> true #19144

Merged

nvanbenschoten closed this as completed in #19144 Oct 30, 2017

hueypark mentioned this issue May 2, 2020

sql: revert NaN float comparisons #48211

Draft

crepererum mentioned this issue May 6, 2021

Include NaN in Parquet stats (again) apache/arrow-rs#264

Closed

cpcloud mentioned this issue Aug 10, 2022

feat(set): add basic set membership operations substrait-io/substrait#280

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: NaN = NaN (in Postgres) #18860

sql: NaN = NaN (in Postgres) #18860

petermattis commented Sep 28, 2017

nvanbenschoten commented Oct 5, 2017

petermattis commented Oct 5, 2017

bdarnell commented Oct 5, 2017

robert-s-lee commented Oct 6, 2017

bdarnell commented Oct 6, 2017

petermattis commented Oct 6, 2017

nvanbenschoten commented Oct 6, 2017

petermattis commented Oct 6, 2017

nvanbenschoten commented Oct 9, 2017

petermattis commented Oct 10, 2017

nvanbenschoten commented Oct 10, 2017

petermattis commented Oct 10, 2017

nvanbenschoten commented Oct 11, 2017

robert-s-lee commented Oct 12, 2017

sql: NaN = NaN (in Postgres) #18860

sql: NaN = NaN (in Postgres) #18860

Comments

petermattis commented Sep 28, 2017

nvanbenschoten commented Oct 5, 2017

petermattis commented Oct 5, 2017

bdarnell commented Oct 5, 2017

robert-s-lee commented Oct 6, 2017

bdarnell commented Oct 6, 2017

petermattis commented Oct 6, 2017

nvanbenschoten commented Oct 6, 2017

petermattis commented Oct 6, 2017

nvanbenschoten commented Oct 9, 2017

petermattis commented Oct 10, 2017

nvanbenschoten commented Oct 10, 2017

petermattis commented Oct 10, 2017

nvanbenschoten commented Oct 11, 2017

robert-s-lee commented Oct 12, 2017