Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: NaN = NaN (in Postgres) #18860

Closed
petermattis opened this issue Sep 28, 2017 · 14 comments
Closed

sql: NaN = NaN (in Postgres) #18860

petermattis opened this issue Sep 28, 2017 · 14 comments
Assignees
Milestone

Comments

@petermattis
Copy link
Collaborator

In Postgres, NaN = NaN. This is a violation of IEEE754:

pmattis=# select 'NaN'::float = 'NaN'::float;
 ?column?
----------
 t
(1 row)

Cockroach provides the expected result

root@:26257/> select 'NaN'::float = 'Nan'::float;
+-----------------------------+
| 'NaN'::FLOAT = 'Nan'::FLOAT |
+-----------------------------+
| false                       |
+-----------------------------+
(1 row)

I'm not sure if there is anything to be done here as changing our behavior at this point would be backwards incompatible. And even if we could change it, I think the Postgres behavior is wrong. In Cockroach, NaNs can be detected using:

root@:26257/> SELECT 'NaN'::FLOAT IS NAN;
+---------------------+
| isnan('NaN'::FLOAT) |
+---------------------+
| true                |
+---------------------+
(1 row)
root@:26257/> SELECT isnan('NaN'::FLOAT);
+---------------------+
| isnan('NaN'::FLOAT) |
+---------------------+
| true                |
+---------------------+
(1 row)

The rationale for the Postgres behavior is:

Note: In most implementations of the "not-a-number" concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.

In Cockroach, we allow indexing NaNs and using the index to exclude NaNs from scans just as we index NULLs, but we maintain the IEEE754 behavior of NaN != NaN.

@petermattis petermattis added this to the 1.2 milestone Sep 28, 2017
@nvanbenschoten nvanbenschoten self-assigned this Oct 5, 2017
@nvanbenschoten
Copy link
Member

There's some inconsistency here. Even though we define NaN != NaN, our index encoding is not aware of this like it is for NULL values. This leads to unexpected errors like the following:

root@:26257/test> INSERT INTO b VALUES ('NaN');
pq: duplicate key value (a)=(NaN) violates unique constraint "b_a_key"

We should either fix this to make it work like it does for NULL or we should follow Postgres' lead of making NaN = NaN. I'm torn about what the right choice is, but at this point, things are broken either way so I'm not sure the backward compatibility argument is very strong.

@petermattis
Copy link
Collaborator Author

I think the natural fix would be to treat NaN in indexes the same way we treat NULL. Allowing NaN = NaN fields too broken, but perhaps the Postgres compatibility argument wins here.

Cc @bdarnell who tends to feel more strongly about these things (both correctness and compatibility)

@bdarnell
Copy link
Contributor

bdarnell commented Oct 5, 2017

I don't feel strongly about this one. I agree with @nvanbenschoten that maintaining compatibility with past versions of cockroach is less important here since we're already kind of broken. I think ideally we'd make NaN work like NULL in an index, but I don't know how much work that would be. I'd be OK with following the postgres example and making NaN==NaN too

@robert-s-lee
Copy link
Contributor

vote +1 to follow the Postgres convention of NaN==NaN. Please see note below on ordering behavior as well

From https://www.postgresql.org/docs/9.1/static/datatype-numeric.html

Note: IEEE754 specifies that NaN should not compare equal to any other floating-point value (including NaN). In order to allow floating-point values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.

@bdarnell
Copy link
Contributor

bdarnell commented Oct 6, 2017

For comparison, mysql and sql server do not allow storage of NaN values (2010 source). Oracle implements NaN the same way postgres does, with NaN == NaN.

I don't see any DB supporting NaN in the way mandated by IEEE754 (which kind of makes sense, because this behavior is available by using NULL in place of NaN, and making a second kind of value that is unequal to itself can be complex). So now I'm leaning towards following the postgres/oracle example and deviating from IEEE754.

@petermattis
Copy link
Collaborator Author

NULL isn't quite a replacement for NaN because the latter can be generated as the result of computations. That said, given that we already have an inconsistency that needs to be fixed, I'm inclined to make NaN == NaN. Somewhat unfortunate that this complicates every floating point equality comparison.

While fixing this, we should make sure that NaN can be utilized in indexes appropriately. It is possible we need to transform v IS NAN to v = 'NaN'. I haven't looked at the index selection code in a long time.

We'll need to call this out in release notes when this lands.

@nvanbenschoten
Copy link
Member

If both Oracle and Postgres treat NaN == NaN then I regrettably agree that we should too. I'll switch over the sql logic to support the change. Right now we transform v IS NaN -> isnan(v) at the parser level. Switching this over to a standard v IS NaN -> ComparisonExpr(IS,v,'NaN') transformation should be all we need to do to allow index selection to handle it correctly. If not, I think there may be some missing logic in index selection for IS expressions in general that we'll want to address.

Yes, this needs to be pretty loud in our release notes.

@petermattis
Copy link
Collaborator Author

I'm pretty sure we don't handle ComparisonExpr{Is,v,'Nan'}. See index_selection.go:722. I think we would handle ComparisonExpr{Eq,v,'NaN'}

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 7, 2017
This change adds new normalization rules:
```
ComparisonExpr{Is,v,d} where d != DNull ->
  AndExpr{
    ComparisonExpr{EQ,v,d}
    ComparisonExpr{IsNot,v,DNull}
  }
ComparisonExpr{IsNot,v,d} where d != DNull ->
  OrExpr{
    ComparisonExpr{NE,v,d}
    ComparisonExpr{Is,v,DNull}
  }
```

This allows us to take advantage of obvious index constraints
for queries like `SELECT * FROM t WHERE b IS TRUE`. It will
also avoid complications being discussed in cockroachdb#18860 by addressing
the root issue.

There may be an argument to make this a simplification rule instead.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 8, 2017
This change adds new normalization rules:
```
ComparisonExpr{Is,v,d} where d != DNull ->
  AndExpr{
    ComparisonExpr{EQ,v,d}
    ComparisonExpr{IsNot,v,DNull}
  }
ComparisonExpr{IsNot,v,d} where d != DNull ->
  OrExpr{
    ComparisonExpr{NE,v,d}
    ComparisonExpr{Is,v,DNull}
  }
```

This allows us to take advantage of obvious index constraints
for queries like `SELECT * FROM t WHERE b IS TRUE`. It will
also avoid complications being discussed in cockroachdb#18860 by addressing
the root issue.

There may be an argument to make this a simplification rule instead.
@nvanbenschoten
Copy link
Member

To complicate this story even more, Postgres actually has NaN act like a value that is greater than every other value:

postgres=# SELECT 'NaN'::real > '+Inf'::real;
 ?column?
----------
 t
(1 row)

postgres=# SELECT 'NaN'::real < '+Inf'::real;
 ?column?
----------
 f
(1 row)

Our current behavior is (like in Go) to return false from all comparisons with NaN. Even if we did want to switch this and do something like Postgres, our encoding enforces that NaN is smaller than every other value. I think the best approach is to ignore this and have all comparisons with NaN continue to return false unless the comparison is NaN = NaN.

@petermattis
Copy link
Collaborator Author

Sounds good to me. Crossing my fingers we don't have to address this in the future.

@nvanbenschoten
Copy link
Member

Yikes..

root@:26257/test> select f > 'NaN' from p where f > 'NaN';
+-----------+
| f > 'NaN' |
+-----------+
| false     |
| false     |
| false     |
| false     |
| false     |
+-----------+
(5 rows)

It certainly would be easier to adopt their behavior and just define NaN as less than all other values.

@petermattis
Copy link
Collaborator Author

That's an excellent point. The comparison of NaN in expressions needs to be consistent with the ordering of of NaN in indexes, or madness results.

@nvanbenschoten
Copy link
Member

So I think I'm just going to pull this all the way back and treat NaN like Postgres does (like a normal value w.r.t comparisons). The only challenge is that we order NaN before all other values instead of after them. This can't be easily changed since it relies on our encoding scheme. I highly doubt that is something people actually rely on though, so I don't think it's worth worrying about.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 11, 2017
Fixes cockroachdb#18860.

This change switches `NaN` equality semantics from defining `NaN` as not-equal
to itself to defining it as equal to itself. This is a very sad violation of
IEEE754 in the name of Postgres compatibility.
@robert-s-lee
Copy link
Contributor

below is the order Cockroach follows based on discussion with @nvanbenschoten

ASC         DESC
----        ----
NULL        +inf
NaN         3
-inf        0
-4          -4
0           -inf
3           NaN
+inf        NULL

http://troels.arvin.dk/db/rdbms/#select-order_by show Postgres, DB2 and Oracle put NULL higher than any other values and MSSQL and MySQL going the other way.

There isn't specific customers to cite at the moment raising this as an issue, but customers would expect Postgres API database should follow Postgres behavior. If we also assume CockroachDB customers are coming from Postgres, DB2 and Oracle and these then following Postgres behavior would make more sense.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 25, 2017
Fixes cockroachdb#18860.

This change switches `NaN` equality semantics from defining `NaN` as not-equal
to itself to defining it as equal to itself. This is a very sad violation of
IEEE754 in the name of Postgres compatibility.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 30, 2017
Fixes cockroachdb#18860.

This change switches `NaN` equality semantics from defining `NaN` as not-equal
to itself to defining it as equal to itself. This is a very sad violation of
IEEE754 in the name of Postgres compatibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants