Subtle solution difference upgrading from Julia v1.6.1 --> v1.7.1 causes my iterative solver to fail #133

Closed
dmalyuta opened this issue Jan 19, 2022 · 5 comments

Comments

dmalyuta commented Jan 19, 2022

Hello team, thanks for developing ECOS.jl. I'm writing a new package for sequential convex programming called the SCP Toolbox. I ran into a very subtle issue in ECOS related to a Julia version upgrade from v1.6.1 to v1.7.1. Even though none of the installed package versions change, the behavior of ECOS changes very slightly. Because my package is iterative, a "numerically not-so-stable" unit test in my package fails simply due to the version upgrade.

First things first, I am on Ubuntu:

$ uname -r
5.11.0-40-generic
$ uname -a
Linux danylo-XPS-13-9360 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

The Julia version where things work is v1.6.1, and where things break is v1.7.1. The unit test in question is:

https://github.com/dmalyuta/scp_traj_opt/blob/bugfix/ecos-numerical-error/test/runtests.jl#L78

In particular, the test that fails occurs here:

https://github.com/dmalyuta/scp_traj_opt/blob/bugfix/ecos-numerical-error/test/examples/rendezvous_3d/tests.jl#L215

You can run the code yourself by downloading the repository and running ] test in the Julia REPL under v1.6.1 and v1.7.1. I have also attached the stdout from testing both versions. You can see that the iterations track each other very closely up until iteration 13 (of my SCP algorithm, that is, not of ECOS' interior point method). At that point, ECOS under Julia v1.6.1 stops short with "Close to OPTIMAL" status, whereas under v1.7.1 it actually finds the OPTIMAL solution. This divergence unfortunately causes the v1.6.1 run to achieve OPTIMAL on iteration 14, while the v1.7.1 run stops short with "NUMERICAL PROBLEMS" on iteration 14.

I think that this is an interesting bug because the package versions remain the same for both runs, only the underlying Julia language is "newer". In optimization we obviously never want to see a situation where an upgraded environment suddenly changes convergence behavior.

If you need to know something else about this issue, please let me know.

stdout_julia_v161.txt
stdout_julia_v171.txt

Member

mlubin commented Jan 19, 2022

> In optimization we obviously never want to see a situation where an upgraded environment suddenly changes convergence behavior.

Floating-point computations can depend on a variety of environmental factors like the compiler versions, math libraries, BLAS libraries, etc. I don't find the change in convergence behavior particularly surprising. It would be a good exercise to trace through the code in ECOS to see what causes the divergence, but my guess is that we won't find a bug here.
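
A minimal sketch of why this happens, assuming plain Float64 arithmetic: floating-point addition is not associative, so any change in the order of operations (a different compiler, math library, or BLAS kernel) can change the last bits of a result. The values below are chosen only for illustration.

```julia
# Floating-point addition is not associative: regrouping changes the last bits.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
println(a == b)   # false
println(a - b)    # a difference on the order of 1e-16

# The same four numbers summed in two different orders.
x = [1.0e16, 1.0, -1.0e16, 1.0]
left_to_right = foldl(+, x)               # the first 1.0 is absorbed by 1.0e16
reordered = foldl(+, x[[1, 3, 2, 4]])     # the large terms cancel first
println(left_to_right)   # 1.0
println(reordered)       # 2.0
```

In an iterative method, differences of this size in one iteration become the input to the next, so two runs can drift apart after enough iterations.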

Member

odow commented Jan 23, 2022

Julia's BLAS changed between 1.6 and 1.7, but ECOS_jll has no external dependencies so I'm not sure that's the problem.

There were also changes to the random number generation. Did you check that your Julia code is deterministic under Julia 1.6 and 1.7? The most likely culprit is that you aren't passing bit-for-bit identical models to ECOS between Julia 1.6 and 1.7.
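
One way to check the bit-for-bit claim, sketched under assumptions: hash the raw bytes of each array that is handed to the solver and compare the digests printed under Julia 1.6 and 1.7. The helper name `fingerprint` and the stand-in vectors are hypothetical; the real problem data would come from the calling code.

```julia
using SHA  # Julia standard library

# Hypothetical helper: SHA-256 of the raw IEEE-754 bytes of a Float64 array.
# Two arrays get the same digest only if their values are bit-for-bit identical
# (on machines with the same byte order). The array shape is not hashed.
function fingerprint(x::AbstractArray{Float64})
    raw = collect(reinterpret(UInt8, Vector{Float64}(vec(x))))
    return bytes2hex(sha256(raw))
end

# Stand-ins for problem data; in practice these would be the arrays passed to the solver.
c1 = [0.1 + 0.2, 1.0]
c2 = [0.3, 1.0]

println(c1 ≈ c2)                              # true: equal to within tolerance
println(fingerprint(c1) == fingerprint(c2))   # false: the bits differ
```

Printing one digest per SCP iteration under each Julia version would show the first iteration at which the inputs stop matching.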

Author

dmalyuta commented Jan 23, 2022

@odow that's a good callout, maybe the inputs to ECOS are not exactly the same if my code produces slightly different outputs due to the BLAS change. Even if ECOS doesn't depend on it, my external code that wraps ECOS probably does. Where in Julia is BLAS used? Is there a list, or some other way to know, which functions call it?

Member

odow commented Jan 23, 2022

BLAS will probably be used if you call any linear algebra functions. There's no easy way to isolate whether and where it is called.
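
A rough way to see the effect on a single operation, as a sketch only: compare a dense matrix-vector product (which for `Matrix{Float64}` is typically dispatched to BLAS) against a pure-Julia loop with a fixed accumulation order. The function `naive_matvec` and the test data are hypothetical, and `BLAS.get_num_threads` assumes Julia 1.6 or newer.

```julia
using LinearAlgebra

# Pure-Julia reference: a fixed left-to-right accumulation that never calls BLAS.
function naive_matvec(A::Matrix{Float64}, x::Vector{Float64})
    m, n = size(A)
    y = zeros(m)
    for i in 1:m
        acc = 0.0
        for j in 1:n
            acc += A[i, j] * x[j]
        end
        y[i] = acc
    end
    return y
end

# Test data built from single divisions, so no RNG is involved.
n = 200
A = [1.0 / (i + j - 1) for i in 1:n, j in 1:n]
x = [1.0 / k for k in 1:n]

y_blas = A * x               # dense Float64 product, typically handled by BLAS
y_ref = naive_matvec(A, x)   # same mathematics, different order of operations

println("Julia ", VERSION, ", BLAS threads: ", BLAS.get_num_threads())
println("largest elementwise difference: ", maximum(abs.(y_blas .- y_ref)))
```

Any nonzero difference printed here is rounding noise, not an error in either computation, and its size may vary with the Julia version, BLAS build, and thread count.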

I think you should focus on the underlying issue: your code should be robust to these differences. You should not expect to have identical performance when changing versions or machines.
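
In a test suite, one way to get that robustness is to compare against a reference within a tolerance and to accept more than one "good enough" solver status, instead of requiring exact equality. The sketch below is hypothetical: the function `acceptable`, the status symbols, and the tolerance are placeholders, not part of ECOS.jl.

```julia
using Test  # Julia standard library

# Hypothetical check: accept a result if the status is good enough and the
# solution is close to a reference in a relative sense. The status symbols
# are illustrative stand-ins for whatever the solver interface reports.
function acceptable(status::Symbol, x, x_ref;
                    rtol = 1e-6,
                    ok_statuses = (:OPTIMAL, :ALMOST_OPTIMAL))
    return status in ok_statuses && isapprox(x, x_ref; rtol = rtol)
end

x_ref = [1.0, 2.0, 3.0]
x_new = x_ref .* (1 + 1e-9)   # same answer up to a small relative perturbation

@testset "robust comparison" begin
    @test x_new != x_ref                               # exact equality would fail
    @test acceptable(:OPTIMAL, x_new, x_ref)
    @test acceptable(:ALMOST_OPTIMAL, x_new, x_ref)
    @test !acceptable(:NUMERICAL_ERROR, x_new, x_ref)
end
```

The tolerance should reflect the accuracy the application actually needs, which is usually far looser than machine precision.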

Member

odow commented Feb 15, 2022

Closing because this doesn't seem like an issue with ECOS and there isn't anything actionable to do here. If you can come up with a reproducible example demonstrating an issue in ECOS, please re-open.

odow closed this as completed Feb 15, 2022