Parallelized dgetrf doesn't detect matrix as singular #4505
Comments
I cannot immediately reproduce this with 0.3.26 (after disabling the code that enforces single-threading, of course). What is your hardware, please?
The hardware is a VirtualBox virtual machine running Ubuntu 20.04 LTS on top of a Windows 11 host. The CPU is an Intel Core i7-10510U and hardware virtualization is being used.
Thank you - that should be Haswell as far as OpenBLAS is concerned (unless your vbox setup hides the avx2 capability),
Appears to be a loss of precision somewhere in the Sandybridge kernels...
Here are the cpuinfo flags:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
...so it doesn't hide the avx2.
The old version may have used a different GEMM kernel, and GETRF behaviour depends on its unroll factor - which seems to explain the discrepancy I've found with Sandybridge. In one case the workload gets split up again, which appears to be where the problem comes in.
Interestingly, the plain unoptimized Fortran code of the "netlib" Reference-LAPACK implementation does not recognize this particular matrix as singular either. And so far I have not come across any other (trivially) singular matrix that similarly slips through either code. This is looking more and more like a weird corner case of machine precision (and/or compiler code generation) to me.
Maybe this is more an issue of limited floating point precision than a real problem. I understand that in parallel versions, calculations can happen in a different order, and the usual mathematical rules (associativity, for example) do not necessarily apply. I encountered the matrix in a circuit nodal analysis problem that was incorrectly presented, and it should indeed be singular, since important elements were missing from the matrix.
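To illustrate the point about evaluation order, here is a minimal sketch in plain Python (not OpenBLAS code). It adds the same three doubles with two different groupings, which is roughly what happens when a sum is split across a different number of threads. The variable names are only illustrative.

```python
# Floating-point addition is not associative: regrouping the same
# operands can change the rounded result in the last bit.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # grouping of a sequential loop
regrouped = a + (b + c)       # grouping after splitting the work differently

print(left_to_right)               # 0.6000000000000001
print(regrouped)                   # 0.6
print(left_to_right == regrouped)  # False
```

A difference of one unit in the last place is harmless in most computations, but it is enough to turn a pivot that would be exactly zero into a tiny non-zero value.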
Probably yes - and it is interesting that even the simple single-threaded Fortran code of the Reference LAPACK does not treat this matrix as singular on my machine, so the compiler version or the math libraries on your system may play a role here as well. Not sure if I want to calculate the determinant by hand, but I don't think I see any obviously interdependent rows or columns that would make it unequivocally singular. (Octave probably uses either OpenBLAS or the Reference BLAS/LAPACK, so it is no independent proof.)
It is known that small roundoff errors can lead to different outcomes of LU: (1) GETRF can fail on an ill-conditioned matrix (info > 0) because perturbations lead to a zero pivot, and (2) GETRF can run to completion on a matrix that ought to have a zero pivot, because that pivot is perturbed to a non-zero value. This does not indicate an issue with LU. In my opinion, it is not a requirement for correctness that all implementations of LU (algorithmic variants, serial/parallel, different architectures) reach the exact same result. If the task is to determine the rank, I would suggest using a rank-revealing QR decomposition.
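Case (2) and the QR suggestion can be sketched in plain Python. This is not the LAPACK or OpenBLAS implementation. `lu_info` is a hypothetical unblocked LU with partial pivoting that mimics the GETRF convention of returning the 1-based index of the first exactly zero pivot. `numerical_rank` is a hypothetical column-pivoted Gram-Schmidt QR, and its relative tolerance `rel_tol=1e-10` is an arbitrary assumption for this example.

```python
import math

def lu_info(a):
    """Unblocked LU with partial pivoting; returns a GETRF-style info:
    0 if no pivot is exactly zero, else the 1-based index of the first."""
    a = [row[:] for row in a]
    n = len(a)
    info = 0
    for k in range(n):
        # Pick the largest entry in column k at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            if info == 0:
                info = k + 1
            continue
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return info

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def numerical_rank(a, rel_tol=1e-10):
    """Rank estimate from column-pivoted modified Gram-Schmidt QR:
    count diagonal entries of R above rel_tol times the first one."""
    n_rows, n_cols = len(a), len(a[0])
    cols = [[a[i][j] for i in range(n_rows)] for j in range(n_cols)]
    first = None
    rank = 0
    for k in range(min(n_rows, n_cols)):
        # Pivot: bring the remaining column with the largest norm forward.
        p = max(range(k, n_cols), key=lambda j: norm(cols[j]))
        cols[k], cols[p] = cols[p], cols[k]
        r_kk = norm(cols[k])
        if first is None:
            first = r_kk
        if r_kk == 0.0 or r_kk <= rel_tol * first:
            break
        rank += 1
        q = [x / r_kk for x in cols[k]]
        # Remove the q component from every remaining column.
        for j in range(k + 1, n_cols):
            dot = sum(qi * x for qi, x in zip(q, cols[j]))
            cols[j] = [x - dot * qi for x, qi in zip(cols[j], q)]
    return rank

singular = [[1.0, 2.0], [2.0, 4.0]]
perturbed = [[1.0, 2.0], [2.0, 4.0 + 1e-15]]  # roundoff-sized perturbation

print(lu_info(singular), numerical_rank(singular))    # 2 1
print(lu_info(perturbed), numerical_rank(perturbed))  # 0 1
```

The exact-zero test in LU flips from info 2 to info 0 under a perturbation of about one unit in the last place, while the tolerance-based rank estimate stays at 1 for both matrices. In practice a library routine such as LAPACK's pivoted QR (DGEQP3) or an SVD would be the usual tool, with a tolerance chosen for the problem at hand.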
I have the following matrix:
...which should be singular. At least getting its inverse in GNU Octave shows that it's singular to machine precision.
If I try to do LU decomposition for it using OpenBLAS:
...it prints info 0. However, if I set the environment variable OPENBLAS_NUM_THREADS to 1, it prints 14.
So it incorrectly detects the matrix as decomposable if I run it with many threads, and only detects it correctly as singular if I run it with one thread. The separator seems to be 3 vs 4 threads: with 3 threads it prints info 14 but with 4 threads it prints info 0.
I have OpenBLAS 0.3.8. I understand that newer versions of OpenBLAS always handle small matrices with a single thread to improve performance, so testing the issue with a new OpenBLAS isn't even possible. However, since the calculation happens differently depending on the number of threads, I think there's an issue somewhere that should be tracked down. I would expect OpenBLAS to give identical results no matter how many threads are in use.