ShyLU: Tacho_TestSerial_double.cpp build error in intel-19.0.5 PR builds since 5/18/2022 #10549
Comments
@trilinos/framework, @trilinos/shylu FYI: This is going to fail in all PR builds that happen to enable the ShyLU_Node packages, and those PRs will not be able to be merged until this error is fixed on the 'develop' branch.
FYI: It looks like this intel-19.0.5 build is also having trouble in the 'develop' to 'master' sync builds going back to 5/16/2022 as shown here: showing: Unfortunately, there is something wrong as you can't actually view the build errors. When you click on them, they come up empty. |
@ndellingwood I think that #10474 causes this failure. |
@kyungjoo-kim thanks for adding me on the issue, I'll try and help. The Tacho error in the post doesn't make sense to me, it only lists I haven't run into this issue with serial builds with intel/19 on blake. If this can't be resolved with the toolchain or environment, I can either revert #10474 and make similar changes tracked on the kokkos-promotion branch of Trilinos (otherwise we will have build failures with the kokkos develop branch) or I can try explicitly adding the |
@ndellingwood, I can't comment on that as I did not have any part in setting up this intel-19.0.5 PR build. I would hope that someone could reproduce that intel-19.0.5 build locally so the problem can be fixed. @ZUUL42, given that your PR #10537 is impacted by this intel-19.0.5 build error as well, do you know how to reproduce this build locally, like on a CEE RHEL7 machine? |
I'm testing a couple builds on a machine with sems-archive-intel/19.0.5 and will see if I can reproduce. An early observation worth sharing, |
@bartlettroscoe @ZUUL42 @kyungjoo-kim I can confirm this is toolchain problem in part by use of I tested two builds: one loading Here are more details of my builds (same cmake configuration for each): Build 1: Compilation error reproduced
Build 2: Newer gcc, no compilation error
I don't think this needs to be resolved within Tacho |
The Intel 19 master merge (MM) build is still using the same version of the build it has used for quite some time, though it has been migrated to use GenConfig. It still uses the sems-archive TPLs. For PR testing, we are now using an Intel 19 build that successfully uses Scotch. It uses the sems-[non-archive] TPLs. The new spack-cm SEMS modules made it possible to use Scotch with Intel 19 again, although quite a bit of back and forth and tweaking was required. As for the current state of the PR Intel 19 build you are pointing out, it appears that a build error that only happens with Intel 19 was introduced between my testing and releasing it. CDash The issue you pointed out where the job uses the source's develop branch, #10538, may have contributed to that. My develop branch hasn't been updated since Feb 25, so the error was introduced between then and now. As for the Intel 19 master merge build errors that started on Apr 29, those could only have been introduced because no Intel 19 PR build was in place at that time. Those errors won't be changed by anything we do with the new Intel 19 PR build. I can swap back to the Intel 17 build for PR testing, but that will not fix the master merge errors. The Intel 19 MM build has been there for quite some time and has not changed. I'm reluctant to swap back, as it would potentially allow other issues to pass through PR testing and hit the Intel 19 build in MM testing.
@ZUUL42, where are the instructions for reproducing this intel-19.0.5 build using GenConfig? The instructions at https://github.com/trilinos/Trilinos/wiki/Reproducing-PR-Testing-Errors don't seem to mention GenConfig and the script PullRequestIntel19.0.5TestingEnv.sh does not seem to be using GenConfig. I am confused how the PR testing works and how it loads the envs. Does it not use the scripts listed under cmake/std/sems/? |
IMHO, the most logical next step is that this intel-19.0.5 build needs to be removed from blocking Trilinos PR builds immediately so that we can get PRs merging again. Then the issues with this intel-19.0.5 build can be resolved offline. |
Yea, a lot can change in more than 2 months. @ZUUL42, is there not a way to introduce new Trilinos PR builds through the Trilinos PR process? That is, you don't add a new Trilinos PR build unless the Trilinos PR testing process allows it. That would ensure that this type of situation does not happen in the future. |
Can a new issue be opened for ongoing discussion of the intel-19.0.5 build (and its status as PR and/or master-merge acceptance test) for easier tracking and reference, with a link to this issue to track an item needing resolution? Regarding the build failure reported with Tacho, the problem is not with source code in Tacho though the compiler error implies this: |
The MM build uses GenConfig. The PR build does not use GenConfig. An update to the .sh script was left out of the PR. We are moving away from those scripts with GenConfig, and synchronizing all efforts has been troublesome. As for introducing a new PR build within the PR system, that would require first adding it to the PR test set. That would prevent all other PRs from testing successfully and merging until the new build's PR is merged in. Typically that has not been necessary with the testing I do beforehand, but as you found, I was working with a job that had a typo in it that I overlooked. I can revert the PR test set to Intel 17. While working through this, I have discovered that sems-boost/1.74.0 appears to exist for a few gcc versions and clang/11, but not for sems-intel/19.0.5, so it's not being loaded at all. Another thing I just found that may affect this: sems-mpich/3.2.1 is being loaded before sems-openmpi/4.0.5, and openmpi is part of a default TPL set.
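The two module observations above (two MPI modules loaded in the same environment, and a Boost module that does not exist for the Intel toolchain) could be checked with a small script. This is only a minimal sketch, not part of Trilinos, GenConfig, or the SEMS tooling: `find_mpi_conflicts` and `missing_modules` are hypothetical helpers, the MPI prefixes are assumed from the module names mentioned in this thread, and the module lists are passed in as plain strings rather than read from the `module` command.

```python
# Hypothetical helpers; not part of Trilinos, GenConfig, or the SEMS modules.

# Assumed prefixes for MPI implementation modules, taken from names in this thread.
MPI_PREFIXES = ("sems-mpich/", "sems-openmpi/")


def find_mpi_conflicts(loaded):
    """Return the MPI modules in load order if more than one is loaded."""
    mpi = [m for m in loaded if m.startswith(MPI_PREFIXES)]
    return mpi if len(mpi) > 1 else []


def missing_modules(requested, available):
    """Return the requested modules that are absent from the available list."""
    avail = set(available)
    return [m for m in requested if m not in avail]


if __name__ == "__main__":
    # Illustrative environment only; the real load list may differ.
    loaded = ["sems-intel/19.0.5", "sems-mpich/3.2.1", "sems-openmpi/4.0.5"]
    print("MPI conflicts:", find_mpi_conflicts(loaded))
    print("Missing:", missing_modules(["sems-boost/1.74.0"], loaded))
```

Keeping the load order in the result makes it easy to see which MPI module was loaded first, which is the detail raised in the comment above.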
Makes sense to me. |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
This issue was closed due to inactivity for 395 days. |
Bug Report
@trilinos/framework, @trilinos/shylu
Description
As shown in this query showing:
the file:
is failing to build with the build error (for example, as shown here):
Specifically, this is impacting two different PRs #10533 and #10537 since 5/18/2022 and three different PR iterations:
(NOTE: The build error for the build PR-10530-test-rhel7_sems-intel-19.0.5-mpich-3.2-serial_release-debug_static_no-kokkos-arch_no-asan_no-complex_fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables-297 for PR #10530 is not related, and the last iteration of that PR passed.)
The changes in PRs #10533 and #10537 don't impact the ShyLU_Node package at all, so this error must be present in Trilinos 'develop'.
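The reasoning above (a PR that changes nothing under ShyLU_Node cannot have introduced a ShyLU_Node build error) can be sketched as a simple path check. This is an illustration under stated assumptions, not an existing Trilinos script: `touches_package` is a hypothetical helper, `packages/shylu/shylu_node` is an assumed location for the package, and the list of changed files would have to come from the PR diff.

```python
# Hypothetical helper; the package directory is an assumption about the repo layout.
from pathlib import PurePosixPath

PACKAGE_DIR = PurePosixPath("packages/shylu/shylu_node")  # assumed ShyLU_Node location


def touches_package(changed_files, package_dir=PACKAGE_DIR):
    """Return the changed files that live somewhere under package_dir."""
    hits = []
    for name in changed_files:
        # parents lists every ancestor directory of the file path.
        if package_dir in PurePosixPath(name).parents:
            hits.append(name)
    return hits
```

An empty result for every affected PR would support the conclusion that the error comes from the 'develop' branch rather than from the PR changes. It does not rule out indirect effects through upstream packages that ShyLU_Node depends on.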
Steps to Reproduce
Run the intel-19.0.5 PR build that enables the ShyLU_Node package. (I don't know if you can reproduce this error locally; I have not tried.)