Tpetra: incorrect behavior when a non-trivial series of moves are happening #11931
Comments
Related to #11921. |
@trilinos/tpetra |
@csiefer2 thanks for the comment! |
@fnrizzi Apologies for misreading your issue. I'll take a closer look. |
@csiefer2 no worries! As written above, this issue shows up with both a very recent SHA and a much older one, so it does not look like anything new. |
@fnrizzi My first attempt to translate this into a unit test produced the correct answer (75 instead of 1). I'm going to go try your example against an installed trilinos to see if I can reproduce the error. |
Can you paste here the unit test? Just curious |
@fnrizzi When I run your exact code against an installed Trilinos, I get:
|
Yeah, that outrageous number appeared for me too. I don't remember exactly when, but I saw it as well. Also, please note that there are two SHAs above that give me the wrong behavior, and one of them is pretty old, I think from about a year ago. |
I got the above using my VOTD checkout of develop. And now my unit test code, cut and paste into an external example, generates the results I see above. Weird. |
Yes, weird indeed... By the way, the tentative solution I posted above worked, which seems to indicate that the problem has something to do with the Teuchos RCP inside the DistObject base class. Hopefully that helps as a smaller reproducer. |
The bad MultiVector is probably off in uninitialized memory, maybe valgrind will give some useful info about what's going on |
As per my discussions with @cwpearson, it appears that BlockVector does not std::move correctly and that this may be the root of your issue. |
@brian-kelley The test basically points to old stack variables, so valgrind isn't helpful |
To expand a bit, When you move your registry, you're moving a A Trilinos/packages/teuchos/core/src/Teuchos_RCPDecl.hpp Lines 1287 to 1297 in 2f8614c
and then passes that NON-OWNING rcp to the Trilinos/packages/tpetra/core/src/Tpetra_BlockMultiVector_def.hpp Lines 90 to 91 in 2960af5
I think the comment on line 91 is wrong, because A fair number of the Trilinos/packages/tpetra/core/src/Tpetra_BlockMultiVector_decl.hpp Lines 219 to 231 in 2960af5
So when I'm not sure exactly why this hasn't cropped up before, except that if people have been using A big-picture solution might be to make |
The long-term solution I prefer is replacing the constructors with something that takes an RCP of a map and always storing internal shared data with RCPs. We could band-aid the whole thing with something simpler, however. What's the real end goal for this registry thing? |
The use case I reported above reproduces exactly what I need: an instance of the registry stores the data, and this registry instance is then moved to another object (a Foo instance) that owns it permanently and manages its lifetime. Specifically, the real use case is implementing nonlinear solvers: the registry stores a bunch of operators and other things we need, and it is then moved to the actual nonlinear solver object, which owns it. Here is a snippet: does this clarify? |
@fnrizzi Yeah, it does, though if I were implementing it I would use some kind of reference-counted pointer rather than rely on std::move doing what you hope for (std::move does not play nicely with pointers). Like I said, I can band-aid this (and will try to do so today). But I want to warn you that you're doing something very few people do (and that we don't test), so there's a very real chance this won't be the only bug like this you see. |
@csiefer2 thanks for the reply!
Why should std::move not play nicely with pointers? Do you mean with Teuchos RCPs specifically, or in general?
Yeah, that is fine. If I can help find them, I am happy to :) |
In general. If you std::move something that a pointer is pointing to without also changing the pointer, then your pointer is now pointing to who knows what.
Yeah, they're not well tested at all (to some degree because nobody uses them). The Block stuff is the least likely to work correctly, since that keeps a lot of references in addition to smart pointers. |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
This issue was closed due to inactivity for 395 days. |
Bug Report
@csiefer2
Description
I am trying to create a "registry" (a struct) holding instances of Tpetra block vectors and then move the registry object to another object, Foo, that owns that registry and uses it internally. See the use case below.
If I create the registry directly and move it to the constructor of Foo, all extents printed make sense.
If I create an instance of Foo by calling createFoo, then things do not work. It prints:
Notice how in the second part we get
DUMMY-tpetr-A ext = 1
instead of 75.
Steps to Reproduce
Smaller reproducer and a solution that works
I was able to do the following:
Smaller reproducer: I took all the relevant classes from the source code and stripped out everything that is not needed:
tpetra_moving_bug_reproducer.txt
This code prints the same as above.
One solution: I was able to make things work by changing the way the map is stored inside the DistObject class. If I store it by value instead of using the RCP, things work:
tpetra_moving_bug_fix.txt
and this prints: