Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lua components corruption on save #5657

Merged

Conversation

Max5377
Copy link
Contributor

@Max5377 Max5377 commented Nov 1, 2023

Fixes #5653
Fix for the issue when objects which are not in space were corrupted on save, because their lua components wasn't saved.
For this, m_bodyIndex vector is used instead of m_bodies to not miss any bodies that are currently exists in the game.
Incompatible with old saves.
Update: fix index mistmatch between Json bodyArray and for loop in LuaSerializer::SaveComponents and LuaSerializer::LoadComponents when adding luaComponents object, since for loop in LuaSerializer is starting from one, but values to bodyArray is added through push_back, so they are starting at zero.
Update2: rebase.
Update3: break in for loop in LuaSerializer::SaveComponents(Json &jsonObj, Space *space) is changed to continue, because if LuaObjectBase::SerializeComponents(body, luaComponentsObj) returns false, all other objects are not get their components saved.

@Max5377 Max5377 force-pushed the fix-lua-components-corruption branch from 49cce50 to 05adccc Compare November 1, 2023 19:57
@cwyss cwyss mentioned this pull request Nov 11, 2023
@cwyss
Copy link
Contributor

cwyss commented Nov 11, 2023

I tested this and it does not seem to fix #5653, details see #5653 (comment)

src/Space.cpp Outdated
@@ -263,7 +263,11 @@ Space::Space(Game *game, RefCountedPtr<Galaxy> galaxy, const Json &jsonObj, doub
try {
Json bodyArray = spaceObj["bodies"].get<Json::array_t>();
for (Uint32 i = 0; i < bodyArray.size(); i++)
{
if(bodyArray[i]["is_not_in_space"].is_boolean())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are checking for the presence of a key-value pair and not a specific value (i.e. true or false), prefer bodyArray[i].count("is_not_in_space") > 0. The index operator on non-const maps will mutate the JSON object with a new "is_not_in_space": null entry (and throw an exception on const maps if the key doesn't exist).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to count instead of is_boolean like you suggested.

src/Space.cpp Outdated Show resolved Hide resolved
for (size_t idx = 0; idx < bodies.size(); idx++) {
Body *body = space->GetBodies()[idx];
for (size_t idx = 1; idx < bodies.size(); idx++) {
Body *body = space->GetBodyByIndex(idx);
Copy link
Member

@Web-eWorks Web-eWorks Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than starting from a 1-based index, please prefer calling GetBodyByIndex(idx + 1) with a comment indicating why it takes 1-based indicies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done because Space::RebuildBodyIndex() and Space::RebuildSystemBodyIndex() is adding nullptr as first object at the start. When doing this PR I suggested that it's used as a "bad" index, but I don't know if it's really needed there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed every instance where start index was 1, instead of 0. I consider the first index of Space::RebuildBodyIndex() and Space::RebuildSystemBodyIndex() (aka m_bodyIndex and m_sBodyIndex) as a bad index.

@Max5377 Max5377 force-pushed the fix-lua-components-corruption branch from 05adccc to f65ca13 Compare November 18, 2023 08:45
@Web-eWorks
Copy link
Member

@Max5377 please ignore the idea of adding a BodyID value for now - I have a fully working prototype of the idea described above, but the way hyperspace clouds are handled at the moment cuts through so many levels of the game's systems that it is impossible to accomplish the above idea while still maintaining a clean architecture.

This PR, though it's a bit messy, is still the lowest-effort way to fix the issue at hand. Once the remaining feedback items are addressed I'd like to merge this as an interim solution until a more permanent solution can be worked on.

The rest of this comment is a technical breakdown of the underlying problem for the advanced reader.


The structure I've settled on to implement my above comment regarding bodyID is known as a "sparse set" or "packed array" - essentially it's comprised of two parts, a sparse array of object IDs that map to slots in a dense array which store the actual object. In this case, the object is a Body *, so the main gain is having stable body indices which can be queried in O(1) (unlike the current GetIndexForBody), as well as providing simple and robust "weak reference" semantics when attempting to look up a body which has been deleted.

It's those "weak reference" semantics which are the core of the problem however - Ships stored in HyperspaceClouds are both technically alive and dead in that they need to be serialized and unserialized with the rest of the bodies storied in Space, but should otherwise be completely ignored and treated as though they don't exist until they "arrive" from the hyperspace cloud at some indeterminate point.

This becomes a problem with stable body indices, because those ships need to be removed from the "active set" in Space but still maintain a lease on their ID - as well as their associated components - across both a hyperspace transition and a save-load boundary.

While the implementation I've been working on can handle that in one direction - that being a loaded ship in a hyperspace cloud will "allocate" its ID without a corresponding "commit" operation to back that ID with a Body *, it's very difficult to solve the other direction - when an existing ship enters hyperspace, it can "release" its allocation backing so it's no longer in the iterable set of bodies, but then it can no longer be looked up from its ID when a reference to it is being unserialized.

This lookup operation on bodies that have been removed from Space is most likely the source of a number of crashes we've had immediately after leaving hyperspace or loading a save, as we (de)serialize both a ship and its associated AICommand object, which has to go through this mechanism to restore its pointer to the owning ship.

The underlying problem here is that Space is trying to solve two problems at once: it's maintaining a list of "simulated" bodies, but it's also trying to provide a solution to allow stably serializing Body pointers across savegames, even if those bodies aren't managed by Space at all. This is also complicated by the existence of the BodyComponent system, which needs a way to uniquely identify and serialize components for all Bodies in existence, regardless of whether they're actively simulated.

At this point there are two directions we can go:

  1. We treat Space as a "registry" or "world" in the ECS sense, where all Bodies in existence are registered with Space and a fast equivalent of Body::IsInSpace() is used to determine which bodies should be ticked. This adds non-trivial complexity, as it needs to maintain yet another array of "active" body pointers and needs to serialize body IDs stably across savegames.
  2. We treat Space as a discrete simulation arena which contains a single list of "active bodies", and completely separate the responsibility for serializing arbitrary Body pointers into a separate serialization helper. This would effectively involve an external system for managing Body lifetimes and providing stable IDs if required.

One of my larger in-flight branches is blocked on the need to serialize multiple pointers to a LuaSharedObject (a RefCounted object with Lua exposure) and maintain pointer identity after deserialization, which could be neatly solved by option 2.

I also think option 2 is a better fit for the original purpose of Space, as well as better matching our existing code. However, it's a significantly non-trivial amount of code changes as every Body, BodyComponent, and LuaObject serialization site needs to be touched to pass around a new helper which can handle serializing object pointers in a type-erased fashion as well as provide some way to fix up circular references.

There is of course the 3rd option, which is to "pre-serialize" Ships in HyperspaceClouds down to a JSON string when they hyperspace out, and then restore them later. Unfortunately, this has many downsides - we lose the ability to query the stored ship, we have to implement a second serialization mechanism to achieve piecemeal serialization in addition to stop-the-world serialization, and we have to support incremental body insertion or rewriting to handle the case where e.g. a body index used by a serialized ship is "stolen" by a new body in the intervening time.

@Max5377 Max5377 force-pushed the fix-lua-components-corruption branch from f65ca13 to 698a1fa Compare November 20, 2023 21:29
Fix for the issue when objects which are not in space were corrupted on save, because their lua components wasn't saved.
For this, m_bodyIndex vector is used instead of m_bodies to not miss any bodies that are currently exists in the game.

Co-Authored-By: Webster Sheets <4218491+web-eworks@users.noreply.github.com>
Co-Authored-By: cwyss <1362524+cwyss@users.noreply.github.com>
@Max5377 Max5377 force-pushed the fix-lua-components-corruption branch from 698a1fa to 4ff1aac Compare November 20, 2023 21:31
@Max5377
Copy link
Contributor Author

Max5377 commented Nov 20, 2023

@Web-eWorks Yeah, I should've mentioned this in the initial comment that in this PR I was trying to find the most straight-forward solution without changing a lot of ingame systems so this can be indeed considered a temporary fix to the said problem.
I fixed remaining feedback and I think I haven't forgotten anything. But I would like to ask for one clarification.
In LuaSerializer::SaveComponents(Json &jsonObj, Space *space), LuaObjectBase::SerializeComponents(body, luaComponentsObj) can return false on this line:

// Get the LuaObject for the given LuaWrappable passed in
lua_pushlightuserdata(l, object);
lua_rawget(l, -2); // Registry, LuaObject
if (lua_isnil(l, -1)) {
	lua_pop(l, 2);
	LUA_DEBUG_END(l, 0);

	return false;
}

When testing, this happened only with HyperspaceCloud objects. Do I undestand this right that it fails, because although object is loaded in C++ code, lua is still doesn't know about this object, so it returns nil when trying to get it (aka it wasn't yet pushed to lua stack)?
This can be of course something with HyperspaceCloud class itself, but I went futher and created HyperspaceCloud.lua file in libs folder, and did this simple Constructor method:

function HyperspaceCloud:Constructor()
    if not self:hasprop("wasLoaded") then
		print("if not self:hasprop(\"wasLoaded\") then")
		self:SetComponent('HyperspaceCloud_TestComponent', {testIsTrue = true})
		self:setprop("wasLoaded", true)
		local ship = self:GetShip()
		print("Ship: ", ship and ship:GetLabel() or 'nil')
	end
end

All HyperspaceCloud objects that failed in LuaObjectBase::SerializeComponents(body, luaComponentsObj) would call their Constructor a little bit later.
As of writing this already long message I came to think that objects that failed serialization actually wasn't called their Constructor before so I'm more inclined that that's the case here, but feel free to correct me.

@Web-eWorks
Copy link
Member

When testing, this happened only with HyperspaceCloud objects. Do I undestand this right that it fails, because although object is loaded in C++ code, lua is still doesn't know about this object, so it returns nil when trying to get it (aka it wasn't yet pushed to lua stack)?

LuaComponents are stored in the uservalue (Lua table associated with a Lua userdata) of the LuaObject which "shadows" the C++ Body object and provides something for Lua to interact with. If a Body doesn't have an associated LuaObject because it was never pushed to Lua, it cannot have Lua components defined on it (as there is nothing to store them and no way for Lua to have defined them).

To my knowledge, LuaObjects are only ever destroyed when both the LuaObject has no more references and the underlying C++ object is destroyed (whether by refcount or explicit deletion), which would also preclude any potential for lua components to be destroyed before the associated Body is.

As of writing this already long message I came to think that objects that failed serialization actually wasn't called their Constructor before so I'm more inclined that that's the case here, but feel free to correct me.

Yes, to sum up, if a Body doesn't have an associated LuaObject the direct assumption is that Body has never been pushed to Lua and thus does not have Lua components to serialize.

@Web-eWorks Web-eWorks merged commit 8d54d50 into pioneerspacesim:master Nov 25, 2023
4 checks passed
@Max5377 Max5377 deleted the fix-lua-components-corruption branch January 13, 2024 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash Trader.lua
4 participants