Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for field layout verification across version bubble boundary #50364

Merged
merged 15 commits into from
Apr 7, 2021

Conversation

trylek
Copy link
Member

@trylek trylek commented Mar 29, 2021

When verifying field offset consistency, we shouldn't be checking
base class size when the base class doesn't belong to the current
version bubble. I have fixed this by adding a special case for
the existing fixup encoding indicated by base class size being
zero; please let me know if you find it preferable to create
a new fixup type for this purpose instead. I have also created
a simple regression test that was previously failing when run
against the framework compiled with CG2 assembly by assembly.

Thanks

Tomas

Fixes: #49982

/cc @dotnet/crossgen-contrib

@trylek trylek added this to the 6.0.0 milestone Mar 29, 2021
Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. We don't need a new encoding as we've never shipped anything that should depend on this.

@trylek
Copy link
Member Author

trylek commented Mar 30, 2021

@davidwrighton - CI testing uncovered a pre-existing deficiency of CG2 field layout, we weren't properly aligning base class sizes on ARM32. Could you please take another look when you have a chance?

@trylek
Copy link
Member Author

trylek commented Mar 30, 2021

OK, I found out there were actually three different places in Crossgen2 - auto field layout calculation, sequential field layout calculation and the field offset verification fixup - that were using three different methods to calculate the field base offset. I have unified all of them to use the same method CalculateFieldBaseOffset.

@davidwrighton
Copy link
Member

Every time anyone looks at this stuff we find more cases to consolidate. Hopefully this is it. I like the current state of the change, but of course, we need to look at the results of the outerloop testing that you are doing now.

@@ -775,13 +762,14 @@ private static bool IsByValueClass(TypeDesc type)
return type.IsValueType && !type.IsPrimitive && !type.IsEnum;
}

private static LayoutInt ComputeBytesUsedInParentType(DefType type)
public LayoutInt CalculateFieldBaseOffset(TypeDesc type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API will crash with an InvalidCast if e.g. an array is passed. It would be better to ensure valid inputs at compile time.

Suggested change
public LayoutInt CalculateFieldBaseOffset(TypeDesc type)
public LayoutInt CalculateFieldBaseOffset(MetdataType type)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks, fixed in 8th commit.

@trylek
Copy link
Member Author

trylek commented Apr 1, 2021

I have rebased the change, I reverted a failed attempt at ARM offset bias refactoring; for the sake of expediency I have also included my two other pending fixes,

#50474

and

#50562

For now I assume that we'll first merge in the two fixes and I'll then rebase this change against them and rerun the testing but I'm open to merging in this combination of the three fixes for the sake of expediency.

@mangod9
Copy link
Member

mangod9 commented Apr 1, 2021

merging together should be fine, assuming tests are passing with all of them together.

@trylek
Copy link
Member Author

trylek commented Apr 1, 2021

Down to three remaining test failures (ExplicitClass in crossgen2smoke on ARM, LayoutClassTest, PlatformDefaultMemberFunctionTest); some unrelated failures in composite CG2 builds that should be fixed or at least change their behavior after the fix for MVID checks (#50474) is merged in; I'm looking at them now but the crossgen2smoke failure on arm[32] is interesting and I would love to hear @davidwrighton's opinion:

Verify_FieldOffset 'ExplicitlySizedClass.A' Field offset 0!=-4(actual) || baseOffset 0!=0(actual)

I think this raises two interesting questions:

  1. The shortcut I used to encode extra-bubble base class sizes as zero probably incorrectly conflates this case with types without a base class. For now I think I should probably introduce a new fixup for this case after all, I don't see any non-hacky way to encode this extra bit of information in the existing fixup format.

  2. I sincerely doubt that the "actual" information is correct, I think it rather means that in this particular case the field offset check in jitinterface.cpp calculates the base size incorrectly; I guess that the base class - object - is assumed to have size 8 and the offset of the first field in ExplicitlySizedClass is 4 and that gets recalculated to the relative offset notation, otherwise it doesn't make sense to me.

Thanks

Tomas

@jkotas
Copy link
Member

jkotas commented Apr 1, 2021

Is the base validation actually useful?

  • If there is no version bubble boundary crossing, the base does not matter. All fields are accessed by absolute offset.
  • If there is version bubble boundary, the base cannot be validated.

@trylek
Copy link
Member Author

trylek commented Apr 2, 2021

I believe it's extremely useful. Please remember that these checks are not supposed to improve our code in any manner, they are just sanity checks helping us chase down subtle differences in the two type systems and class loaders that have been plaguing us throughout the entire CG2 development - each longer-term member of the CG2 v-team has their stories to tell; I side with David on this one, I believe the validation has been an awesome boost in nailing down hard issues, I recall times before that change and having to do everything manually. And this logic already bears unexpected fruits - discoveries of tiny misalignments between CG2 and the runtime regarding field layout, not only on ARM (even within a version bubble i.e. with fixed offsets). While this PR is difficult, I would be very sad to have to trim this down by reducing the quality of the checks.

@trylek
Copy link
Member Author

trylek commented Apr 2, 2021

Hmm, I guess the expression I used is somewhat ambiguous, sorry about that, I think I meant to say something like "the checks are not supposed to improve the runtime native code in any manner, ...", the way I phrased that it sounds like a contradiction in terms to some extent. Thanks for your patience :-).

@jkotas
Copy link
Member

jkotas commented Apr 2, 2021

I understand that the field offset check is useful. I am questioning the usefulness of the base check.

@davidwrighton
Copy link
Member

The particular value of the runtime base check isn't to validate them in cases where there is a cross-module boundary. Its to validate the data that would be used in a cross module boundary to ensure that the general type layout processing is correct. It turns out that while not exactly matching up to a useful value, it has been quite valuable at finding all kinds of edge cases that do affect the within version bubble cases. In particular, it is good at finding cases where the alignment calculations were subtly wrong in a way that didn't cause a field offset to be wrong in that particular test case, but could have caused a problem for other fields on the type.

Now, if it turns out the only incorrectness is caused by this incorrect conflation of cross boundary and non-deriving, we could use a different sentinel value like 0xFFFFFFFF or something or simply not allow a sentinel value for types that derive directly from object. @trylek I assume that you have an arm32 test environment where you can test this locally? Relying on the CI for this sort of testing is a giant time waster.

Also, what I've found in the past is that when a layout issue is particularly nasty is that I need to build a standalone repro of the bug, and debug in parallel through the native CLR type layout logic and the managed type layout logic and find where they differ. I worry that when you consolidated the 3 different ways in which we calculated base offset that you actually removed a way in which they are required to be very subtly different.

@trylek
Copy link
Member Author

trylek commented Apr 2, 2021

Hmm, I guess you're probably right that in the local case (within version bubble) the verification of absolute field offset is sufficient. On the other hand, the base class - derived class boundary turns out to be a super productive bug farm, I have been poring over the code for more than a week by now and I still cannot claim I exactly know what's happening in each situation, especially as there are multiple code paths involved for explicit / sequential / auto layouts for reference types vs. value types with side fun regarding the ARM32 offset bias. As I believe that "usefulness" of this fixup is basically preventively diagnostic (we detect upfront that we got it wrong), I tend to believe that the base field offset calculation does constitute a value worth validating (in the intra-bubble scenario) as getting that wrong indicates a potentially critical bug in our implementation.

@trylek
Copy link
Member Author

trylek commented Apr 2, 2021

In fact, this is exactly what crashes the LayoutClassTest - in the test, there's a class named SeqDerivedClass that derives from an empty class named EmptyBase, please cf.

public class SeqDerivedClass : EmptyBase

The proximate problem is that, on x64, we don't properly replicate runtime behavior w.r.t. the base class EmptyBase that adds 1 byte to its 8-byte object base (on x64) - I haven't yet found the place in the CoreCLR runtime method table builder where the empty class gets added an extra byte, probably akin to C++ being unable to properly deal with zero-sized classes. Long story short, my current belief is that this is what causes the remaining discrepancy until proven otherwise.

@trylek
Copy link
Member Author

trylek commented Apr 7, 2021

Merging in as the PR is green and the remaining CG2 failures are known.

@trylek trylek merged commit a7ed1fd into dotnet:main Apr 7, 2021
@trylek trylek deleted the VerifyFieldLayoutFix branch April 7, 2021 10:35
thaystg added a commit to thaystg/runtime that referenced this pull request Apr 7, 2021
* upstream/main: (568 commits)
  [wasm] Set __DistroRid on Windows to browser-wasm (dotnet#50842)
  [wasm] Fix order of include paths, to have the obj dir first (dotnet#50303)
  [wasm] Fix debug build of AOT cross compiler (dotnet#50418)
  Fix outdated comment (dotnet#50834)
  [wasm][tests] Add properties to allow passing args to xharness (dotnet#50678)
  Vectorized common String.Split() paths (dotnet#38001)
  Fix binplacing symbol files. (dotnet#50819)
  Move type check to after the null ref branch in out marshalling of blittable classes. (dotnet#50735)
  Remove extraneous CMake version requirement. (dotnet#50805)
  [wasm] Remove unncessary condition for EMSDK (dotnet#50810)
  Add loop alignment stats to JitLogCsv (dotnet#50624)
  Resolve ILLink warnings in System.Diagnostics.DiagnosticSource (dotnet#50265)
  Avoid unnecessary closures/delegates in Process (dotnet#50496)
  Fix for field layout verification across version bubble boundary (dotnet#50364)
  JIT: Enable CSE for VectorX.Create (dotnet#50644)
  [main] Update dependencies from mono/linker (dotnet#50779)
  [mono] More domain cleanup (dotnet#50771)
  Race condition in Mock reference tracker runtime with GC. (dotnet#50804)
  Remove IAssemblyName (and various fusion remnants) (dotnet#50755)
  Disable failing test for GCStress. (dotnet#50828)
  ...
@ghost ghost locked as resolved and limited conversation to collaborators May 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
5 participants