Fix for field layout verification across version bubble boundary #50364

trylek · 2021-03-29T14:55:08Z

When verifying field offset consistency, we shouldn't be checking
base class size when the base class doesn't belong to the current
version bubble. I have fixed this by adding a special case for
the existing fixup encoding indicated by base class size being
zero; please let me know if you find it preferable to create
a new fixup type for this purpose instead. I have also created
a simple regression test that was previously failing when run
against the framework compiled with CG2 assembly by assembly.

Thanks

Tomas

Fixes: #49982

/cc @dotnet/crossgen-contrib
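
The special case described above can be pictured with a minimal sketch. All names here (`FieldOffsetCheckSketch`, `SkipBaseCheck`, `Verify`) are hypothetical and only model the idea that an encoded base class size of zero means "do not compare the base size"; this is not the actual fixup code in Crossgen2 or the runtime.

```csharp
// Hypothetical model of a field offset verification check; not the real fixup logic.
public static class FieldOffsetCheckSketch
{
    // Assumption: an encoded base size of zero means "base is outside the
    // version bubble, so skip the base size comparison".
    public const uint SkipBaseCheck = 0;

    public static bool Verify(uint encodedBaseSize, uint encodedFieldOffset,
                              uint actualBaseSize, uint actualFieldOffset)
    {
        // The base size is compared only when the compiler recorded one.
        bool baseMatches = encodedBaseSize == SkipBaseCheck
                           || encodedBaseSize == actualBaseSize;

        // The field offset itself is always compared.
        return baseMatches && encodedFieldOffset == actualFieldOffset;
    }
}
```

With this model, a mismatch in base size fails the check only when a non-zero base size was encoded.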

davidwrighton

I like it. We don't need a new encoding as we've never shipped anything that should depend on this.

trylek · 2021-03-30T00:05:00Z

@davidwrighton - CI testing uncovered a pre-existing deficiency of CG2 field layout: we weren't properly aligning base class sizes on ARM32. Could you please take another look when you have a chance?

trylek · 2021-03-30T18:54:00Z

OK, I found out there were actually three different places in Crossgen2 - auto field layout calculation, sequential field layout calculation and the field offset verification fixup - that were using three different methods to calculate the field base offset. I have unified all of them to use the same method CalculateFieldBaseOffset.

davidwrighton · 2021-03-30T19:57:07Z

Every time anyone looks at this stuff we find more cases to consolidate. Hopefully this is it. I like the current state of the change, but of course, we need to look at the results of the outerloop testing that you are doing now.

MichalStrehovsky · 2021-03-31T07:29:23Z

src/coreclr/tools/Common/TypeSystem/Common/MetadataFieldLayoutAlgorithm.cs

@@ -775,13 +762,14 @@ private static bool IsByValueClass(TypeDesc type)
            return type.IsValueType && !type.IsPrimitive && !type.IsEnum;
        }

-        private static LayoutInt ComputeBytesUsedInParentType(DefType type)
+        public LayoutInt CalculateFieldBaseOffset(TypeDesc type)


This API will throw an InvalidCastException if, e.g., an array type is passed. It would be better to ensure valid inputs at compile time.

Suggested change

-        public LayoutInt CalculateFieldBaseOffset(TypeDesc type)
+        public LayoutInt CalculateFieldBaseOffset(MetadataType type)

trylek · 2021-04-01

Good point, thanks; fixed in the 8th commit.
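
The difference between the two signatures in the suggestion can be illustrated with a small sketch. The classes below are minimal stand-ins written for this example, not the real `TypeDesc` / `MetadataType` hierarchy from the runtime repository, and `BaseOffsetLoose` / `BaseOffsetStrict` are hypothetical names.

```csharp
// Minimal stand-ins for the type system classes; the real ones are far richer.
public abstract class TypeDesc { }
public class MetadataType : TypeDesc { public int BaseSize { get; set; } }
public sealed class ArrayType : TypeDesc { }

public static class SignatureSketch
{
    // Loosely typed: compiles for any TypeDesc, but the cast fails at run time
    // with InvalidCastException when an ArrayType is passed.
    public static int BaseOffsetLoose(TypeDesc type) => ((MetadataType)type).BaseSize;

    // Strictly typed: passing an ArrayType is rejected by the compiler.
    public static int BaseOffsetStrict(MetadataType type) => type.BaseSize;
}
```

Calling `SignatureSketch.BaseOffsetStrict(new ArrayType())` does not compile, which is the compile-time guarantee the review comment asks for.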

trylek · 2021-04-01T14:56:22Z

I have rebased the change and reverted a failed attempt at ARM offset bias refactoring; for the sake of expediency I have also included my two other pending fixes, #50474 and #50562.

For now I assume that we'll first merge in the two fixes and I'll then rebase this change against them and rerun the testing, but I'm open to merging this combination of the three fixes for the sake of expediency.

mangod9 · 2021-04-01T14:59:12Z

Merging them together should be fine, assuming tests pass with all of them combined.

trylek · 2021-04-01T18:35:59Z

Down to three remaining test failures (ExplicitClass in crossgen2smoke on ARM, LayoutClassTest, PlatformDefaultMemberFunctionTest), plus some unrelated failures in composite CG2 builds that should be fixed, or at least change their behavior, once the fix for MVID checks (#50474) is merged in. I'm looking at them now, but the crossgen2smoke failure on ARM32 is interesting and I would love to hear @davidwrighton's opinion:

Verify_FieldOffset 'ExplicitlySizedClass.A' Field offset 0!=-4(actual) || baseOffset 0!=0(actual)

I think this raises two interesting questions:

1. The shortcut I used to encode extra-bubble base class sizes as zero probably incorrectly conflates this case with types without a base class. For now I think I should introduce a new fixup for this case after all; I don't see any non-hacky way to encode this extra bit of information in the existing fixup format.
2. I sincerely doubt that the "actual" information is correct. I think it rather means that, in this particular case, the field offset check in jitinterface.cpp calculates the base size incorrectly. My guess is that the base class (object) is assumed to have size 8, the offset of the first field in ExplicitlySizedClass is 4, and that gets recalculated to the relative offset notation (4 - 8 = -4); otherwise it doesn't make sense to me.

Thanks

Tomas

jkotas · 2021-04-01T23:42:41Z

Is the base validation actually useful?

- If there is no version bubble boundary crossing, the base does not matter. All fields are accessed by absolute offset.
- If there is a version bubble boundary crossing, the base cannot be validated.

trylek · 2021-04-02T00:06:52Z

I believe it's extremely useful. Please remember that these checks are not supposed to improve our code in any manner; they are just sanity checks that help us chase down subtle differences between the two type systems and class loaders that have been plaguing us throughout the entire CG2 development. Each longer-term member of the CG2 v-team has their stories to tell. I side with David on this one: the validation has been an awesome boost in nailing down hard issues, and I recall the times before that change when we had to do everything manually. And this logic is already bearing unexpected fruit: discoveries of tiny misalignments between CG2 and the runtime regarding field layout, not only on ARM (even within a version bubble, i.e. with fixed offsets). While this PR is difficult, I would be very sad to have to trim it down by reducing the quality of the checks.

trylek · 2021-04-02T00:13:25Z

Hmm, I guess the expression I used is somewhat ambiguous, sorry about that, I think I meant to say something like "the checks are not supposed to improve the runtime native code in any manner, ...", the way I phrased that it sounds like a contradiction in terms to some extent. Thanks for your patience :-).

jkotas · 2021-04-02T00:34:29Z

I understand that the field offset check is useful. I am questioning the usefulness of the base check.

davidwrighton · 2021-04-02T00:45:14Z

The particular value of the runtime base check isn't to validate them in cases where there is a cross-module boundary. It's to validate the data that would be used in a cross-module boundary to ensure that the general type layout processing is correct. It turns out that, while not exactly matching up to a useful value, it has been quite valuable at finding all kinds of edge cases that do affect the within-version-bubble cases. In particular, it is good at finding cases where the alignment calculations were subtly wrong in a way that didn't cause a field offset to be wrong in that particular test case, but could have caused a problem for other fields on the type.

Now, if it turns out the only incorrectness is caused by this incorrect conflation of cross boundary and non-deriving, we could use a different sentinel value like 0xFFFFFFFF or something or simply not allow a sentinel value for types that derive directly from object. @trylek I assume that you have an arm32 test environment where you can test this locally? Relying on the CI for this sort of testing is a giant time waster.
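
The sentinel idea can be sketched as follows. This is only an illustration under stated assumptions: `BaseSizeEncodingSketch`, `Encode` and `ShouldCheckBase` are hypothetical names, and the value `0xFFFFFFFF` is taken from the suggestion above rather than from any shipped encoding.

```csharp
// Hypothetical encoding that keeps "no base class" and "base outside the
// version bubble" apart; not the actual ReadyToRun fixup format.
public static class BaseSizeEncodingSketch
{
    // Sentinel value floated in the discussion; assumed never to be a real size.
    public const uint BaseOutsideBubble = 0xFFFFFFFF;

    public static uint Encode(bool hasBase, bool baseInBubble, uint baseSize)
    {
        if (!hasBase)
            return 0;                        // no base class: size is genuinely zero
        return baseInBubble ? baseSize       // base in bubble: record the real size
                            : BaseOutsideBubble; // base elsewhere: mark as unverifiable
    }

    // The verifier compares base sizes unless the sentinel is present.
    public static bool ShouldCheckBase(uint encoded) => encoded != BaseOutsideBubble;
}
```

Unlike the zero shortcut, a type without a base class still has its (zero) base size checked in this model.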

Also, what I've found in the past is that when a layout issue is particularly nasty, I need to build a standalone repro of the bug, debug in parallel through the native CLR type layout logic and the managed type layout logic, and find where they differ. I worry that when you consolidated the three different ways in which we calculated the base offset, you actually removed a way in which they are required to be very subtly different.

trylek · 2021-04-02T00:49:18Z

Hmm, I guess you're probably right that in the local case (within version bubble) the verification of absolute field offset is sufficient. On the other hand, the base class - derived class boundary turns out to be a super productive bug farm, I have been poring over the code for more than a week by now and I still cannot claim I exactly know what's happening in each situation, especially as there are multiple code paths involved for explicit / sequential / auto layouts for reference types vs. value types with side fun regarding the ARM32 offset bias. As I believe that "usefulness" of this fixup is basically preventively diagnostic (we detect upfront that we got it wrong), I tend to believe that the base field offset calculation does constitute a value worth validating (in the intra-bubble scenario) as getting that wrong indicates a potentially critical bug in our implementation.

trylek · 2021-04-02T00:59:42Z

In fact, this is exactly what crashes the LayoutClassTest - in the test, there's a class named SeqDerivedClass that derives from an empty class named EmptyBase, please cf.

runtime/src/tests/Interop/LayoutClass/LayoutClassTest.cs

Line 16 in c24bf80

public class SeqDerivedClass : EmptyBase

The proximate problem is that, on x64, we don't properly replicate the runtime behavior w.r.t. the base class EmptyBase, which adds 1 byte to its 8-byte object base. I haven't yet found the place in the CoreCLR runtime method table builder where the empty class gets the extra byte added, probably akin to C++ being unable to properly deal with zero-sized classes. Long story short, my current belief is that this is what causes the remaining discrepancy until proven otherwise.
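
To see why a single extra byte in the base can matter, here is a small arithmetic sketch. It assumes, purely for illustration, that the first field of a derived class starts at the base size rounded up to the pointer size; `AlignmentSketch` and its methods are hypothetical and do not reproduce the actual layout rules of either the runtime or Crossgen2.

```csharp
public static class AlignmentSketch
{
    // Rounds value up to the next multiple of alignment (alignment must be a power of two).
    public static int AlignUp(int value, int alignment) =>
        (value + alignment - 1) & ~(alignment - 1);

    // Hypothetical rule: derived-class fields start at the base size
    // rounded up to the pointer size.
    public static int FirstDerivedFieldOffset(int baseSize, int pointerSize) =>
        AlignUp(baseSize, pointerSize);
}
```

Under that assumption, an 8-byte base on x64 puts the first derived field at offset 8, while a 9-byte base pushes it to offset 16, so the two layout engines disagree as soon as only one of them accounts for the extra byte.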

trylek · 2021-04-07T10:34:05Z

Merging in as the PR is green and the remaining CG2 failures are known.

trylek added the area-crossgen2-coreclr label Mar 29, 2021

trylek added this to the 6.0.0 milestone Mar 29, 2021

trylek requested a review from davidwrighton March 29, 2021 14:55

davidwrighton approved these changes Mar 29, 2021

MichalStrehovsky reviewed Mar 31, 2021

davidwrighton mentioned this pull request Apr 1, 2021

Simplify mibc usage in the build #50536

Merged

trylek force-pushed the VerifyFieldLayoutFix branch from 8d19e62 to eac8901 Compare April 1, 2021 14:48

trylek force-pushed the VerifyFieldLayoutFix branch from cf657fe to b5d2731 Compare April 2, 2021 21:00

runfoapp bot mentioned this pull request Apr 5, 2021

Various failures in rolling build for readytorun/crossgen2/crossgen2smoke/crossgen2smoke.sh #50752

Closed

trylek added 14 commits April 6, 2021 00:08

Fix base type alignment calculation on ARM

ff09d66

Unify calculation of field base offset in Crossgen2

66d4608

More fixes for base type size vs. field offset base

1334047

Additional ARM-specific field layout verification fixes

a0de1c2

Revert failed attempt at refactoring ARM layout; include MVID fix

95fe35f

Stricter typing (TypeDesc -> MetadataType) per Michal's PR feedback

970283b

Fix instance size calculation for sequential / explicit layout

a180e11

Restore incorrectly removed logic for non-zero struct sizes

d196751

Simplify and fix alignment logic on x86

cdb4b0c

Put back align8 support; cleanup empty class management

db2146e

Fix OffsetBias for object; improve handling of indeterminate types

2c9bbd4

More fixes regarding FieldBaseOffset vs. the actual field base offset

8db3511

More consistency fixes for x86 and arm

943cd4d

trylek force-pushed the VerifyFieldLayoutFix branch from 1cf876a to 943cd4d Compare April 5, 2021 22:09

Two more fixes for x86 - don't 8-align base & offset bias fix

bf43f05

trylek merged commit a7ed1fd into dotnet:main Apr 7, 2021

trylek deleted the VerifyFieldLayoutFix branch April 7, 2021 10:35

GrabYourPitchforks mentioned this pull request Apr 9, 2021

Obsolete SuppressIldasmAttribute and remove ildasm.exe support for it #50951

Merged

trylek mentioned this pull request Apr 19, 2021

Test failure tracing\eventpipe\processenvironment\processenvironment\processenvironment.cmd #51477

Closed

ghost locked as resolved and limited conversation to collaborators May 7, 2021
