Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent crashes and data corruptions #1

Closed
jkotas opened this issue Sep 22, 2017 · 19 comments
Closed

Intermittent crashes and data corruptions #1

jkotas opened this issue Sep 22, 2017 · 19 comments

Comments

@jkotas
Copy link

jkotas commented Sep 22, 2017

https://github.com/HelloKitty/Reinterpret.Net/blob/master/src/Reinterpret.Net/Array/ArrayHackByteConversionExtensions.cs

This hack is corrupting the internal garbage collector data structures. It will lead to intermittent crashes and data corruption. Hacking internal garbage collector data structures like this is absolutely not supported by the .NET runtime.

@HelloKitty
Copy link
Owner

HelloKitty commented Sep 22, 2017

@jkotas Are you able to provide a succinct repro of crashes? It's possible my unit tests don't last long enough to encounter corruption or crashes. Let me look into reproing it myself but if you have something that is capable of doing so it would greatly help investigating how to address the issue!

@HelloKitty
Copy link
Owner

If you're referring to crashes or data corruption related to referencing or interacting with the object's original reference then this is a documented behavior of the library. I try my best to warn in the documentation and the readme that references to the original should be considered dead and invalid.

@jkotas
Copy link
Author

jkotas commented Sep 22, 2017

Here is a simple repro:

using System;
using Reinterpret.Net;

class Program
{
    static void Main(string[] args)
    {
        object[] a = new object[1000];

        for (int i = 0; i < a.Length; i++)
        {
            byte[] b = new byte[i];
            string s = b.ReinterpretToString();
            a[i] = s;
            GC.Collect();
        }
    }
}

You should see crash in the GC.Collect call.

There is no way to "fix" this problem. The idea behind this hack is fundamentally flawed. You will be never able to make it work 100% reliable.

The right solution for the no-copy reinterpretation of data is Span<T> that is being worked on.

@HelloKitty
Copy link
Owner

@jkotas You're right! This causes a crash. However I can't attribute it to the runtime or GC directly. I had made a poor decision to avoid size verification. Providing an array of bytes with a length % 2 != 0 causes this crash.

I will reintroduce array size verification. I should have originally verified the behavior in these cases but I had made the assumption that users who wanted performance were in a state that they were very sure things would just work.

I can tell that you do not like the idea of this library, however you have made a vital contribution to it and I appreciate it!

Span<T> is an interesting datastructure someone else had mentioned to me. However it seems incompatible with existing APIs, unless work is being done for it to be treatable as an array at runtime, and currently turning it into an array requires an allocation. This is problematic for some usecases like Unity3D.

If Span plans support for .NET2.0 that's great but until that happens I think this library has a place. If it can avoid crashes.

If you are able to reproduce any other crashes after I merge and push out the new nuget let me know! I'll keep this issue open for today.

HelloKitty added a commit that referenced this issue Sep 22, 2017
Fixed issue #1 where reinterpreting arrays to element type T where bytes
length % sizeof(T) != 0.

Bytes now have to be properly sized to be converted. This results in
slower performance but avoids crashes.
@HelloKitty
Copy link
Owner

The fix that protects against the crashes due to length mismatching is in the NuGet version 1.1.0.

@jkotas
Copy link
Author

jkotas commented Sep 23, 2017

Here is another example that will lead to a quick crash (run release build for best results):

using System;
using System.Threading;
using Reinterpret.Net;

class Program
{
    static void Main(string[] args)
    {
        new Thread(Churn).Start();
        new Thread(Work).Start();
        for (;;) GC.Collect();
    }

    static volatile ulong[] a = new ulong[1];
    static volatile byte[] b;

    static void Churn()
    {
        for(;;) a = new ulong[1];
    }

    static void Work()
    {
        for (;;) b = a.ReinterpretWithoutPreserving();
    }
}

Let me repeat: The hack that you are using is fundamentally flawed. You will be never able to make this hack work reliably. I know what I am talking about - I am .NET runtime architect.

Span<T> is an interesting datastructure someone else had mentioned to me. However it seems incompatible with existing APIs, unless work is being done for it to be treatable as an array at runtim

I agree that it is not ideal that Span cannot be passed into existing APIs that take arrays. .NET team spent a lot of time on this problem and we were not able to find a good solution for it.

@HelloKitty
Copy link
Owner

Let me repeat: The hack that you are using is fundamentally flawed. You will be never able to make this hack work reliably. I know what I am talking about - I am .NET runtime architect.

I know that you're an MS .NET developer and I absolutely respect your authority on this subject matter but I can't just abandon the library without exhausting all options first.

I am very aware that code will cause crashes but this is expected and documented in the readme. The xml documentation should be warning people who call it that it's unsafe and that all references to the original should be considered dead or invalid. Calling ReinterpretWithoutPreserving multiple times on the same array isn't supported and messing with or changing anything about the array while doing so isn't supported. Especially introducing race conditions that could end up with who knows what in the object header afterwards.

Really this unsafe/dangerous operation has a very narrow usecase for reinterpreting byte chunks from I/O such as from sockets or etc. One of the most expensive operations in serializing tended to be reading and writing collections. This is a fantastic solution for that IF you are careful and follow afew safety precautions.

I've not implied in the readme or documentation that this reinterpreting without preservation of the original is safe or can be done without regard. It's very explicit that the end-user must know they essentially own the array and that the original won't be referenced at all.

As long as those guidelines are followed I don't think you'll encounter crashes. There are some existing closed protocols, particularly the World of Warcraft protocol in my case, that require this functionality to be viable in .NET. due to the fact that in densely populated areas they send large diffs of unit states/attributes that logically represent all different types of primitive data but all encoded in a single byte chunk. Being able serialize and deserialize these diffs at high speeds is critical to make up for other design defects or performance loss emulating in managed code.

Another interesting usecase is deserializing the color pixels for an image quickly. While I don't ever expect this to be in the standard library I do expect personally to make use of this in a couple of places. I know it's unsafe, dangerous but I also do not intend to use it without regard and I hope nobody consuming this library does either.

@jkotas
Copy link
Author

jkotas commented Sep 23, 2017

Calling ReinterpretWithoutPreserving multiple times on the same array isn't supported

It is not hard to modify my example to call ReinterpretWithoutPreserving just once for every array and still crash instantly.

As long as those guidelines are followed I don't think you'll encounter crashes

This is not correct. These operations will always cause intermittent crashes or data corruptions. There is no way to use them correctly. The only variable is probability how often these crashes happen. It will be different for different programs, but never zero (except for trivial programs where the garbage collector does not ever kick in).

My example is using multiple threads to make the crash to happen fast. The same crashes will happen even in single threaded programs, just with lower probability.

For security sensitive environments like webservers, this hack is likely similar class of vulnerability as C++ malloc use-after-free bug.

@HelloKitty
Copy link
Owner

@jkotas Alright, I shall investigate and try to reproduce this under those circumstances. If I can do that then I'll remove that feature from the library. There isn't really much of a library here though without that feature so I really hope I can't.

I really appreciate your input and feedback though! I do want to provide a functional library, if it will cause crashes under those conditions then I'll have to get rid of it. I need to reproduce it somehow though or else I'll feel uneasy just discarding it because it seemed really cool and useful.

@jkotas
Copy link
Author

jkotas commented Sep 23, 2017

Here is an example that calls ReinterpretWithoutPreserving exactly once (this one seems to crash faster on debug build). One thread is producing arrays of arrays and triggers GC, second thread is consuming the arrays and calls ReinterpretWithoutPreserving on each of them once.

using System;
using System.Threading;
using Reinterpret.Net;

class Program
{
    static void Main(string[] args)
    {
        new Thread(Work).Start();

        for(;;)
        {
            var t = new ulong[1000000][];
            for (int i = 0; i < t.Length; i++)
                t[i] = new ulong[1];
            a = t;
            GC.Collect();
        }
    }

    static volatile ulong[][] a;
    static volatile byte[] b;

    static void Work()
    {
        for (;;)
        {
            var t = a;
            if (t == null)
                continue;
            a = null;

            for (int i = 0; i < t.Length; i++)
            {
                b = t[i].ReinterpretWithoutPreserving();
            }
        }
    }
}

@HelloKitty
Copy link
Owner

Hmmm, that's very interesting. Though it sort of violates the suggestion of the library not to hold references to the original. t as ulong[][] still references them after being reinterpreted and I suppose a still references them before a is GC'd.

Really this library's dangerous functions have narrow usecases that shouldn't leave references laying around. Reinterpreting a copy of a chunk of bytes that came off the network is one of the major ones. I am able to reproduce this crash you linked in other ways but they all share the same similarity, that references are maintained to the original invalid objects and even in a weirder state than usual, a bunch of byte arrays in an array of longs, how odd!

Until I detect a crash involving a reinterpretation of an array that isn't referenced by anything and isn't touched after reinterpreting I will leave the feature in. It's dangerous, but marked as so, and any section of code that uses it should be rigorously tested. Reliability is important but if you can achieve reliability and performance by avoiding the situation you propose above then you can take advantage of the performance.

I think I will be leaving this feature in for those usecases. If people want to do something crazy like what you posted above then they will encounter issues. If a case is found where it's being used correctly and it crashes I will remove the feature but I've let things run over night and never encountered an issue.

@jkotas
Copy link
Author

jkotas commented Sep 23, 2017

t as ulong[][] still references them after being reinterpreted

These dangling references do not matter.

Until I detect a crash involving a reinterpretation of an array that isn't referenced by anything and isn't touched after reinterpreting

Here you go:

using System;
using System.Threading;
using Reinterpret.Net;

class Program
{
    static void Main(string[] args)
    {
        new Thread(Work).Start();

        for(;;)
        {
            var t = new ulong[1000000][];
            for (int i = 0; i < t.Length; i++)
                t[i] = new ulong[1];
            a = t;
            GC.Collect();
        }
    }

    static volatile ulong[][] a;
    static volatile byte[] b;

    static void Work()
    {
        for (;;)
        {
            var t = a;
            if (t == null)
                continue;
            a = null;

            for (int i = 0; i < t.Length; i++)
            {
                var p = t[i];
                t[i] = null;
                // At this point, p local variable is the only place that is holding reference 
                // to the array that ReinterpretWithoutPreserving is being called on
                b = p.ReinterpretWithoutPreserving();
            }
        }
    }
}

@HelloKitty
Copy link
Owner

HelloKitty commented Sep 23, 2017

Darn you're right. This is very disappointing =( . Do you have any ideas on how to achieve similar performance in a way that doesn't crash the runtime? There were afew other propositions but I wasn't familiar with the safety or validity of them. The one below I thought was potentially even unsafer since I thought the object could move before you were able to finish doing what you were doing. It was proposed as a non-IL way to achieve the same thing on a reddit post. I assume it suffers from at least the same stability issues.

[StructLayout(LayoutKind.Sequential, Pack = 1, Size = 8)]
   public struct ArrayHeader {
     public uint Type;
     public uint Length;
   }

   static void Main(string[] args) {
     var bytes = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 };
     var result = ReinterpretArray<byte, int>(bytes);
     Console.WriteLine($"{result[0]:X} {result[1]:X}");
   }

   static TOut[] ReinterpretArray<TIn, TOut>(TIn[] input) 
       where TIn : struct
       where TOut : struct {
     var dummy = new TOut[1];
     var pDummyArrayHeader = (ArrayHeader*)GetAddr(ref dummy[0]) - 1;
     var pInputArrayHeader = (ArrayHeader*)GetAddr(ref input[0]) - 1;
     Console.WriteLine(pDummyArrayHeader->Type + " " + pDummyArrayHeader->Length);
     Console.WriteLine(pInputArrayHeader->Type + " " + pInputArrayHeader->Length);
     pInputArrayHeader->Type = pDummyArrayHeader->Type;
     var sizeofTIn = (ulong)Marshal.SizeOf(default(TIn));
     var sizeofTOut = (ulong)Marshal.SizeOf(default(TOut));
     pInputArrayHeader->Length = (uint)(pInputArrayHeader->Length * sizeofTIn / sizeofTOut);
     return (TOut[])(object)input;
   }

   [MethodImpl(MethodImplOptions.AggressiveInlining)]
   public static IntPtr GetAddr<T>(ref T t) {
     TypedReference reference = __makeref(t);
     return *(IntPtr*)(&reference);
   }

This is very disappointing as this dramatically reduces the usefulness of the library. I really appreciate your help though. I didn't realize that it would crash in even those cases. I'll go ahead and remove that feature now.

HelloKitty added a commit that referenced this issue Sep 23, 2017
This was causing unexpected crashes as described in Issue #1 . Even when
I had expected it to function correctly and be "safe" it was not.
Therefore the feature has been removed.
@jkotas
Copy link
Author

jkotas commented Sep 23, 2017

Do you have any ideas on how to achieve similar performance in a way that doesn't crash the runtime?

The new Span<T>, or use unsafe code (pin the array and cast the unsafe pointers).

I assume it suffers from at least the same stability issues.

Yes, all these hacks have the same flaw.

I'll go ahead and remove that feature now.

Thank you!

@arekbal
Copy link

arekbal commented Apr 23, 2018

What about GCHandle.Alloc and pinning the array this way?
You still would have to release the pinned array at some point...
Some sort of IDisposable and finalization would be required there.

Such an API would look like
void MyMethod() {
var arr = new byte[3200];
using(var pinnedAndInterpreted = arr.ReinterpretArray<int>())
{ // that is scope where it makes sense...
// you could move disposable further up as class member and dispose with parent.
}
}
Alternatively you can pin/unpin it in some sort of lambda scope such as Shielded does.
arr.ReinterpretArray<int>(arrOfInts => { /* this array is unreliable outside of this scope */ })

Reminder: All of this is not as fast as one would hope for, with this tricks.

@jkotas
Copy link
Author

jkotas commented Apr 23, 2018

What about GCHandle.Alloc and pinning the array this way?

This does not work either. It is still going to cause intermittent crashes in the GC. The idea behind this hack is fundamentally flawed. There is no way to make it work reliably.

@HelloKitty
Copy link
Owner

HelloKitty commented Apr 23, 2018

@arekbal Depending on your needs Span<T>'s pointer ctor may provide a solution for the need to "reinterpret" an array. Though until more APIs are created to consume them without having to copy them to managed arrays it won't be perfect the use is limited.

Span looks like the true future for interacting with memory and covers the usecase this library was originally developed for. Though grabbing a fixed void* from an array and using Span to reinterpret it, if I understand correctly, will only work down a call stack. You will not be able to return the Span since the pointer would no longer be fixed.

So for now reinterpreting each element you need, either with this library or Unsafe from Microsoft, or using Span is the only solution.

@arekbal
Copy link

arekbal commented Apr 24, 2018

I am just playing(learning) with the idea of GC free code... nothing particular or special about it.
I played with Span<>, ref returns, ref structs and stackalloc couple of days ago already and that is how I got into this thread.
@jkotas I thought that the whole point of pinning is to prevent GC from moving it around.
I don't get it... could you elaborate a little what would go wrong if I were to

  1. Create a local array.
  2. Pin it with GCHandle.Alloc
  3. Rely on it not being moved or changed by GC at any point until I free the handle.

I totally get the point that I still might need to hold a reference to original or that the structures used has to be simple structures. If they are argumetns then I need to take even more care, I understand the alignment issues... but what else could go wrong? I might learn something new with your explanation. Thanks in advance.

@jkotas
Copy link
Author

jkotas commented Apr 24, 2018

  • GC requires the heap to be self-consistent. Among other things, this is used for heap walking (going from one object to the next one). Patching the object type breaks the heap walkability (temporarily or permanently depending on number of factors).

  • GC has other side structures that have to be in sync with what is in the heap memory. Patching the object type may result into these side structures getting out of sync with the heap main memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants