Skip to content

Conversation

@tannergooding
Copy link
Member

One of my projects was throwing a NullReferenceException in .NET 6 RC2:

All projects are up-to-date for restore.
TerraFX.Interop.Windows -> C:\Users\tagoo\Source\repos\terrafx.interop.windows\artifacts\bin\sources\TerraFX.Interop.Windows\Release\net6.0\TerraFX.Interop.Windows.dll
TerraFX.Samples.DirectX -> C:\Users\tagoo\Source\repos\terrafx.interop.windows\artifacts\bin\samples\TerraFX.Samples.DirectX\Release\net6.0\win-x64\TerraFX.Samples.DirectX.dll
ILLink : error IL1012: IL Linker has encountered an unexpected error. Please report the issue at https://github.com/mono/linker/issues [C:\Users\tagoo\Source\repos\terrafx.interop.windows\samples\DirectX\TerraFX.Samples.DirectX.csproj]
Fatal error in IL Linker
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at Mono.Linker.TypeMapInfo.TypeMatch(TypeReference a, TypeReference b)
at Mono.Linker.TypeMapInfo.MethodMatch(MethodReference candidate, MethodReference method)
at Mono.Linker.TypeMapInfo.TryMatchMethod(TypeReference type, MethodReference method)
at Mono.Linker.TypeMapInfo.MapInterfaceMethodsInTypeHierarchy(TypeDefinition type)
at Mono.Linker.TypeMapInfo.MapType(TypeDefinition type)
at Mono.Linker.TypeMapInfo.EnsureProcessed(AssemblyDefinition assembly)
at Mono.Linker.TypeMapInfo.GetBaseMethods(MethodDefinition method)
at Mono.Linker.Steps.MarkStep.IsVirtualNeededByInstantiatedTypeDueToPreservedScope(MethodDefinition method)
at Mono.Linker.Steps.MarkStep.GetRequiredMethodsForInstantiatedType(TypeDefinition type)+MoveNext()
at Mono.Linker.Steps.MarkStep.MarkRequirementsForInstantiatedTypes(TypeDefinition type)
at Mono.Linker.Steps.MarkStep.MarkType(TypeReference reference, DependencyInfo reason, Nullable`1 origin)
at Mono.Linker.Steps.MarkStep.MarkField(FieldDefinition field, DependencyInfo& reason)
at Mono.Linker.Steps.MarkStep.MarkEntireType(TypeDefinition type, DependencyInfo& reason)
at Mono.Linker.Steps.MarkStep.MarkEntireAssembly(AssemblyDefinition assembly)
at Mono.Linker.Steps.MarkStep.MarkAssembly(AssemblyDefinition assembly, DependencyInfo reason)
at Mono.Linker.Steps.MarkStep.ProcessMarkedPending()
at Mono.Linker.Steps.MarkStep.Initialize()
at Mono.Linker.Steps.MarkStep.Process(LinkContext context)
at Mono.Linker.Pipeline.ProcessStep(LinkContext context, IStep step)
at Mono.Linker.Pipeline.Process(LinkContext context)
at Mono.Linker.Driver.Run(ILogger customLogger)
at Mono.Linker.Driver.Main(String[] args)
Optimizing assemblies for size, which may change the behavior of the app. Be sure to test after publishing. See: https://aka.ms/dotnet-illink
C:\Program Files\dotnet\sdk\6.0.100-rc.2.21505.57\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.ILLink.targets(108,5): error NETSDK1144: Optimizing assemblies for size failed. Optimization can be disabled by setting the PublishTrimmed property to false. [C:\Users\tagoo\Source\repos\terrafx.interop.windows\samples\DirectX\TerraFX.Samples.DirectX.csproj]

This was ultimately because the types that were being matched were FunctionPointerType for which .ElementType returns null. This updates the TypeMatch to account for FunctionPointerType and do a comparison of calling convention, return type, and parameter types.

return true;
}

static bool TypeMatch (FunctionPointerType a, FunctionPointerType b)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% this is covering everything, but it at least appears to work correctly with the function pointers I have declared.

Copy link
Member Author

@tannergooding tannergooding Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most notably, function pointers can't be generic but they can have generic use generic types for the return type or parameter if the parent scope is generic

There isn't a context available here, so I'm unsure if there needs to be any additional handling or logic to cover that scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also modreq that can appear as part of the calling convention. I think modreq are handled further up in the TypeMatch logic, however.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the generic return/parameter types are covered by the GenericInstanceType case in TypeMatch (TypeSpecification, TypeSpecification).

if (a is GenericParameter genericParameterA && b is GenericParameter genericParameterB)
return TypeMatch (genericParameterA, genericParameterB);

if ((a is null) || (b is null))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is protecting against the crashing case where either input is null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - maybe it would be better to let it crash - at least that way we find out quickly. This way it may just not work correctly - which would be MUCH harder to figure out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to add a diagnostic/warning if it hits this path?

I think a crash is bad/worst case scenario, particularly in a shipping product. Having the ability to have a reasonable fallback is better, but hinders the ability to catch these scenarios as you indicated. Surfacing a warning is the next best thing so users can still see the issue without being blocked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warnings need to have error codes so that they're suppressible for people who WarnAsError + they need localizable messages. I'm not sure introducing warnings for internal errors is worth the cost of documenting and maintaining them.

There's also people who ignore warnings, come with broken app asking for support without mentioning the warning, and we're left reading from tea leaves (my experience from years in .NET Native support).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My own view here, as a consumer of the app, is that having absolutely no path forward because I happened to hit an unhandled path in the app is an even worse experience experience. My two options here were:
A. "don't use trimming"
B. "figure out what broke the trimming and don't do that"

I didn't get the option to live with additional bloat in my trimmed app and I didn't get any diagnostic information that I could provide in an issue. Instead, I had a stack trace and happened to be working on an open source project that would have potentially made it simple to reproduce the error. This isn't going to be the case most of the time and most users aren't going to have the interest to even debug and try to root cause the issue in the first place (let alone consider logging an issue or being aware of where to log the issue).

We're advertising trimming even more for .NET 6 and have made massive leaps and bounds in reporting useful information to the user. Having a warning that can provide additional information and having a workaround that can be used, even with a slightly degraded experience is, to me, a significantly better setup and is effectively the minimum I'd come to expect from our .NET tooling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So roughly the same experience as the rest of the tools we ship in the .NET SDK? This isn't something unique to the linker.

Right, but illinker already has a devloop experience around that that is a lot more efficient for the dev team, so the illinker team would be giving up that. It's harder to give up something one has than something one never had, especially if it's this easy to quantify the cost of not having it (just see how easy it was to troubleshoot the 5 bug reports - even the initial one was easy because of the stack trace). I'm not really on the illink dev team officially, so it's not my call to make but the advantage is obvious. If you filed the thing you wrote on Teams that prompted this fix as a drive-by issue and never responded to any questions after, we would probably still be able to come up with a repro case and a fix.

Why not just assume that thing can't be trimmed if it can't be understood.

It might not even be possible to reconstruct the appropriate metadata with Cecil without having some knowledge about it. We (as the runtime team) have been doing breaking changes to the metadata format without revving the version field with the assumption that it's "close enough and tools might handle it", but none of that was ever tested with the tools. Changes to the file formats need to be understood by illinker. Not even that - changes to runtime need to be understood by illinker. E.g. IDynamicIntefaceCastable needed linker changes not to strip default implementations of things that (absent IDynamicInterfaceCastable) provably weren't used. The newly added CreateSpan API will need illinker changes so that the underlying RVA static fields are properly aligned when writing the trimmed output. S.R.Metadata is not able to encode byref to a byref and will need a public API change to emit it; who knows what Cecil does with that. Etc. I don't see much value in trying to futureproof for something that likely will need special casing anyway at the risk of the futureproofing code kicking in for straight up bugs.

Copy link
Member Author

@tannergooding tannergooding Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not even be possible to reconstruct the appropriate metadata with Cecil without having some knowledge about it.

Perhaps there needs to be a way for consumers to be able to have a faster inner loop around the illinker tooling and getting a fixed version? Right now, I can't even override the inbox linker with a private copy that has the fix, so my repo is "dead in the water", as it were, until the next SDK servicing release.

If I had caught this just after 6.0 shipped or just after the last servicing release, I'd have to wait around a month for the next fix and that's my main concern here.

  • That is, my main concern is I found a bug and now I have no real way to workaround the issue so my app is "regressed" in terms of size and startup perf. Its not a huge deal right now, but it very well could be at another time or for another person using the tooling.

One example, of what I mean by being able to use a private copy, is that Roslyn for example defaults to the inbox version, but I can trivially reference https://www.nuget.org/packages/Microsoft.Net.Compilers/4.0.0-6.final for example and get a newer compiler version or even a nightly build for prototyping; which unblocks these kinds of issues, even temporarily, until the proper fix becomes available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, I can't even override the inbox linker with a private copy

You can set a property ILLinkTasksAssembly to point to a private build of illink (and the related task) on your build/publish and it will use that. We don't have a convenient NuGet to do this for you though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to add a packageref to Microsoft.NET.ILLink.Tasks to override the inbox version. We don't ship this package on nuget.org (it's on the dotnet feeds), and we don't advertise or test this scenario, but it's set up to work this way.

This logic doesn't look like it covers the case where:

  • one is a non-null TypeSpecification, and the other is null (I think this would give a nullref exception on GetType).

In any case I agree we should let it fail with an exception since that's the approach we have been taking in this codebase.

@tannergooding my read of your feedback is that we should avoid blocking our customers if at all possible in case of a linker crash - whether that means:

  1. shipping a fix OOB
  2. providing enough diagnostics to let customers understand the issue and work around it in their code
  3. an option to tell the linker to do a best-effort recovery attempt (accepting the risk that it might generate bad code), possibly with extra diagnostics

We have the option to do 1, and the stacktrace is close to the best we can do for 2 given that linker crashes are unknown unknowns until we uncover them. I understand the frustration of hitting a linker crash and being blocked by it - do you still think something like 3 would be valuable (with a big disclaimer of course)? If so I'd appreciate if you could file an issue to track the suggestion. It would probably impact more of the linker than just the TypeMatch logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the changes I added here and kept this just as checking for and handling FunctionPointerType

@marek-safar
Copy link
Contributor

Do you have any test to show that the current logic is not good enough?

@tannergooding
Copy link
Member Author

@marek-safar, I pushed up two tests. CanCompileInterfaceWithFunctionPointerParameter fails with a NullReferenceException without this change.

@tannergooding
Copy link
Member Author

-- Given this results in a crash on .NET 6, should this be considered for servicing as well?

@tannergooding tannergooding changed the base branch from main to release/6.0.2xx November 4, 2021 16:15
@tannergooding
Copy link
Member Author

Retargeted to release/6.0.2xx as requested by @vitek-karas offline.

@rickbrew
Copy link

rickbrew commented Nov 4, 2021

This crash will affect Paint.NET's use of illink once TerraFX.Interop.Windows starts using some of the new code that Tanner is putting in there, so I much appreciate this being pulled in for 6.0 servicing :)

Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please run .\lint.cmd (or ./lint.sh) from the repo root? It will reformat the code as appropriate.

if (a is GenericParameter genericParameterA && b is GenericParameter genericParameterB)
return TypeMatch (genericParameterA, genericParameterB);

if ((a is null) || (b is null))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - maybe it would be better to let it crash - at least that way we find out quickly. This way it may just not work correctly - which would be MUCH harder to figure out.

Copy link
Member

@sbomer sbomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution!

return true;
}

static bool TypeMatch (FunctionPointerType a, FunctionPointerType b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the generic return/parameter types are covered by the GenericInstanceType case in TypeMatch (TypeSpecification, TypeSpecification).

if (a is GenericParameter genericParameterA && b is GenericParameter genericParameterB)
return TypeMatch (genericParameterA, genericParameterB);

if ((a is null) || (b is null))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to add a packageref to Microsoft.NET.ILLink.Tasks to override the inbox version. We don't ship this package on nuget.org (it's on the dotnet feeds), and we don't advertise or test this scenario, but it's set up to work this way.

This logic doesn't look like it covers the case where:

  • one is a non-null TypeSpecification, and the other is null (I think this would give a nullref exception on GetType).

In any case I agree we should let it fail with an exception since that's the approach we have been taking in this codebase.

@tannergooding my read of your feedback is that we should avoid blocking our customers if at all possible in case of a linker crash - whether that means:

  1. shipping a fix OOB
  2. providing enough diagnostics to let customers understand the issue and work around it in their code
  3. an option to tell the linker to do a best-effort recovery attempt (accepting the risk that it might generate bad code), possibly with extra diagnostics

We have the option to do 1, and the stacktrace is close to the best we can do for 2 given that linker crashes are unknown unknowns until we uncover them. I understand the frustration of hitting a linker crash and being blocked by it - do you still think something like 3 would be valuable (with a big disclaimer of course)? If so I'd appreciate if you could file an issue to track the suggestion. It would probably impact more of the linker than just the TypeMatch logic.

@marek-safar
Copy link
Contributor

Retargeted to release/6.0.2xx as requested

This should go to main first and only then be backported to relevant branches (note 6.0.2 is likely months away)

@tannergooding tannergooding changed the base branch from release/6.0.2xx to main November 15, 2021 21:55
@tannergooding
Copy link
Member Author

@marek-safar, retargeted.

Please let me know if anything else is needed here.

@marek-safar marek-safar merged commit fc41993 into dotnet:main Nov 16, 2021
agocke pushed a commit to dotnet/runtime that referenced this pull request Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants