@jkoritzinsky
Member

@jkoritzinsky jkoritzinsky commented Sep 17, 2022

Today, we generate bodies for all methods in crossgen2. This results in wasted space in libraries for methods that are always inlined. For System.Private.CoreLib on Windows x64 (release build), this amounts to 4653 methods totaling 298kb of size-on-disk space that will never be used.

This PR changes the rooting rules for crossgen2 as follows:

  • If a method is exposed outside the assembly as public, protected in a public type, or internal in an assembly with an InternalsVisibleToAttribute applied, it will be rooted.
  • If a method implements an interface method or overrides a virtual method that meets the criteria above, it will also be rooted.
  • If a method is explicitly mentioned in an embedded linker descriptor file, it will be rooted as well. This option ensures that methods that the runtime calls in CoreLib are pre-compiled in addition to being preserved by the linker. This may also cause some other methods to keep being precompiled when they don't need to be, but I think the number of methods in that case isn't particularly high, especially given that we were precompiling everything beforehand.
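
As a rough illustration of the three rules above, here is a minimal C# sketch. The types `Visibility`, `MethodSketch`, and `RootingSketch` are hypothetical stand-ins invented for this example; they are not the crossgen2 type system. The sketch assumes that a public or protected method counts as exposed only when its declaring type is public, applies the same InternalsVisibleTo flag to the overridden method, and models the linker descriptor as a plain set of method names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins. None of these names exist in crossgen2.
public enum Visibility { Private, Internal, Protected, Public }

public sealed record MethodSketch(
    string Name,
    Visibility MethodVisibility,
    bool DeclaringTypeIsPublic,
    MethodSketch? OverriddenOrImplemented = null);

public static class RootingSketch
{
    // Rule 1: the method can be reached from outside the assembly.
    public static bool IsVisibleOutsideAssembly(MethodSketch m, bool hasInternalsVisibleTo) =>
        m.MethodVisibility switch
        {
            // Assumption: public/protected only matters on a public type.
            Visibility.Public or Visibility.Protected => m.DeclaringTypeIsPublic,
            Visibility.Internal => hasInternalsVisibleTo,
            _ => false,
        };

    public static bool ShouldRoot(
        MethodSketch m, bool hasInternalsVisibleTo, ISet<string> descriptorMethods)
    {
        // Rule 1: externally visible.
        if (IsVisibleOutsideAssembly(m, hasInternalsVisibleTo))
            return true;

        // Rule 2: implements or overrides a method that satisfies rule 1.
        if (m.OverriddenOrImplemented is { } target
            && IsVisibleOutsideAssembly(target, hasInternalsVisibleTo))
            return true;

        // Rule 3: named in the embedded linker descriptor (modeled as a name set).
        return descriptorMethods.Contains(m.Name);
    }
}
```

Under these assumptions, a private helper is rooted only if it overrides an externally visible method or is listed in the descriptor set; everything else is left for the dependency analysis to discover.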

I've attached a diff from R2R dump so it's easy to see the change in CoreLib.

Original Description

Today, we generate bodies for all P/Invokes that have blittable IL stubs in crossgen2. This results in wasted space in libraries for P/Invokes that are always inlined. For System.Private.CoreLib on Windows x64 (release build), this amounts to 374 methods totaling 40kb of size-on-disk space that will never be used (we only pre-generate 10 P/Invoke stubs after this change).

This PR utilizes the "recompile this method" feature to delay compilation of any P/Invoke's stubs until the P/Invoke is actually used in a way that causes it to not be inlined.

With this change, there is no change in the number of IL stubs emitted in a Hello World app on Windows x64.

Open Questions:

  • Does the JIT notify the EE across the interface when inlining fails due to the block being cold, the JIT being told to optimize for size, or the JIT being told to generate debuggable code?
  • Should we still emit code for any public (or internal w/ InternalsVisibleTo) blittable P/Invokes? In these cases, we can't guarantee that they won't be called by non R2R'd user code, so without updating this PR to account for this, these stubs will now be generated at runtime. This case doesn't apply for any of our product code as all of our P/Invokes are internal or private and we don't use InternalsVisibleTo in the dotnet/runtime repo.

I've included a diff from R2R dump so it's easy to see the change this PR causes in CoreLib.

Contributes to #69758 by reducing further the cases where we pre-generate stubs in crossgen2.

Member

@davidwrighton davidwrighton left a comment

Nice... but I think the recompile tech is the wrong hook to use here.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 17, 2022
@jkotas
Member

jkotas commented Sep 17, 2022

I think a better way to do this would be to compile these stubs, but do not save them into the final image if we find that they are not referenced from anywhere. It would address the problem of JIT deciding to not inline the stub in cold code, etc.

A similar optimization can be applied to all inlineable methods (the same concerns about public or InternalsVisibleTo apply). The blittable PInvoke stubs are really just a special case of inlineable methods.

@davidwrighton
Member

@jkotas, the right way to do that would be to use the _additionalDependencies model. @jkoritzinsky, so the way to fix it is to enable the _additionalDependencies logic in the jit and MethodWithGCInfo.cs ... but we don't need the logic around GetDependenciesDueToAccess that is used to populate _additionalDependencies for NativeAOT.

@jkotas
Member

jkotas commented Sep 19, 2022

References to PInvoke stubs are relocs. Is there a reason for not using reloc dependencies for this?

I agree that you can make it work using _additionalDependencies. I am just wondering whether reloc dependencies would be a better fit.

@MichalStrehovsky
Member

> References to PInvoke stubs are relocs. Is there a reason for not using reloc dependencies for this?

If these show up as relocations within the produced method code, it would definitely be worth investigating why it doesn't "just work" automatically (it should be requesting a method body for the p/invoke and none of the extra code should be needed to handle that).

@jkotas
Member

jkotas commented Sep 21, 2022

> it would definitely be worth investigating why it doesn't "just work" automatically

crossgen2 adds all methods as roots:

            foreach (MetadataType type in _module.GetAllTypes())
            {
                MetadataType typeWithMethods = type;
                if (type.HasInstantiation)
                {
                    typeWithMethods = InstantiateIfPossible(type);
                    if (typeWithMethods == null)
                        continue;
                }

                RootMethods(typeWithMethods, "Library module method", rootProvider);
            }
        }
    }

    private void RootMethods(TypeDesc type, string reason, IRootingServiceProvider rootProvider)
    {
        foreach (MethodDesc method in type.GetAllMethods())
        {
            // Skip methods with no IL
            if (method.IsAbstract)
                continue;

            if (method.IsInternalCall)
                continue;

            MethodDesc methodToRoot = method;
            if (method.HasInstantiation)
            {
                methodToRoot = InstantiateIfPossible(method);
                if (methodToRoot == null)
                    continue;
            }

            try
            {
                if (!CorInfoImpl.ShouldSkipCompilation(method))
                {
                    CheckCanGenerateMethod(methodToRoot);
                    rootProvider.AddCompilationRoot(methodToRoot, rootMinimalDependencies: false, reason: reason);
                }
            }
            catch (TypeSystemException)
            {
                // Individual methods can fail to load types referenced in their signatures.
                // Skip them in library mode since they're not going to be callable.
                continue;
            }
        }

We would need to stop doing this for the dependency analyzer to kick in.
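
To make the point about roots concrete, here is a minimal, self-contained C# sketch of root-driven reachability. It is not crossgen2's dependency analyzer: `ReachabilitySketch`, the string method names, and the `calls` map are hypothetical, and an edge is assumed to exist only where a call survives in generated code (that is, it was not inlined away).

```csharp
using System;
using System.Collections.Generic;

public static class ReachabilitySketch
{
    // Returns every method reachable from the given roots by following call edges.
    // Hypothetical model: 'calls' maps a method to the callees it still references
    // after inlining decisions have been made.
    public static HashSet<string> Mark(
        IEnumerable<string> roots,
        IReadOnlyDictionary<string, string[]> calls)
    {
        var marked = new HashSet<string>();
        var worklist = new Stack<string>();

        foreach (string root in roots)
        {
            if (marked.Add(root))
                worklist.Push(root);
        }

        while (worklist.Count > 0)
        {
            string method = worklist.Pop();
            if (!calls.TryGetValue(method, out string[]? callees))
                continue;

            foreach (string callee in callees)
            {
                if (marked.Add(callee))
                    worklist.Push(callee);
            }
        }

        return marked;
    }
}
```

If every method is passed in as a root, the marked set is simply every method and nothing can be dropped. If only the externally visible methods are roots, a helper with no surviving incoming edge (for example, one that is always inlined) is never marked.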

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 22, 2022
@jkoritzinsky
Member Author

I've validated that the behavior is identical to what I had before if I just don't root P/Invokes, so I've removed the JitInterface changes.

@jkoritzinsky
Member Author

I did a prototype of only rooting methods that are visible outside of the assembly and found that we can save probably an additional 300kb from an additional 6k methods or so that are always inlined. This doesn't account for any rooting of the internal/private methods called by the runtime, so I'd need to port that infrastructure over from ILC (as we already support the .NET Linker rooting format there, and we already generate a file of that format as part of the CoreLib build for these exact methods) to get exact numbers.

I've attached a copy of the diff against this PR that I got in my prototype. I'm a little suspicious of some of the results, so I definitely need to do more work if we want to go in this direction (in addition to rooting the methods called from CoreLib).

r2r-diff.txt

@jkoritzinsky jkoritzinsky changed the title from "Only R2R blittable IL stubs that aren't inlined" to "Only R2R public methods, methods that override public methods, and internal methods that aren't always inlined" Sep 23, 2022
@jkoritzinsky
Member Author

I've updated this PR to be more generic and only root methods that may be exposed externally instead of only skipping P/Invokes. Can I get another review pass?

@jkotas
Member

jkotas commented Sep 24, 2022

Is this statement correct? "Reflection invoke on private methods will fall back to JIT unless the method is rooted in some other way (e.g. called from somewhere else)."

The diff you have shared is surprising. If I read it correctly, methods like System.Diagnostics.Tracing.RuntimeEventSource.OnEventCommand and System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1<__Canon>.Trim are not compiled anymore. Why is that?

@jkoritzinsky
Member Author

jkoritzinsky commented Sep 24, 2022

> Is this statement correct? "Reflection invoke on private methods will fall back to JIT unless the method is rooted in some other way (e.g. called from somewhere else)."

Yes, that statement is correct. If we want to make that statement incorrect (ie track reflection), we could import more of the linker compatibility code from ILC to assist in reflection analysis.

> The diff you have shared is surprising. If I read it correctly, methods like System.Diagnostics.Tracing.RuntimeEventSource.OnEventCommand and System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1<__Canon>.Trim are not compiled anymore. Why is that?

I need to figure that out. I'm seeing quite a few interesting methods in that list that all seem to match the following description:

  • All callsites of the method are simple methods whose only statement is a call to this method.

Do you have any ideas why methods in this case wouldn't get dependency nodes added? Many of the methods (like System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1<__Canon>.Trim) seem to be too large to be inlined.

@jkotas
Member

jkotas commented Sep 24, 2022

> Do you have any ideas why methods in this case wouldn't get dependency nodes added?

It may be a problem with tailcall optimization confusing the dependency tracker.

@jkoritzinsky
Member Author

That would make sense. I'll see if I can trace down what's happening there.

@jkoritzinsky
Member Author

I think the tailcalls might not use the reloc mechanism and as a result are being missed. @davidwrighton mentioned some other dependency mechanism in ILC above that might be good for this. We may have to update the R2R side of the JIT interface after all to add these dependencies during code generation. I'll give it a shot.

@jkotas
Member

jkotas commented Sep 24, 2022

The tailcall targets are relocs as well. My guess is that the type of the reloc for a tailcall is different from that of a regular call and is not understood by the dependency tracker, but it is still a reloc. I do not think the extra reloc scheme is required to deal with tailcalls. Relocs should be enough.

Member

We should inline this logic to where it's needed (or into an extension method). It doesn't look like we need to cache this and metadata reader + handle is available as a public API on EcmaType.

@MichalStrehovsky
Member

Cc @vitek-karas @tlakollo @dotnet/ilc-contrib - we now have a need to share code between IL Linker + NativeAOT + crossgen2.

@jkoritzinsky
Member Author

> The diff you have shared is surprising. If I read it correctly, methods like System.Diagnostics.Tracing.RuntimeEventSource.OnEventCommand and System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1<__Canon>.Trim are not compiled anymore. Why is that?

I've figured out that I was being too aggressive on handling overrides (I didn't account for overrides with a MethodImpl record). My most recent commit preserves all virtual methods in addition to methods with explicit MethodImpl records (so static abstracts are still handled). This is more conservative, but it ensures that we don't accidentally drop any methods we might need. We can come back and make this more specific later. The change as of 59b4a64 is still a 151kb size savings.

r2r-diff.txt

…vers this case, as long as we don't root P/Invokes. Remove all of the other work that interfaces with the JIT since it's unnecessary.
… cover all of the methods in CoreLib used by the runtime) any embedded linker xml root files.
… as it's difficult to determine if a method implements any interface method in the managed type system and we don't need to be perfect.
@jkoritzinsky jkoritzinsky force-pushed the crossgen2-no-unused-pinvoke-gen branch from db16827 to 72a85d6 Compare October 4, 2022 03:33
@MichalStrehovsky
Member

For the "File renamed without changes." changes, can you please check for copies under ReferenceSource and move them as well? These files are shared with illinker, the ReferenceSource is the literal copy from the linker repo - we keep them up to date by copying over the ReferenceSource files and manually applying the diffs.

E.g.
tools\aot\ILCompiler.Compiler\Compiler\ReferenceSource\ProcessLinkerXmlBase.cs and
tools\aot\ILCompiler.Compiler\Compiler\ProcessLinkerXmlBase.cs

Please also make a copy of README.md that has the commit hash from the illinker repo.

I'm kind of thinking whether we should move everything that is on the sharing plan to Common, even if it's not used by crossgen2 - so that we don't have files littered in yet another directory. Cc @dotnet/ilc-contrib @vitek-karas for thoughts.

Member

@MichalStrehovsky MichalStrehovsky left a comment

Looks good to me otherwise, but I only skimmed the crossgen2-specific changes and it would be better if a crossgen dev has a look as well.

@jkoritzinsky jkoritzinsky requested a review from jkotas October 7, 2022 17:43
@jkoritzinsky
Member Author

cc: @dotnet/crossgen-contrib can I get a second review?

@jkotas
Member

jkotas commented Oct 8, 2022

Could you please share data about the size reductions for the netcoreapp with the latest change?

I am worried about the worst-case effects of the visibility-based rooting policy. For example, it is not unusual to see frameworks that use private reflection for binding. This will skip precompiling everything that these frameworks bind to (unless it is reachable by other means).

I think we may need an explicit option to control the visibility-based rooting policy and/or enable it only in a limited fashion by default.

@jkoritzinsky
Member Author

The size reduction for R2Rd NETCoreApp as a whole is pretty small:

Test case is uncompressed dotnet-runtime-internal zip from a Release x64 build.
66.4MB - with change
67.1MB - without change

So about 0.7MB savings for the default configuration. 151KB of that is in System.Private.CoreLib.

If you feel it's important, I can make this a configurable option.

… assembly is trimmable. It's less likely that an assembly that is trimmable will be using private reflection hackery.
@jkoritzinsky
Member Author

@jkotas I added the opt-in based on IsTrimmable being true. Is this good enough to merge?
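
For readers unfamiliar with the marker, an assembly built with the `IsTrimmable` MSBuild property carries an `AssemblyMetadataAttribute` with key `IsTrimmable` and value `True`. The sketch below shows one way such a check could look, as an illustration only: it uses reflection, whereas a compiler like crossgen2 would read the metadata directly. `TrimmableCheckSketch` and its members are hypothetical names, and the case-insensitive value comparison is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class TrimmableCheckSketch
{
    // Pure helper: true if the key/value pairs contain IsTrimmable = True.
    // Assumption: the value is compared case-insensitively.
    public static bool HasTrimmableMarker(IEnumerable<(string Key, string? Value)> metadata) =>
        metadata.Any(m => m.Key == "IsTrimmable"
                       && string.Equals(m.Value, "True", StringComparison.OrdinalIgnoreCase));

    // Reflection-based wrapper over the assembly-level metadata attributes.
    public static bool IsMarkedTrimmable(Assembly assembly) =>
        HasTrimmableMarker(
            assembly.GetCustomAttributes<AssemblyMetadataAttribute>()
                    .Select(a => (a.Key, a.Value)));
}
```

Gating the visibility-based rooting on a check like this keeps the old root-everything behavior for assemblies that have not declared themselves trim-compatible.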

@jkotas
Member

jkotas commented Nov 8, 2022

It looks good enough to me. As @MichalStrehovsky said, it would be better if a crossgen dev has a look as well.

Member

@trylek trylek left a comment

Looks great to me and nicely ties into my current experiments regarding reduction of R2R container size, thanks Jeremy!

@jkoritzinsky jkoritzinsky merged commit 55d3027 into dotnet:main Nov 16, 2022
@jkoritzinsky jkoritzinsky deleted the crossgen2-no-unused-pinvoke-gen branch November 16, 2022 20:56
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2022