PGO: Add new tiers #70941
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| # Instrumented Tiers | ||
|
|
||
| _Disclaimer: the functionality described in this doc is still in the preview stage and is not enabled by default even for `DOTNET_TieredPGO=1`._ | ||
|
|
||
| [#70941](https://github.com/dotnet/runtime/pull/70941) introduced new opt-in strategies for Tiered Compilation + TieredPGO mainly to address | ||
| two existing limitations of the current design: | ||
| 1) R2R code never benefits from Dynamic PGO as it's not instrumented and is promoted straight to Tier1 when it's hot | ||
| 2) Instrumentation in Tier0 comes with a big overhead and it's better to only instrument hot Tier0 code (whether it's ILOnly or R2R) | ||
|
|
||
| A good example that illustrates both problems is this TechEmpower benchmark (plaintext-plaintext): | ||
|
|
||
|  | ||
|
|
||
| Legend: | ||
| * Red - `DOTNET_TieredPGO=0`, `DOTNET_ReadyToRun=1` (default) | ||
| * Black - `DOTNET_TieredPGO=1`, `DOTNET_ReadyToRun=1` | ||
| * Yellow - `DOTNET_TieredPGO=1`, `DOTNET_ReadyToRun=0` | ||
|
|
||
| The yellow line reaches the highest throughput (RPS) at the cost of startup speed (and, hence, the time it takes to process the first request). The benchmark is quite simple and most of its code is already prejitted, so the only way to instrument it is to drop R2R completely and compile everything from scratch. This also explains why the black line (Dynamic PGO enabled, but still relying on R2R) shows little improvement. With a separate instrumentation tier for hot R2R code we reach "Yellow"-level performance while keeping the same startup speed as before. In addition, for the mode where a lot of code has to be compiled to Tier0, switching to the "instrument only hot Tier0 code" strategy shows a ~8% time-to-first-request reduction across all TE benchmarks. | ||
|
|
||
|  | ||
| (_predicted results according to local runs of crank with custom binaries_) | ||
|
|
||
| # Tiered compilation workflow in TieredPGO mode | ||
|
|
||
| The following diagram explains how the instrumentation for hot R2R code works under the hood when TieredPGO is enabled (it's disabled by default): | ||
|
|
||
| ```mermaid | ||
| flowchart | ||
| prestub(.NET Function) -->|Compilation| hasAO{"Marked with<br/>[AggressiveOpts]?"} | ||
| hasAO-->|Yes|tier1ao["JIT to <b><ins>Tier1</ins></b><br/><br/>(that attribute is extremely<br/> rarely a good idea)"] | ||
| hasAO-->|No|hasR2R | ||
| hasR2R{"Is prejitted (R2R)<br/>and ReadyToRun==1?"} -->|No| istrTier0Q | ||
|
|
||
| istrTier0Q{"<b>TieredPGO_Strategy:</b><br/>Instrument only<br/>hot Tier0 code?"} | ||
| istrTier0Q-->|No, always instrument tier0|tier0 | ||
| istrTier0Q-->|Yes, only hot|tier000 | ||
| tier000["JIT to <b><ins>Tier0</ins></b><br/><br/>(not optimized, not instrumented,<br/> with patchpoints)"]-->|Running...|ishot555 | ||
| ishot555{"Is hot?<br/>(called >30 times)"} | ||
| ishot555-.->|No,<br/>keep running...|ishot555 | ||
| ishot555-->|Yes|tier0 | ||
|
|
||
| hasR2R -->|Yes| R2R | ||
| R2R["Use <b><ins>R2R</ins></b> code<br/><br/>(optimized, not instrumented,<br/>with patchpoints)"] -->|Running...|ishot1 | ||
| ishot1{"Is hot?<br/>(called >30 times)"}-.->|No,<br/>keep running...|ishot1 | ||
| ishot1--->|"Yes"|instrumentR2R | ||
|
|
||
| instrumentR2R{"<b>TieredPGO_Strategy:</b><br/>Instrument hot<br/>R2R'd code?"} | ||
| instrumentR2R-->|Yes, instrument R2R'd code|istier1inst | ||
| instrumentR2R-->|No, don't instrument R2R'd code|tier1nopgo["JIT to <b><ins>Tier1</ins></b><br/><br/>(no dynamic profile data)"] | ||
|
|
||
| tier0["JIT to <b><ins>InstrumentedTier</ins></b><br/><br/>(not optimized, instrumented,<br/> with patchpoints)"]-->|Running...|ishot5 | ||
| tier1pgo2["JIT to <b><ins>Tier1</ins></b><br/><br/>(optimized with profile data)"] | ||
| tier1pgo2_1["JIT to <b><ins>Tier1</ins></b><br/><br/>(optimized with profile data)"] | ||
|
|
||
| istier1inst{"<b>TieredPGO_Strategy:</b><br/>Enable optimizations<br/>for InstrumentedTier?"}-->|"No"|tier0_1 | ||
| istier1inst--->|"Yes"|tier1inst["JIT to <b><ins>InstrumentedTierOptimized</ins></b><br/><br/>(optimized, instrumented, <br/>with patchpoints)"] | ||
| tier1inst-->|Running...|ishot5_1 | ||
| ishot5{"Is hot?<br/>(called >30 times)"}-->|Yes|tier1pgo2 | ||
| ishot5-.->|No,<br/>keep running...|ishot5 | ||
|
|
||
|
|
||
| ishot5_1{"Is hot?<br/>(called >30 times)"} | ||
| ishot5_1-.->|No,<br/>keep running...|ishot5_1 | ||
| ishot5_1{"Is hot?<br/>(called >30 times)"}-->|Yes|tier1pgo2_1 | ||
|
|
||
| tier0_1["JIT to <b><ins>InstrumentedTier</ins></b><br/><br/>(not optimized, instrumented,<br/> with patchpoints)"] | ||
| tier0_1-->|Running...|ishot5_1 | ||
| ``` | ||
| (_VSCode doesn't support mermaid diagrams, consider installing external add-ins_) | ||
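
For readers who cannot render the diagram, the same promotion rules can be read as a small state machine. The following is a minimal sketch only: `Tier`, `Strategy`, `InitialTier`, `NextTier` and `kHotCallThreshold` are hypothetical names invented for illustration and are not the runtime's actual types or config knobs. It assumes, as the diagram states, that a method is "hot" once it has been called more than 30 times at its current tier.

```cpp
// Sketch of the flowchart above. All names are hypothetical illustrations,
// not identifiers from the runtime.
enum class Tier
{
    R2R,                       // prejitted: optimized, not instrumented
    Tier0,                     // not optimized, not instrumented
    InstrumentedTier,          // not optimized, instrumented
    InstrumentedTierOptimized, // optimized, instrumented
    Tier1                      // final optimized tier
};

// The three TieredPGO_Strategy questions asked in the diagram.
struct Strategy
{
    bool instrumentOnlyHotTier0;
    bool instrumentHotR2R;
    bool optimizeInstrumentedTier;
};

// "Is hot? (called >30 times)"
constexpr int kHotCallThreshold = 30;

// Tier chosen when a method is compiled for the first time.
constexpr Tier InitialTier(bool aggressiveOpts, bool hasR2RCode, bool readyToRunEnabled, const Strategy& s)
{
    if (aggressiveOpts)
        return Tier::Tier1; // [AggressiveOpts] goes straight to Tier1
    if (hasR2RCode && readyToRunEnabled)
        return Tier::R2R;
    return s.instrumentOnlyHotTier0 ? Tier::Tier0 : Tier::InstrumentedTier;
}

// Tier a method moves to, given the number of calls made at its current tier.
constexpr Tier NextTier(Tier current, int callsAtCurrentTier, const Strategy& s)
{
    if (callsAtCurrentTier <= kHotCallThreshold)
        return current; // not hot yet, keep running
    switch (current)
    {
        case Tier::Tier0:
            return Tier::InstrumentedTier;
        case Tier::R2R:
            if (!s.instrumentHotR2R)
                return Tier::Tier1; // no dynamic profile data
            return s.optimizeInstrumentedTier ? Tier::InstrumentedTierOptimized : Tier::InstrumentedTier;
        case Tier::InstrumentedTier:
        case Tier::InstrumentedTierOptimized:
            return Tier::Tier1; // optimized with profile data
        case Tier::Tier1:
            return Tier::Tier1;
    }
    return current;
}
```

Under this sketch, hot R2R code only reaches an instrumented tier when `instrumentHotR2R` is set; otherwise it is promoted to Tier1 without profile data, which is limitation 1) described at the top of this doc.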
|
|
||
| ## Pros & cons of using optimizations inside the instrumented tiers | ||
|
|
||
| Pros: | ||
| * Lower overhead from instrumentation (and thanks to optimizations we _can_ optimize probes and emit fewer of them) | ||
| * Optimized code is able to inline methods so we won't be producing new Compilation units for even small methods | ||
|
|
||
| Cons: | ||
| * Currently, we won't instrument inlinees, so we'll probably miss a lot of opportunities and produce a less accurate profile, leading to a less optimized final tier |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -383,7 +383,7 @@ void BlockCountInstrumentor::Prepare(bool preImport) | |
| // | ||
| // If we see any, we need to adjust our instrumentation pattern. | ||
| // | ||
| - if (m_comp->opts.IsOSR() && ((m_comp->optMethodFlags & OMF_HAS_TAILCALL_SUCCESSOR) != 0)) | ||
| + if (m_comp->opts.IsInstrumentedOptimized() && ((m_comp->optMethodFlags & OMF_HAS_TAILCALL_SUCCESSOR) != 0)) | ||
| { | ||
| JITDUMP("OSR + PGO + potential tail call --- preparing to relocate block probes\n"); | ||
|
|
||
|
|
@@ -1887,8 +1887,8 @@ PhaseStatus Compiler::fgPrepareToInstrumentMethod() | |
| (JitConfig.TC_PartialCompilation() > 0); | ||
| const bool prejit = opts.jitFlags->IsSet(JitFlags::JIT_FLAG_PREJIT); | ||
| const bool tier0WithPatchpoints = opts.jitFlags->IsSet(JitFlags::JIT_FLAG_TIER0) && mayHavePatchpoints; | ||
| - const bool osrMethod = opts.IsOSR(); | ||
| - const bool useEdgeProfiles = (JitConfig.JitEdgeProfiling() > 0) && !prejit && !tier0WithPatchpoints && !osrMethod; | ||
| + const bool instrOpt = opts.IsInstrumentedOptimized(); | ||
| + const bool useEdgeProfiles = (JitConfig.JitEdgeProfiling() > 0) && !prejit && !tier0WithPatchpoints && !instrOpt; | ||
|
||
|
|
||
| if (useEdgeProfiles) | ||
| { | ||
|
|
@@ -1899,7 +1899,7 @@ PhaseStatus Compiler::fgPrepareToInstrumentMethod() | |
| JITDUMP("Using block profiling, because %s\n", | ||
| (JitConfig.JitEdgeProfiling() == 0) | ||
| ? "edge profiles disabled" | ||
| - : prejit ? "prejitting" : osrMethod ? "OSR" : "tier0 with patchpoints"); | ||
| + : prejit ? "prejitting" : instrOpt ? "optimized instr" : "tier0 with patchpoints"); | ||
|
|
||
| fgCountInstrumentor = new (this, CMK_Pgo) BlockCountInstrumentor(this); | ||
| } | ||
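
The net effect of this hunk is that the "fall back to block profiling" condition now keys off `IsInstrumentedOptimized()` instead of `IsOSR()`. As a rough, standalone restatement of that decision (a sketch only: `InstrumentationInputs`, `ProfileKind` and `ChooseProfileKind` are hypothetical names, and the four booleans stand in for the values the real code reads from `JitConfig` and `opts`):

```cpp
// Hypothetical stand-in for the inputs read in fgPrepareToInstrumentMethod.
struct InstrumentationInputs
{
    bool edgeProfilingEnabled;  // JitConfig.JitEdgeProfiling() > 0
    bool prejit;                // JIT_FLAG_PREJIT is set
    bool tier0WithPatchpoints;  // JIT_FLAG_TIER0 is set and the method may have patchpoints
    bool instrumentedOptimized; // opts.IsInstrumentedOptimized()
};

enum class ProfileKind
{
    Edge,
    Block
};

// Edge profiles are used only when nothing rules them out; every other
// combination falls back to block counts, mirroring the hunk above.
constexpr ProfileKind ChooseProfileKind(const InstrumentationInputs& in)
{
    return (in.edgeProfilingEnabled && !in.prejit && !in.tier0WithPatchpoints && !in.instrumentedOptimized)
               ? ProfileKind::Edge
               : ProfileKind::Block;
}
```

With these inputs, any method compiled in the optimized instrumented mode gets block counts even when edge profiling is enabled.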
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9602,7 +9602,16 @@ var_types Compiler::impImportCall(OPCODE opcode, | |
| { | ||
| return impImportJitTestLabelMark(sig->numArgs); | ||
| } | ||
| - #endif // DEBUG | ||
|
|
||
| + // static ulong JitHelpers_JitFlags() => 0; | ||
| + // can be defined anywhere and will be replaced by Debug-version of RyuJIT | ||
| + if ((mflags & CORINFO_FLG_STATIC) && (sig->numArgs == 0) && (sig->retType == CorInfoType::CORINFO_TYPE_ULONG) && | ||
| + (strcmp("JitHelpers_JitFlags", eeGetMethodName(methHnd, nullptr)) == 0)) | ||
| + { | ||
| + call = gtNewLconNode((__int64)opts.jitFlags->GetRawFlags()); | ||
| + goto DONE_CALL; | ||
| + } | ||
| + #endif | ||
|
||
|
|
||
| // <NICE> Factor this into getCallInfo </NICE> | ||
| bool isSpecialIntrinsic = false; | ||
|
|
@@ -22224,7 +22233,7 @@ bool Compiler::impConsiderCallProbe(GenTreeCall* call, IL_OFFSET ilOffset) | |
| return false; | ||
| } | ||
|
|
||
| - assert(opts.OptimizationDisabled() || opts.IsOSR()); | ||
| + assert(opts.OptimizationDisabled() || opts.IsInstrumentedOptimized()); | ||
| assert(!compIsForInlining()); | ||
|
|
||
| // During importation, optionally flag this block as one that | ||
|
|
||