[efficiency-improver] perf: single-pass field collection in CreateSearchResult (O(N²) → O(N)) #509
Replace the doc.GetValues(fieldName) call inside the field-iteration loop with direct field.GetStringValue() access.

Before: for each unique field name encountered, doc.GetValues(fieldName) was called. GetValues re-scans the entire field list looking for matching names, making the total work O(unique_fields × total_fields), which is O(N²) in the worst case where all field names are unique.

After: collect string values in a single forward pass over doc.Fields. Each field is visited exactly once; a TryGetValue miss adds the field name to the result dictionary on first encounter, and later encounters append to the existing list. Total work is O(N).

Also pre-sizes each List<string> with capacity 1 (the common case of a single value per field) to avoid internal array growth for most fields.

Proxy metric: memory allocations + CPU instructions per search result. For a document with N unique stored fields, the change eliminates N−1 redundant GetValues passes (each O(N)), and removes N ToList() calls on the intermediate string arrays returned by GetValues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Greptile Summary
This PR replaces the
Confidence Score: 5/5
Safe to merge: the single-pass rewrite is semantically equivalent to the old code for all field types, and the new test explicitly covers the binary stored-field edge case.
Both old and new code produce identical resultVals dictionaries: string fields map to their collected string values, and binary/numeric fields map to empty lists because the key is inserted unconditionally before the GetStringValue() null check. The new test locks this in. No observable behaviour changes for callers of AllValues, Values, or GetValues.
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[foreach field in doc.Fields] --> B{resultVals.TryGetValue fieldName}
B -- miss --> C[list = new List-string-1
resultVals fieldName = list]
B -- hit --> D[reuse existing list]
C --> E[strVal = field.GetStringValue]
D --> E
E -- strVal != null --> F[list.Add strVal]
E -- strVal == null binary/numeric field --> G[skip Add
key already inserted with empty list]
F --> H{more fields?}
G --> H
H -- yes --> A
H -- no --> I[return resultVals]
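The flowchart can be read as the following minimal sketch. It is not the code from the PR: StoredFieldStub and SinglePassSketch are hypothetical stand-ins for Lucene.NET's field type and for the loop inside CreateSearchResult, used here so the example compiles without any external package. What it is meant to show is the ordering the flowchart describes, where the key is registered before the null check.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for a stored field. StringValue is null for a field
// that carries no string value (for example a binary stored field).
public sealed record StoredFieldStub(string Name, string? StringValue);

public static class SinglePassSketch
{
    // Visits every field exactly once and groups string values by field name.
    public static Dictionary<string, List<string>> Collect(IEnumerable<StoredFieldStub> fields)
    {
        var resultVals = new Dictionary<string, List<string>>();

        foreach (var field in fields)
        {
            // Miss: create the list and register the key before the null check,
            // so a field without a string value still maps to an empty list.
            if (!resultVals.TryGetValue(field.Name, out var list))
            {
                list = new List<string>(1); // one value per field is the common case
                resultVals[field.Name] = list;
            }

            // Hit or miss, append only when the field has a string value.
            if (field.StringValue != null)
            {
                list.Add(field.StringValue);
            }
        }

        return resultVals;
    }
}
```

With this ordering, a name that only ever appears on a field without a string value still ends up as a key with an empty list, which is the behaviour the review summary says the new test locks in.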
Reviews (3): Last reviewed commit: "Merge origin/support/3.x and resolve Luc..."
@copilot Check out Greptile's comments, assess if fixes are required and resolve the comments
Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>
I reviewed Greptile’s feedback and applied a fix in
@copilot resolve the merge conflicts in this pull request
Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>
Resolved by merging
🤖 This is an automated draft PR from Efficiency Improver, an AI assistant focused on reducing the energy consumption and computational footprint of this repository.
Goal
Replace the per-field doc.GetValues(fieldName) call inside CreateSearchResult with a single forward pass that collects string values directly from each field, reducing the work from O(N²) to O(N) per search result.
Focus Area
Code-Level Efficiency — eliminate provably redundant CPU work on the search result materialisation hot path.
Approach
The problem
CreateSearchResult iterates doc.Fields and, for each unique field name encountered, calls doc.GetValues(fieldName).
Document.GetValues(name) does a full linear scan over all stored fields to collect values for that name. With U unique field names and T total fields:
- ContainsKey: per iteration
- GetValues: per unique name
- ToList(): per unique name (string[] → List<string>)
In the worst case (all field names unique) this is O(N²).
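To make the quadratic cost concrete, here is a hedged reconstruction of the pattern described above. It is not the removed code itself: FieldStub, DocumentStub, and OldPatternSketch are hypothetical names, and DocumentStub.GetValues only imitates the linear scan attributed to Document.GetValues, with a counter added so the number of name comparisons can be observed.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored field.
public sealed record FieldStub(string Name, string? StringValue);

// Hypothetical stand-in for a document whose GetValues scans every field.
public sealed class DocumentStub
{
    public List<FieldStub> Fields { get; } = new();

    // Instrumentation for this sketch only: counts name comparisons.
    public int NameComparisons { get; private set; }

    // Imitates a full linear scan over all fields for one name.
    public string[] GetValues(string name)
    {
        var matches = new List<string>();
        foreach (var f in Fields)
        {
            NameComparisons++;
            if (f.Name == name && f.StringValue != null)
            {
                matches.Add(f.StringValue);
            }
        }
        return matches.ToArray();
    }
}

public static class OldPatternSketch
{
    // One ContainsKey per field, then one GetValues scan plus one ToList()
    // copy per unique name: U scans of T fields each.
    public static Dictionary<string, List<string>> Collect(DocumentStub doc)
    {
        var resultVals = new Dictionary<string, List<string>>();
        foreach (var field in doc.Fields)
        {
            if (!resultVals.ContainsKey(field.Name))
            {
                resultVals[field.Name] = doc.GetValues(field.Name).ToList();
            }
        }
        return resultVals;
    }
}
```

Filling a DocumentStub with ten uniquely named fields and calling OldPatternSketch.Collect drives NameComparisons to 10 × 10 = 100, which is the U × T growth described above.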
The fix
Collect string values in a single forward pass — visit each field exactly once:
field.GetStringValue() is exactly what doc.GetValues() calls internally, so there is no semantic change. Binary and numeric stored fields return null from GetStringValue() and were already excluded by GetValues; the continue preserves that behaviour.
Energy Efficiency Evidence
Proxy metric: CPU instructions + heap allocations per search result (both map directly to energy draw).
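[Before/after comparison table; cell values lost in extraction. Surviving row labels: GetValues inner scans; ToList() calls (string[] + List<string>); List<string> initial capacity; ContainsKey + indexer; TryGetValue.]
For a typical CMS document with 10 unique stored fields (U=10, T=10):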
- Old code: 10 GetValues calls × 10 comparisons each = 100 redundant comparisons, plus 20 heap objects (10 intermediate string[] arrays + 10 List<string> copies)
- New code: 10 GetStringValue() calls = 0 redundant comparisons, 10 List<string> objects only
As field count grows (e.g. rich content with 30+ stored fields), the old code's cost grows quadratically, so the saving grows with it.
Green Software Foundation Context
Hardware Efficiency: eliminates provably unnecessary CPU work on every search result. The per-result saving is modest in isolation, but CreateSearchResult is called once per result in every search execution, a direct multiplier on search throughput and energy per query.
Software Carbon Intensity (SCI): reducing CPU time per functional unit (one search query) lowers the energy component of SCI.
Trade-offs
None. The change is semantically equivalent. Readability is preserved or improved — the single-pass loop is straightforward and the comment explains the rationale.
Reproducibility
dotnet build src/Examine.sln --configuration Release
dotnet test src/Examine.Test/Examine.Test.csproj -f net8.0
Test Status
✅ Build clean (0 errors, 3 pre-existing net6.0 EOL warnings). 148 tests passed (2 skipped as expected).