Skip to content

[efficiency-improver] perf: single-pass field collection in CreateSearchResult (O(N2) → O(N))#509

Merged
Shazwazza merged 3 commits into
support/3.xfrom
efficiency/single-pass-field-collection-d146e0c6987a8055
Jun 25, 2026
Merged

[efficiency-improver] perf: single-pass field collection in CreateSearchResult (O(N2) → O(N))#509
Shazwazza merged 3 commits into
support/3.xfrom
efficiency/single-pass-field-collection-d146e0c6987a8055

Conversation

@github-actions

Copy link
Copy Markdown
Contributor

🤖 This is an automated draft PR from Efficiency Improver, an AI assistant focused on reducing the energy consumption and computational footprint of this repository.

Goal

Replace the per-field doc.GetValues(fieldName) call inside CreateSearchResult with a single forward pass that collects string values directly from each field — reducing the work from O(N2) to O(N) per search result.

Focus Area

Code-Level Efficiency — eliminate provably redundant CPU work on the search result materialisation hot path.

Approach

The problem

CreateSearchResult iterates doc.Fields and, for each unique field name encountered, calls doc.GetValues(fieldName):

// Before
foreach (var field in fields)
{
    var fieldName = field.Name;
    if (!resultVals.ContainsKey(fieldName))
    {
        resultVals[fieldName] = doc.GetValues(fieldName).ToList();
    }
}

Document.GetValues(name) does a full linear scan over all stored fields to collect values for that name. With U unique field names and T total fields:

Operation Cost
ContainsKey per iteration U × O(1) = O(U)
GetValues per unique name U × O(T) = O(U × T)
ToList() per unique name U allocations of string[]List<string>

In the worst case (all field names unique) this is O(N2).

The fix

Collect string values in a single forward pass — visit each field exactly once:

// After
foreach (var field in fields)
{
    var strVal = field.GetStringValue();
    if (strVal == null) continue; // binary/numeric stored fields have no string value

    var fieldName = field.Name;
    if (!resultVals.TryGetValue(fieldName, out var list))
    {
        list = new List<string>(1);
        resultVals[fieldName] = list;
    }
    list.Add(strVal);
}

field.GetStringValue() is exactly what doc.GetValues() calls internally — no semantic change. Binary and numeric stored fields return null from GetStringValue() and were already excluded by GetValues; the continue preserves that behaviour.

Energy Efficiency Evidence

Proxy metric: CPU instructions + heap allocations per search result (both map directly to energy draw).

Scenario Before After
Field iterations T T (same)
GetValues inner scans U × T comparisons 0
LINQ ToList() calls U (each allocs string[] + List<string>) 0
List<string> initial capacity unbounded growth pre-sized to 1
Dictionary probes per iteration 2 (ContainsKey + indexer) 1 (TryGetValue)

For a typical CMS document with 10 unique stored fields (U=10, T=10):

  • Before: 10 GetValues calls × 10 comparisons each = 100 redundant comparisons + 20 extra heap objects (10 intermediate string[] + 10 List<string> copies)
  • After: 10 GetStringValue() calls = 0 redundant comparisons, 10 List<string> objects only

As field count grows (e.g. rich content with 30+ stored fields), the saving scales quadratically with the old code.

Green Software Foundation Context

Hardware Efficiency: eliminates provably unnecessary CPU work on every search result. The per-result saving is modest in isolation, but CreateSearchResult is called once per result in every search execution — a direct multiplier on search throughput and energy per query.

Software Carbon Intensity (SCI): reducing CPU time per functional unit (one search query) lowers the energy component of SCI.

Trade-offs

None. The change is semantically equivalent. Readability is preserved or improved — the single-pass loop is straightforward and the comment explains the rationale.

Reproducibility

dotnet build src/Examine.sln --configuration Release
dotnet test src/Examine.Test/Examine.Test.csproj -f net8.0

Test Status

✅ Build clean (0 errors, 3 pre-existing net6.0 EOL warnings). 148 tests passed (2 skipped as expected).

Generated by Efficiency Improver · 148.7 AIC · ⌖ 7.08 AIC · ⊞ 11.1K ·

Add this agentic workflows to your repo

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/efficiency-improver.md@d63b34de41bc0dc052096e094c732cf28eafc659

Replace the doc.GetValues(fieldName) call inside the field-iteration loop
with direct field.GetStringValue() access.

Before: for each unique field name encountered, doc.GetValues(fieldName) was
called. GetValues re-scans the entire field list looking for matching names,
making the total work O(unique_fields × total_fields) — O(N²) in the worst
case where all field names are unique.

After: collect string values in a single forward pass over doc.Fields. Each
field is visited exactly once; TryGetValue adds it to the result dictionary on
first encounter and appends on subsequent encounters. Total work is O(N).

Also pre-sizes each List<string> with capacity 1 (the common case of a single
value per field) to avoid internal array growth for most fields.

Proxy metric: memory allocations + CPU instructions per search result.
For a document with N unique stored fields, the change eliminates N−1
redundant GetValues passes (each O(N)), and removes N ToList() calls on the
intermediate string arrays returned by GetValues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Shazwazza Shazwazza marked this pull request as ready for review June 24, 2026 21:39
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the CreateSearchResult field collection loop with a single forward pass, eliminating the per-unique-field doc.GetValues() call that internally rescanned all stored fields on every invocation. A new unit test is added to document the retention of binary/numeric stored field keys in AllValues (as empty lists), which preserves backward-compatible behaviour.

  • LuceneSearchExecutor.cs: Iterates doc.Fields once, calls field.GetStringValue() directly per field, and uses TryGetValue+List.Add instead of ContainsKey+ToList() — reducing dictionary probes and heap allocations per result. Crucially, the key is inserted into resultVals before the null-string-value guard, so binary StoredField entries still appear in AllValues with an empty list, exactly as the old code produced.
  • LuceneSearchExecutorTests.cs: New test verifies a StoredField(byte[]) field is visible in AllValues with Count == 0, and that regular string fields are accessible through Values.

Confidence Score: 5/5

Safe to merge — the single-pass rewrite is semantically equivalent to the old code for all field types, and the new test explicitly covers the binary stored-field edge case.

Both old and new code produce identical resultVals dictionaries: string fields map to their collected string values, and binary/numeric fields map to empty lists because the key is inserted unconditionally before the GetStringValue() null check. The new test locks this in. No observable behaviour changes for callers of AllValues, Values, or GetValues.

No files require special attention.

Important Files Changed

Filename Overview
src/Examine.Lucene/Search/LuceneSearchExecutor.cs Single-pass field collection replaces per-unique-field GetValues() linear scans; correctly preserves binary/numeric field keys in the result dictionary (with empty lists) by inserting the key before the null-string-value guard, matching old behaviour exactly.
src/Examine.Test/Examine.Lucene/Search/LuceneSearchExecutorTests.cs New unit test directly exercises CreateSearchResult with a binary StoredField and asserts that the field key is present in AllValues with an empty list — explicitly guarding the edge-case behaviour flagged in prior review discussion.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[foreach field in doc.Fields] --> B{resultVals.TryGetValue fieldName}
    B -- miss --> C[list = new List-string-1
resultVals fieldName = list]
    B -- hit --> D[reuse existing list]
    C --> E[strVal = field.GetStringValue]
    D --> E
    E -- strVal != null --> F[list.Add strVal]
    E -- strVal == null binary/numeric field --> G[skip Add
key already inserted with empty list]
    F --> H{more fields?}
    G --> H
    H -- yes --> A
    H -- no --> I[return resultVals]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[foreach field in doc.Fields] --> B{resultVals.TryGetValue fieldName}
    B -- miss --> C[list = new List-string-1
resultVals fieldName = list]
    B -- hit --> D[reuse existing list]
    C --> E[strVal = field.GetStringValue]
    D --> E
    E -- strVal != null --> F[list.Add strVal]
    E -- strVal == null binary/numeric field --> G[skip Add
key already inserted with empty list]
    F --> H{more fields?}
    G --> H
    H -- yes --> A
    H -- no --> I[return resultVals]
Loading

Reviews (3): Last reviewed commit: "Merge origin/support/3.x and resolve Luc..." | Re-trigger Greptile

Comment thread src/Examine.Lucene/Search/LuceneSearchExecutor.cs Outdated
@Shazwazza

Copy link
Copy Markdown
Owner

@copilot Check out Greptile's comments, assess if fixes are required and resolve the comments

Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>

Copilot AI commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@copilot Check out Greptile's comments, assess if fixes are required and resolve the comments

I reviewed Greptile’s feedback and applied a fix in 771dce3. CreateSearchResult now keeps field keys even when a stored field has no string value (e.g., binary), preserving AllValues key behavior while keeping the single-pass O(N) collection. I also added a regression test in LuceneSearchExecutorTests to lock this behavior.

Copilot AI requested a review from Shazwazza June 24, 2026 22:05
@Shazwazza

Copy link
Copy Markdown
Owner

@copilot resolve the merge conflicts in this pull request

Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>

Copilot AI commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

Resolved by merging origin/support/3.x into this branch and fixing the conflict in LuceneSearchExecutor.cs in commit 316b534e.

@Shazwazza Shazwazza merged commit a47e024 into support/3.x Jun 25, 2026
1 check passed
@Shazwazza Shazwazza deleted the efficiency/single-pass-field-collection-d146e0c6987a8055 branch June 25, 2026 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants