[efficiency-improver] perf: single-pass field collection in CreateSearchResult (O(N2) → O(N)) by github-actions[bot] · Pull Request #509 · Shazwazza/Examine

github-actions · 2026-06-24T20:37:48Z

🤖 This is an automated draft PR from Efficiency Improver, an AI assistant focused on reducing the energy consumption and computational footprint of this repository.

Goal

Replace the per-field doc.GetValues(fieldName) call inside CreateSearchResult with a single forward pass that collects string values directly from each field — reducing the work from O(N2) to O(N) per search result.

Focus Area

Code-Level Efficiency — eliminate provably redundant CPU work on the search result materialisation hot path.

Approach

The problem

CreateSearchResult iterates doc.Fields and, for each unique field name encountered, calls doc.GetValues(fieldName):

// Before
foreach (var field in fields)
{
    var fieldName = field.Name;
    if (!resultVals.ContainsKey(fieldName))
    {
        resultVals[fieldName] = doc.GetValues(fieldName).ToList();
    }
}

Document.GetValues(name) does a full linear scan over all stored fields to collect values for that name. With U unique field names and T total fields:

Operation	Cost
`ContainsKey` per iteration	U × O(1) = O(U)
`GetValues` per unique name	U × O(T) = O(U × T)
`ToList()` per unique name	U allocations of `string[]` → `List<string>`

In the worst case (all field names unique) this is O(N2).

The fix

Collect string values in a single forward pass — visit each field exactly once:

// After
foreach (var field in fields)
{
    var strVal = field.GetStringValue();
    if (strVal == null) continue; // binary/numeric stored fields have no string value

    var fieldName = field.Name;
    if (!resultVals.TryGetValue(fieldName, out var list))
    {
        list = new List<string>(1);
        resultVals[fieldName] = list;
    }
    list.Add(strVal);
}

field.GetStringValue() is exactly what doc.GetValues() calls internally — no semantic change. Binary and numeric stored fields return null from GetStringValue() and were already excluded by GetValues; the continue preserves that behaviour.

Energy Efficiency Evidence

Proxy metric: CPU instructions + heap allocations per search result (both map directly to energy draw).

Scenario	Before	After
Field iterations	T	T (same)
`GetValues` inner scans	U × T comparisons	0
LINQ `ToList()` calls	U (each allocs `string[]` + `List<string>`)	0
`List<string>` initial capacity	unbounded growth	pre-sized to 1
Dictionary probes per iteration	2 (`ContainsKey` + indexer)	1 (`TryGetValue`)

For a typical CMS document with 10 unique stored fields (U=10, T=10):

Before: 10 GetValues calls × 10 comparisons each = 100 redundant comparisons + 20 extra heap objects (10 intermediate string[] + 10 List<string> copies)
After: 10 GetStringValue() calls = 0 redundant comparisons, 10 List<string> objects only

As field count grows (e.g. rich content with 30+ stored fields), the saving scales quadratically with the old code.

Green Software Foundation Context

Hardware Efficiency: eliminates provably unnecessary CPU work on every search result. The per-result saving is modest in isolation, but CreateSearchResult is called once per result in every search execution — a direct multiplier on search throughput and energy per query.

Software Carbon Intensity (SCI): reducing CPU time per functional unit (one search query) lowers the energy component of SCI.

Trade-offs

None. The change is semantically equivalent. Readability is preserved or improved — the single-pass loop is straightforward and the comment explains the rationale.

Reproducibility

dotnet build src/Examine.sln --configuration Release
dotnet test src/Examine.Test/Examine.Test.csproj -f net8.0

Test Status

✅ Build clean (0 errors, 3 pre-existing net6.0 EOL warnings). 148 tests passed (2 skipped as expected).

Generated by Efficiency Improver · 148.7 AIC · ⌖ 7.08 AIC · ⊞ 11.1K · ◷

Add this agentic workflows to your repo

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/efficiency-improver.md@d63b34de41bc0dc052096e094c732cf28eafc659

Replace the doc.GetValues(fieldName) call inside the field-iteration loop with direct field.GetStringValue() access. Before: for each unique field name encountered, doc.GetValues(fieldName) was called. GetValues re-scans the entire field list looking for matching names, making the total work O(unique_fields × total_fields) — O(N²) in the worst case where all field names are unique. After: collect string values in a single forward pass over doc.Fields. Each field is visited exactly once; TryGetValue adds it to the result dictionary on first encounter and appends on subsequent encounters. Total work is O(N). Also pre-sizes each List<string> with capacity 1 (the common case of a single value per field) to avoid internal array growth for most fields. Proxy metric: memory allocations + CPU instructions per search result. For a document with N unique stored fields, the change eliminates N−1 redundant GetValues passes (each O(N)), and removes N ToList() calls on the intermediate string arrays returned by GetValues. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

greptile-apps · 2026-06-24T21:47:44Z

Greptile Summary

This PR replaces the CreateSearchResult field collection loop with a single forward pass, eliminating the per-unique-field doc.GetValues() call that internally rescanned all stored fields on every invocation. A new unit test is added to document the retention of binary/numeric stored field keys in AllValues (as empty lists), which preserves backward-compatible behaviour.

LuceneSearchExecutor.cs: Iterates doc.Fields once, calls field.GetStringValue() directly per field, and uses TryGetValue+List.Add instead of ContainsKey+ToList() — reducing dictionary probes and heap allocations per result. Crucially, the key is inserted into resultVals before the null-string-value guard, so binary StoredField entries still appear in AllValues with an empty list, exactly as the old code produced.
LuceneSearchExecutorTests.cs: New test verifies a StoredField(byte[]) field is visible in AllValues with Count == 0, and that regular string fields are accessible through Values.

Confidence Score: 5/5

Safe to merge — the single-pass rewrite is semantically equivalent to the old code for all field types, and the new test explicitly covers the binary stored-field edge case.

Both old and new code produce identical resultVals dictionaries: string fields map to their collected string values, and binary/numeric fields map to empty lists because the key is inserted unconditionally before the GetStringValue() null check. The new test locks this in. No observable behaviour changes for callers of AllValues, Values, or GetValues.

No files require special attention.

Important Files Changed

Filename	Overview
src/Examine.Lucene/Search/LuceneSearchExecutor.cs	Single-pass field collection replaces per-unique-field GetValues() linear scans; correctly preserves binary/numeric field keys in the result dictionary (with empty lists) by inserting the key before the null-string-value guard, matching old behaviour exactly.
src/Examine.Test/Examine.Lucene/Search/LuceneSearchExecutorTests.cs	New unit test directly exercises CreateSearchResult with a binary StoredField and asserts that the field key is present in AllValues with an empty list — explicitly guarding the edge-case behaviour flagged in prior review discussion.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[foreach field in doc.Fields] --> B{resultVals.TryGetValue fieldName}
    B -- miss --> C[list = new List-string-1
resultVals fieldName = list]
    B -- hit --> D[reuse existing list]
    C --> E[strVal = field.GetStringValue]
    D --> E
    E -- strVal != null --> F[list.Add strVal]
    E -- strVal == null binary/numeric field --> G[skip Add
key already inserted with empty list]
    F --> H{more fields?}
    G --> H
    H -- yes --> A
    H -- no --> I[return resultVals]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[foreach field in doc.Fields] --> B{resultVals.TryGetValue fieldName}
    B -- miss --> C[list = new List-string-1
resultVals fieldName = list]
    B -- hit --> D[reuse existing list]
    C --> E[strVal = field.GetStringValue]
    D --> E
    E -- strVal != null --> F[list.Add strVal]
    E -- strVal == null binary/numeric field --> G[skip Add
key already inserted with empty list]
    F --> H{more fields?}
    G --> H
    H -- yes --> A
    H -- no --> I[return resultVals]

_{Reviews (3): Last reviewed commit: "Merge origin/support/3.x and resolve Luc..." | Re-trigger Greptile}

Shazwazza · 2026-06-24T21:55:48Z

@copilot Check out Greptile's comments, assess if fixes are required and resolve the comments

Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>

Copilot · 2026-06-24T22:05:06Z

@copilot Check out Greptile's comments, assess if fixes are required and resolve the comments

I reviewed Greptile’s feedback and applied a fix in 771dce3. CreateSearchResult now keeps field keys even when a stored field has no string value (e.g., binary), preserving AllValues key behavior while keeping the single-pass O(N) collection. I also added a regression test in LuceneSearchExecutorTests to lock this behavior.

Shazwazza · 2026-06-24T22:07:05Z

@copilot resolve the merge conflicts in this pull request

Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>

Copilot · 2026-06-24T22:19:28Z

@copilot resolve the merge conflicts in this pull request

Resolved by merging origin/support/3.x into this branch and fixing the conflict in LuceneSearchExecutor.cs in commit 316b534e.

github-actions Bot added automation efficiency green-software labels Jun 24, 2026

Shazwazza marked this pull request as ready for review June 24, 2026 21:39

greptile-apps Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread src/Examine.Lucene/Search/LuceneSearchExecutor.cs Outdated

Copilot started work on behalf of Shazwazza June 24, 2026 21:56 View session

fix: preserve empty AllValues entries for non-string stored fields

771dce3

Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>

Copilot finished work on behalf of Shazwazza June 24, 2026 22:05

Copilot AI requested a review from Shazwazza June 24, 2026 22:05

Copilot started work on behalf of Shazwazza June 24, 2026 22:07 View session

Merge origin/support/3.x and resolve LuceneSearchExecutor conflict

316b534

Co-authored-by: Shazwazza <1742685+Shazwazza@users.noreply.github.com>

Copilot finished work on behalf of Shazwazza June 24, 2026 22:19

Shazwazza merged commit a47e024 into support/3.x Jun 25, 2026
1 check passed

Shazwazza deleted the efficiency/single-pass-field-collection-d146e0c6987a8055 branch June 25, 2026 18:11

github-actions Bot mentioned this pull request Jun 25, 2026

[efficiency-improver] Monthly Activity 2026-06 #510

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[efficiency-improver] perf: single-pass field collection in CreateSearchResult (O(N2) → O(N))#509

[efficiency-improver] perf: single-pass field collection in CreateSearchResult (O(N2) → O(N))#509
Shazwazza merged 3 commits into
support/3.xfrom
efficiency/single-pass-field-collection-d146e0c6987a8055

github-actions Bot commented Jun 24, 2026

Uh oh!

greptile-apps Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

Uh oh!

Shazwazza commented Jun 24, 2026

Uh oh!

Copilot AI commented Jun 24, 2026

Uh oh!

Shazwazza commented Jun 24, 2026

Uh oh!

Copilot AI commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

github-actions Bot commented Jun 24, 2026

Goal

Focus Area

Approach

The problem

The fix

Energy Efficiency Evidence

Green Software Foundation Context

Trade-offs

Reproducibility

Test Status

Uh oh!

greptile-apps Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Shazwazza commented Jun 24, 2026

Uh oh!

Copilot AI commented Jun 24, 2026

Uh oh!

Shazwazza commented Jun 24, 2026

Uh oh!

Copilot AI commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented Jun 24, 2026 •

edited

Loading