
Coordinate DocValues skip list evaluation across multiple fields in conjunction queries #15770

@sgup432

Description


When a query combines multiple numeric range filters in a conjunction (e.g., price:[10,50] AND rating:[4,5] AND date:[start,end]), each field's DocValuesSkipper currently operates independently. Each field reads its own skip metadata and classifies blocks (4,096 docs each) as YES/NO/MAYBE without sharing information. The conjunction is resolved at the document level by ConjunctionDISI.

This means when one field determines a block has no matching documents (NO), the other fields still read and process their skip data for that same block, only for the conjunction to discard it later.

One example I can think of:

Consider block 0 (docs 0–4095) with query price:[10,50] AND rating:[4,5] AND date:[2024-01, 2024-06]:

Block 0 skip metadata:
  price:  min=70, max=95     → NO (all prices above 50)
  rating: min=2,  max=5      → MAYBE (overlaps [4,5])
  date:   min=Jan, max=Mar   → MAYBE (overlaps [Jan, Jun])
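The three-way classification above can be sketched roughly as follows. This is a minimal illustration, not Lucene's actual code: `BlockClassifier`, `Match`, and `classify` are hypothetical names, and the rule that YES also requires every doc in the block to have a value is an assumption of this sketch.

```java
/** Hypothetical sketch of block classification; these names are not part of Lucene's API. */
final class BlockClassifier {
    enum Match { YES, NO, MAYBE }

    /**
     * Classifies one block against the inclusive query range [lower, upper].
     * blockMin/blockMax come from the block's skip metadata; docsWithValue is the
     * number of docs in the block that have a value, blockSize the docs in the block.
     */
    static Match classify(long blockMin, long blockMax, int docsWithValue, int blockSize,
                          long lower, long upper) {
        if (blockMax < lower || blockMin > upper) {
            return Match.NO;    // block values and query range are disjoint
        }
        if (blockMin >= lower && blockMax <= upper && docsWithValue == blockSize) {
            return Match.YES;   // assumption: YES only if every doc has an in-range value
        }
        return Match.MAYBE;     // partial overlap, per-doc checks are still needed
    }
}
```

For the price field above, `classify(70, 95, 4096, 4096, 10, 50)` falls into the first branch because the block minimum (70) is greater than the upper bound (50).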

Without coordination (today):

  1. Let's say ConjunctionDISI picks rating as the lead, date second, and price last (ordered by cost estimate, cheapest first)
  2. rating reads skip data for block 0 → MAYBE
  3. rating starts per-doc evaluation: reads the value for doc 0, doc 1, doc 2, and so on
  4. Doc 3 has rating=4 → matches. rating returns doc 3
  5. date is asked to advance to doc 3
  6. date reads the value for doc 3 → matches. Returns doc 3. date and rating now agree on a doc
  7. price is then asked to advance to doc 3 and reads its skip data → NO (min=70 > 50)
  8. price jumps to block 1. The conjunction restarts from there
  9. Wasted: rating's per-doc reads on docs 0–3, date's value read on doc 3

With coordination:

  1. Check every field's skip metadata at the block level before any per-doc evaluation
  2. rating: MAYBE
  3. date: MAYBE
  4. price: NO → short-circuit
  5. All fields jump past block 0. This saves the wasted work above, i.e. rating's and date's per-doc reads

To solve this, we can introduce a coordinated multi-field skip evaluation that:

  • Sorts fields by selectivity (most selective first)
  • Evaluates skip metadata field-by-field, short-circuiting on the first NO
  • For YES blocks, skips per-doc evaluation entirely for those fields
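A rough sketch of what the three points above could look like for a single block. Everything here is hypothetical (`CoordinatedSkip`, `FieldBlock`, `Decision`, and the `selectivity` estimate are made-up names, not existing Lucene classes), and the YES check assumes every doc in the block has a value for the field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of coordinated block-level evaluation; not Lucene's API. */
final class CoordinatedSkip {
    enum Match { YES, NO, MAYBE }

    /**
     * One range clause plus the current block's skip metadata for its field.
     * selectivity is an assumed estimate of the fraction of docs the clause matches.
     */
    record FieldBlock(String field, long blockMin, long blockMax,
                      long lower, long upper, double selectivity) {
        Match classify() {
            if (blockMax < lower || blockMin > upper) return Match.NO;
            // Assumption: every doc in the block has a value for this field.
            if (blockMin >= lower && blockMax <= upper) return Match.YES;
            return Match.MAYBE;
        }
    }

    /** Whether the block can be skipped, which fields still need per-doc checks, and reads done. */
    record Decision(boolean skipBlock, List<String> fieldsNeedingPerDocCheck, int metadataReads) {}

    static Decision evaluate(List<FieldBlock> clauses) {
        List<FieldBlock> ordered = new ArrayList<>(clauses);
        // Lowest estimated match fraction first, i.e. most selective first.
        ordered.sort(Comparator.comparingDouble(FieldBlock::selectivity));

        List<String> maybeFields = new ArrayList<>();
        int reads = 0;
        for (FieldBlock clause : ordered) {
            reads++;
            Match m = clause.classify();
            if (m == Match.NO) {
                // Short-circuit: no other field needs to look at this block.
                return new Decision(true, List.of(), reads);
            }
            if (m == Match.MAYBE) {
                maybeFields.add(clause.field());
            }
            // YES: the field matches every doc in the block, so it needs no per-doc check.
        }
        return new Decision(false, maybeFields, reads);
    }
}
```

If price is estimated to be the most selective clause, block 0 from the example is rejected after reading only price's skip metadata. When no field says NO, only the MAYBE fields would take part in per-doc evaluation for that block.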

This would be an optimization purely on the search side, and no indexing changes would be required.

Hopefully this makes sense, feel free to correct my understanding here.
