Description
When a query combines multiple numeric range filters in a conjunction (e.g., price:[10,50] AND rating:[4,5] AND date:[start,end]), each field's DocValuesSkipper currently operates independently. Each field reads its own skip metadata and classifies blocks (4,096 docs each) as YES/NO/MAYBE without sharing information. The conjunction is resolved at the document level by ConjunctionDISI.
This means when one field determines a block has no matching documents (NO), the other fields still read and process their skip data for that same block, only for the conjunction to discard it later.
One example I can think of:
Consider block 0 (docs 0–4095) with query price:[10,50] AND rating:[4,5] AND date:[2024-01, 2024-06]:
Block 0 skip metadata:
price: min=70, max=95 → NO (all prices above 50)
rating: min=2, max=5 → MAYBE (overlaps [4,5])
date: min=Jan, max=Mar → MAYBE (overlaps [Jan, Jun])
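As a minimal sketch of how these labels fall out of the skip metadata, the block's min/max can be compared against the inclusive query range. The class and method names below are hypothetical and this is not Lucene's actual code; it also assumes every doc in the block has a value, which a real implementation would need to verify before answering YES. Only price and rating are shown because the example does not say how dates are encoded.

```java
final class BlockClassification {
    enum Match { YES, NO, MAYBE }

    /**
     * Classifies one block against an inclusive query range [lo, hi]
     * using only the block's min and max values.
     * Assumption: every doc in the block has a value for the field.
     */
    static Match classify(long blockMin, long blockMax, long lo, long hi) {
        if (blockMax < lo || blockMin > hi) {
            return Match.NO;    // ranges are disjoint: no doc can match
        }
        if (blockMin >= lo && blockMax <= hi) {
            return Match.YES;   // block range is contained: every doc matches
        }
        return Match.MAYBE;     // partial overlap: per-doc checks are needed
    }

    public static void main(String[] args) {
        // Block 0 from the example above.
        System.out.println("price:  " + classify(70, 95, 10, 50)); // NO
        System.out.println("rating: " + classify(2, 5, 4, 5));     // MAYBE
    }
}
```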
Without coordination (today):
- Let's say ConjunctionDISI picks rating as lead, then date as second lead, with price last (ordered by cost estimate, cheapest first)
- rating reads skip data for block 0 → MAYBE
- rating starts per-doc evaluation: reads value for doc 0, doc 1, doc 2, and so on
- Doc 3 has rating=4 → matches. rating returns doc 3
- date is asked to advance to doc 3
- date reads value for doc 3 → date matches. Returns doc 3. Both date and rating agree on a doc.
- price is then asked to advance to doc 3, reads its skip data → NO (min=70 > 50)
- price jumps to block 1. Conjunction restarts
- Wasted: rating's per-doc reads on docs 0–3, date's value read on doc 3
With coordination:
Check all fields' skip metadata at block level first
Rating: MAYBE
Date: MAYBE
Price: NO → short-circuit
All fields jump past block 0. This saves the wasted work described above, i.e., the per-doc reads for rating and date.
To solve this, we can introduce a coordinated multi-field skip evaluation that:
- Sorts fields by selectivity (most selective first)
- Evaluates skip metadata field-by-field, short-circuiting on the first NO
- For YES blocks, skips per-doc evaluation entirely for those fields
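The three steps above could look roughly like the sketch below. Everything in it is hypothetical and for illustration only: `FieldBlock`, `Decision`, `evaluate`, and the `selectivity` estimates are not existing Lucene APIs, and the YES check again assumes every doc in the block has a value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CoordinatedSkip {
    enum Match { YES, NO, MAYBE }

    /** One range clause together with the skip metadata of the current block. */
    static final class FieldBlock {
        final String name;
        final long blockMin, blockMax; // block-level skip metadata
        final long lo, hi;             // inclusive query range
        final double selectivity;      // assumed estimate; lower = more selective

        FieldBlock(String name, long blockMin, long blockMax,
                   long lo, long hi, double selectivity) {
            this.name = name;
            this.blockMin = blockMin;
            this.blockMax = blockMax;
            this.lo = lo;
            this.hi = hi;
            this.selectivity = selectivity;
        }

        Match classify() {
            if (blockMax < lo || blockMin > hi) return Match.NO;
            if (blockMin >= lo && blockMax <= hi) return Match.YES;
            return Match.MAYBE;
        }
    }

    /** Outcome of the block-level pass. */
    static final class Decision {
        final boolean skipBlock;         // true if some field answered NO
        final List<String> perDocFields; // MAYBE fields that still need per-doc checks
        final int metadataReads;         // how many fields were consulted

        Decision(boolean skipBlock, List<String> perDocFields, int metadataReads) {
            this.skipBlock = skipBlock;
            this.perDocFields = perDocFields;
            this.metadataReads = metadataReads;
        }
    }

    static Decision evaluate(List<FieldBlock> fields) {
        // Step 1: most selective field first.
        List<FieldBlock> ordered = new ArrayList<>(fields);
        ordered.sort(Comparator.comparingDouble((FieldBlock f) -> f.selectivity));

        List<String> perDoc = new ArrayList<>();
        int reads = 0;
        for (FieldBlock f : ordered) {
            reads++;
            Match m = f.classify();
            if (m == Match.NO) {
                // Step 2: short-circuit; remaining fields are never consulted.
                return new Decision(true, new ArrayList<>(), reads);
            }
            if (m == Match.MAYBE) {
                perDoc.add(f.name);
            }
            // Step 3: YES fields are left out of per-doc evaluation.
        }
        return new Decision(false, perDoc, reads);
    }

    public static void main(String[] args) {
        // Block 0 from the example; the selectivity values are made up.
        Decision d = evaluate(List.of(
            new FieldBlock("rating", 2, 5, 4, 5, 0.4),
            new FieldBlock("price", 70, 95, 10, 50, 0.1)));
        System.out.println("skip=" + d.skipBlock + " reads=" + d.metadataReads);
    }
}
```

With the block 0 values, price is consulted first and answers NO, so rating's skip data is never read. In a block where price falls entirely inside [10,50], only rating would remain in `perDocFields`.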
This would be an optimization purely on the search side, and no indexing changes would be required.
Hopefully this makes sense, feel free to correct my understanding here.