Skip to content

Conversation

@rnewson
Copy link
Member

@rnewson rnewson commented Sep 26, 2023

Overview

for selector;

{"selector":{"_id":{"$regex":"doc.+"}}}

before;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

after;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [
    "doc"
  ],
  "end_key": [
    "doc�",
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

Testing recommendations

TBD

Related Issues or Pull Requests

#4775

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@rnewson
Copy link
Member Author

rnewson commented Sep 26, 2023

I've not done the text side yet, or any tests. just sounding out the idea.

for text I'd prefer to pass the regex through to Lucene (clouseau or nouveau) and document the variation in regex flavour (there's huge overlap). Omitting the optimizations Lucene makes for the sake of purity was a mistake in the original implementation imo.

for selector;

{"selector":{"_id":{"$regex":"doc.+"}}}

before;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

after;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [
    "doc"
  ],
  "end_key": [
    "doc�",
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

closes: #4775
@willholley
Copy link
Member

willholley commented Sep 26, 2023

I wonder whether a $startsWith operator would be cleaner, as we could then optimize it for text indexes specifically? The $regex operator originally did differ for text indexes iirc but we had users experience weirdness when they would add an index and suddenly get different results.

A general principal in Mango over the last ~5 years is that adding an index shouldn't change the result of a query implicitly, so I'd be wary of reintroducing that behaviour.

@rnewson
Copy link
Member Author

rnewson commented Sep 26, 2023

I like that. We could then convert the prefix of the regex to a startswith for views.

willholley added a commit that referenced this pull request Oct 17, 2023
Adds a `$beginsWith` operator to selectors, with json and text index
support. This is a compliment / precursor to optimising `$regex`
support as proposed in #4776.

For `json` indexes, a $beginsWith operator translates into a key
range query, as is common practice for _view queries. For example,
to find all rows with a key beginning with "W", we can use a range
`start_key="W", end_key="W\ufff0"`. Given Mango uses compound keys,
this is slightly more complex in practice, but the idea is the same.
As with other range operators (`$gt`, `$gte`, etc), `$beginsWith`
can be used in combination with equality operators and result sorting
but must result in a contiguous key range. That is, a range of
`start_key=[10, "W"], end_key=[10, "W\ufff0", {}]` would be valid,
but `start_key=["W", 10], end_key=["W\ufff0", 10, {}]` would not,
because the second element of the key may result in a non-contiguous
range.

For text indexes, `$beginsWith` translates to a Lucene query on
the specified field of `W*`.

If a non-string operand is provided to `$beginsWith`, the request will
fail with a 400 / `invalid_operator` error.
@willholley willholley mentioned this pull request Oct 17, 2023
5 tasks
willholley added a commit that referenced this pull request Oct 17, 2023
Adds a `$beginsWith` operator to selectors, with json and text index
support. This is a compliment / precursor to optimising `$regex`
support as proposed in #4776.

For `json` indexes, a $beginsWith operator translates into a key
range query, as is common practice for _view queries. For example,
to find all rows with a key beginning with "W", we can use a range
`start_key="W", end_key="W\ufff0"`. Given Mango uses compound keys,
this is slightly more complex in practice, but the idea is the same.
As with other range operators (`$gt`, `$gte`, etc), `$beginsWith`
can be used in combination with equality operators and result sorting
but must result in a contiguous key range. That is, a range of
`start_key=[10, "W"], end_key=[10, "W\ufff0", {}]` would be valid,
but `start_key=["W", 10], end_key=["W\ufff0", 10, {}]` would not,
because the second element of the key may result in a non-contiguous
range.

For text indexes, `$beginsWith` translates to a Lucene query on
the specified field of `W*`.

If a non-string operand is provided to `$beginsWith`, the request will
fail with a 400 / `invalid_operator` error.
willholley added a commit that referenced this pull request Oct 26, 2023
Adds a `$beginsWith` operator to selectors, with json and text index
support. This is a compliment / precursor to optimising `$regex`
support as proposed in #4776.

For `json` indexes, a $beginsWith operator translates into a key
range query, as is common practice for _view queries. For example,
to find all rows with a key beginning with "W", we can use a range
`start_key="W", end_key="W\ufff0"`. Given Mango uses compound keys,
this is slightly more complex in practice, but the idea is the same.
As with other range operators (`$gt`, `$gte`, etc), `$beginsWith`
can be used in combination with equality operators and result sorting
but must result in a contiguous key range. That is, a range of
`start_key=[10, "W"], end_key=[10, "W\ufff0", {}]` would be valid,
but `start_key=["W", 10], end_key=["W\ufff0", 10, {}]` would not,
because the second element of the key may result in a non-contiguous
range.

For text indexes, `$beginsWith` translates to a Lucene query on
the specified field of `W*`.

If a non-string operand is provided to `$beginsWith`, the request will
fail with a 400 / `invalid_operator` error.
willholley added a commit that referenced this pull request Oct 30, 2023
Adds a `$beginsWith` operator to selectors, with json and text index
support. This is a compliment / precursor to optimising `$regex`
support as proposed in #4776.

For `json` indexes, a $beginsWith operator translates into a key
range query, as is common practice for _view queries. For example,
to find all rows with a key beginning with "W", we can use a range
`start_key="W", end_key="W\ufff0"`. Given Mango uses compound keys,
this is slightly more complex in practice, but the idea is the same.
As with other range operators (`$gt`, `$gte`, etc), `$beginsWith`
can be used in combination with equality operators and result sorting
but must result in a contiguous key range. That is, a range of
`start_key=[10, "W"], end_key=[10, "W\ufff0", {}]` would be valid,
but `start_key=["W", 10], end_key=["W\ufff0", 10, {}]` would not,
because the second element of the key may result in a non-contiguous
range.

For text indexes, `$beginsWith` translates to a Lucene query on
the specified field of `W*`.

If a non-string operand is provided to `$beginsWith`, the request will
fail with a 400 / `invalid_operator` error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants