This document provides a detailed reference for all REST API endpoints supported by the Azure AI Search Simulator.
| Endpoint Category | Status | Notes |
|---|---|---|
| Index Operations | ✅ Implemented | Full CRUD support, statistics |
| Document Operations | ✅ Implemented | Upload, merge, mergeOrUpload, delete |
| Search | ✅ Implemented | Simple & Lucene syntax, vector search, hybrid search |
| Suggest/Autocomplete | ✅ Implemented | Basic prefix matching |
| Data Sources | ✅ Implemented | File system connector |
| Indexers | ✅ Implemented | Full CRUD, run, reset, status, scheduled execution |
| Document Cracking | ✅ Implemented | PDF, Word, Excel, HTML, JSON, CSV, TXT |
| Skillsets | ✅ Implemented | Text skills, embedding skill, custom Web API skill, index projections |
| Synonym Maps | ✅ Implemented | Full CRUD, Solr format, query-time expansion |
| Service Statistics | ✅ Implemented | Counters and limits (quotas use S1 defaults) |
https://localhost:7250

Note: HTTPS is recommended for Azure SDK compatibility. HTTP is also available at http://localhost:5250.
The simulator supports three authentication methods. See AUTHENTICATION.md for details.
api-key: admin-key-12345

| Key Type | Default Value | Permissions |
|---|---|---|
| Admin Key | admin-key-12345 | Full read/write access |
| Query Key | query-key-67890 | Read-only search operations |
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

| Role | Permissions |
|---|---|
| Search Service Contributor | Manage indexes, indexers, data sources, skillsets |
| Search Index Data Contributor | Upload/merge/delete documents |
| Search Index Data Reader | Search, suggest, autocomplete |
GET /admin/token/quick/data-contributor
api-key: admin-key-12345

Returns a JWT with the specified role for local testing.

Note: If both api-key and Authorization: Bearer are present, the API key takes precedence (matching Azure AI Search behavior).
All requests require the api-version query parameter:
?api-version=2025-09-01
| Version | Status | Notes |
|---|---|---|
| 2025-09-01 | ✅ Supported | Latest stable - includes index description, debug subscores |
| 2024-07-01 | ✅ Supported | Previous stable - vector search, quantization |
| 2023-11-01 | | Vector search, semantic ranking basics |
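As a minimal sketch of how the base URL, the api-key header, and the required api-version parameter fit together, the helper below only assembles a URL and headers and sends nothing. The function name build_request is hypothetical and not part of the simulator; the base URL and key are the defaults quoted in this document.

```python
from urllib.parse import urlencode

BASE_URL = "https://localhost:7250"  # simulator default quoted in this document


def build_request(path, api_key, api_version="2025-09-01", **query):
    """Return (url, headers) for a simulator call; nothing is sent."""
    params = {"api-version": api_version, **query}
    url = f"{BASE_URL}{path}?{urlencode(params)}"
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers


url, headers = build_request("/indexes", "admin-key-12345")
print(url)
```

The returned pair can be passed to any HTTP client.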
Creates a new search index.
POST /indexes?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>

Request Body:
{
"name": "hotels",
"fields": [
{
"name": "hotelId",
"type": "Edm.String",
"key": true,
"filterable": true
},
{
"name": "hotelName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"sortable": true
},
{
"name": "description",
"type": "Edm.String",
"searchable": true
},
{
"name": "category",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"facetable": true
},
{
"name": "rating",
"type": "Edm.Double",
"filterable": true,
"sortable": true,
"facetable": true
},
{
"name": "tags",
"type": "Collection(Edm.String)",
"searchable": true,
"filterable": true,
"facetable": true
},
{
"name": "descriptionVector",
"type": "Collection(Edm.Single)",
"searchable": true,
"dimensions": 1536,
"vectorSearchProfile": "my-vector-profile"
},
{
"name": "address",
"type": "Edm.ComplexType",
"fields": [
{
"name": "streetAddress",
"type": "Edm.String",
"searchable": true
},
{
"name": "city",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"facetable": true
}
]
}
],
"vectorSearch": {
"algorithms": [
{
"name": "my-hnsw",
"kind": "hnsw",
"hnswParameters": {
"metric": "cosine"
}
}
],
"profiles": [
{
"name": "my-vector-profile",
"algorithm": "my-hnsw"
}
]
},
"scoringProfiles": [
{
"name": "boostRating",
"text": {
"weights": { "hotelName": 3, "description": 1 }
},
"functions": [
{
"type": "magnitude",
"fieldName": "rating",
"boost": 5,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 0,
"boostingRangeEnd": 5
}
}
],
"functionAggregation": "sum"
}
],
"defaultScoringProfile": "boostRating",
"similarity": {
"@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
"k1": 1.2,
"b": 0.75
},
"suggesters": [
{
"name": "sg",
"searchMode": "analyzingInfixMatching",
"sourceFields": ["hotelName", "category"]
}
]
}

Response: 201 Created
GET /indexes?api-version=2024-07-01
api-key: <admin-key>

Response:
{
"@odata.context": "https://localhost:7001/$metadata#indexes",
"value": [
{
"name": "hotels",
"fields": [...],
"@odata.etag": "\"0x12345\""
}
]
}

GET /indexes/{indexName}?api-version=2024-07-01
api-key: <admin-key>

DELETE /indexes/{indexName}?api-version=2024-07-01
api-key: <admin-key>

Response: 204 No Content
Creates a new index or updates an existing one.
PUT /indexes/{indexName}?api-version=2024-07-01&allowIndexDowntime=true
Content-Type: application/json
api-key: <admin-key>

Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| allowIndexDowntime | bool | false | When true, allows updates that require the index to be temporarily offline. Required for updating BM25 similarity parameters (k1, b) on an existing index. |

Restrictions on update:
- The similarity algorithm @odata.type cannot be changed on an existing index (returns 400 Bad Request).
- BM25 parameters (k1, b) can be updated only when allowIndexDowntime=true.
- Fields can be added but existing fields cannot be removed.
Returns statistics for a search index including document count and storage size.
GET /indexes/{indexName}/stats?api-version=2024-07-01
api-key: <admin-key>

Response:
{
"@odata.context": "https://localhost:7250/$metadata#Microsoft.Azure.Search.V2024_07_01.IndexStatistics",
"documentCount": 153951,
"storageSize": 274189410,
"vectorIndexSize": 0
}

| Field | Description |
|---|---|
| documentCount | Number of documents in the index |
| storageSize | Size of the Lucene index storage in bytes |
| vectorIndexSize | Size of the HNSW vector index storage in bytes |
POST /indexes/{indexName}/docs/index?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>

Request Body:
{
"value": [
{
"@search.action": "upload",
"hotelId": "1",
"hotelName": "Secret Point Motel",
"description": "A great hotel",
"category": "Budget",
"rating": 4.5,
"tags": ["pool", "wifi"]
},
{
"@search.action": "mergeOrUpload",
"hotelId": "2",
"hotelName": "Twin Dome Motel",
"category": "Budget"
},
{
"@search.action": "delete",
"hotelId": "3"
}
]
}

Actions:
- upload - Insert new document (fails if exists)
- merge - Update existing document fields
- mergeOrUpload - Update if exists, otherwise insert
- delete - Remove document
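The batch body can be assembled programmatically. The following is a small sketch, assuming only the four actions listed above and the value envelope shown in the request body; make_action and make_batch are hypothetical helper names, not simulator APIs.

```python
import json

VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def make_action(action, document):
    """Return a copy of the document tagged with its @search.action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return {"@search.action": action, **document}


def make_batch(actions):
    """Wrap (action, document) pairs in the {"value": [...]} envelope."""
    return {"value": [make_action(a, d) for a, d in actions]}


batch = make_batch([
    ("upload", {"hotelId": "1", "hotelName": "Secret Point Motel"}),
    ("delete", {"hotelId": "3"}),
])
print(json.dumps(batch, indent=2))
```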
Response:
{
"value": [
{
"key": "1",
"status": true,
"errorMessage": null,
"statusCode": 201
},
{
"key": "2",
"status": true,
"errorMessage": null,
"statusCode": 200
}
]
}

GET /indexes/{indexName}/docs/{key}?api-version=2024-07-01
api-key: <query-key>

GET /indexes/{indexName}/docs/$count?api-version=2024-07-01
api-key: <query-key>

Response: 1234 (plain text number)
POST /indexes/{indexName}/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: <query-key>

Request Body:
{
"search": "luxury hotel pool",
"searchMode": "all",
"queryType": "simple",
"searchFields": "hotelName,description",
"select": "hotelId,hotelName,rating,category",
"filter": "rating ge 4",
"orderby": "rating desc",
"top": 10,
"skip": 0,
"count": true,
"facets": ["category,count:5", "rating,interval:1"],
"highlight": "description",
"highlightPreTag": "<em>",
"highlightPostTag": "</em>"
}

Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| search | string | Search text (use * for all documents) |
| searchMode | string | any (default) or all |
| queryType | string | simple (default) or full |
| searchFields | string | Comma-separated field names |
| select | string | Fields to return |
| filter | string | OData filter expression |
| orderby | string | Sort expression |
| top | integer | Number of results (max 1000) |
| skip | integer | Results to skip |
| count | boolean | Include total count |
| facets | array | Facet specifications |
| highlight | string | Fields to highlight |
| scoringProfile | string | Name of a scoring profile to evaluate (overrides defaultScoringProfile) |
| scoringParameters | array | Values for scoring functions, e.g. ["tagParam-luxury,boutique"] |
| scoringStatistics | string | "local" (default) or "global". Accepted for compatibility; simulator always uses local statistics |
| vectorQueries | array | Vector query objects (see below) |
| debug | string | Debug mode for search diagnostics (see below) |
| featuresMode | string | When "enabled", returns per-field BM25 scoring features in @search.features (see below) |
The featuresMode parameter provides per-field BM25 scoring breakdown for each search result. This is useful for understanding why certain documents rank higher or lower and how different fields contribute to the overall score.
Supported values:
| Value | Description |
|---|---|
| "none" | No feature-level scoring details (default) |
| "enabled" | Returns detailed scoring breakdown per searchable field |
When enabled, each result includes an @search.features object with entries for each matching searchable field:
{
"@search.score": 3.0860271,
"@search.features": {
"description": {
"uniqueTokenMatches": 2.0,
"similarityScore": 3.0860272,
"termFrequency": 2.0
},
"tags": {
"uniqueTokenMatches": 1.0,
"similarityScore": 1.1271671,
"termFrequency": 1.0
}
}
}

- uniqueTokenMatches: Number of unique search terms found in the field
- similarityScore: BM25 similarity score for this field against the query
- termFrequency: Total number of times the search terms appear in the field

Note: Only fields where at least one search term matches are included. Use searchFields to restrict which fields are evaluated.
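To see which field contributed most to a result, the @search.features object can be sorted by similarityScore. This is an illustrative sketch working on an already-parsed result shaped like the example above; rank_fields_by_similarity is a hypothetical helper name.

```python
def rank_fields_by_similarity(result):
    """Return (field, similarityScore) pairs, highest score first."""
    features = result.get("@search.features", {})
    pairs = [(name, f["similarityScore"]) for name, f in features.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


result = {
    "@search.score": 3.0860271,
    "@search.features": {
        "tags": {"uniqueTokenMatches": 1.0, "similarityScore": 1.1271671, "termFrequency": 1.0},
        "description": {"uniqueTokenMatches": 2.0, "similarityScore": 3.0860272, "termFrequency": 2.0},
    },
}

for field, score in rank_fields_by_similarity(result):
    print(field, score)
```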
The debug parameter enables diagnostic information in the search response. It returns detailed information about how results were scored and ranked.
Note: In Azure AI Search, the debug parameter was introduced in API version 2025-05-01-preview. The simulator supports it on all API versions for convenience.
Supported values:
| Value | Description |
|---|---|
| disabled | No debug info (default) |
| semantic | Debug info for semantic ranking |
| vector | Debug info for vector/hybrid search subscores |
| queryRewrites | Debug info for query rewrites |
| innerHits | Debug info for inner hits in complex types |
| all | All debug info |
Multiple modes can be combined with |, e.g. "semantic|vector".
Example request with debug:
{
"search": "luxury hotel",
"vectorQueries": [
{
"kind": "vector",
"vector": [0.1, 0.2, 0.3],
"fields": "contentVector",
"k": 5
}
],
"debug": "vector"
}

When debug is enabled, the response includes:
- @search.debug (response-level): Query-level debug info including parsed queries, timing, and simulator-specific diagnostics.
- @search.documentDebugInfo (per-document): Breakdown of subscores per document, including text BM25 scores and vector similarity scores per field.
Debug response example:
{
"@search.debug": {
"queryRewrites": null,
"simulator.parsedQuery": "+title:luxury +title:hotel",
"simulator.parsedFilter": null,
"simulator.isHybridSearch": true,
"simulator.textSearchTimeMs": 5.2,
"simulator.vectorSearchTimeMs": 3.1,
"simulator.totalTimeMs": 12.5,
"simulator.textMatchCount": 15,
"simulator.vectorMatchCount": 10,
"simulator.scoreFusionMethod": "WeightedAverage",
"simulator.searchableFields": ["title", "description"]
},
"value": [
{
"@search.score": 0.85,
"@search.documentDebugInfo": {
"vectors": {
"subscores": {
"text": { "searchScore": 3.14 },
"documentBoost": 1.0,
"vectors": {
"contentVector": {
"searchScore": 0.85,
"vectorSimilarity": 0.92
}
}
}
}
},
"id": "1",
"title": "Grand Luxury Hotel"
}
]
}

Note: Properties prefixed with simulator. are specific to this simulator and are not present in the official Azure AI Search API. The @search.documentDebugInfo and @search.debug.queryRewrites structures match the official API.
Add vectorQueries to perform vector or hybrid search:
{
"search": "luxury hotels",
"vectorQueries": [
{
"kind": "vector",
"vector": [0.01, 0.02, ...],
"fields": "descriptionVector",
"k": 10
}
]
}

| Parameter | Type | Description |
|---|---|---|
| kind | string | "vector" for raw vector input |
| vector | array | Array of floats (embedding) |
| fields | string | Vector field name(s) |
| k | integer | Number of nearest neighbors |
Hybrid Search: Include both search (text) and vectorQueries (vector) to combine results.
The simulator uses HNSW (Hierarchical Navigable Small World) for efficient approximate nearest neighbor search:
- O(log n) query time instead of O(n) brute-force
- High recall (typically 95-99% accuracy)
- Configurable parameters (M, efConstruction, efSearch)
- Automatic fallback to brute-force when HNSW is disabled
Vector Search Architecture:
The simulator implements a dual-layer vector search system:
┌─────────────────────────────────────────────────────────────┐
│ IVectorSearchService │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────────┐ ┌─────────────────────────────┐ │
│ │ HnswIndexManager │ │ VectorStore │ │
│ │ (HNSW algorithm) │ │ (Brute-force fallback) │ │
│ └─────────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Filtered Vector Search:
When filters are applied, the simulator uses post-filtering with oversampling:
- Retrieve k × oversampleMultiplier candidates from HNSW
- Filter candidates to those matching the OData filter
- Return top-k filtered results
- Fall back to brute-force if insufficient results
This ensures good recall even with selective filters.
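The post-filtering steps above can be sketched as follows. This is not the simulator's implementation: an exact cosine-distance scan stands in for the HNSW lookup, the filter is a plain Python predicate rather than an OData expression, and the default oversample_multiplier of 5 is an assumed value.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query, docs, limit):
    """Stand-in for the HNSW lookup: exact nearest neighbours by cosine distance."""
    return sorted(docs, key=lambda d: cosine_distance(query, d["vector"]))[:limit]


def filtered_vector_search(query, docs, k, predicate, oversample_multiplier=5):
    # Step 1: over-fetch candidates; step 2: keep those passing the filter.
    candidates = nearest(query, docs, k * oversample_multiplier)
    matches = [d for d in candidates if predicate(d)]
    if len(matches) < k:
        # Step 4: too few survivors, so scan everything that passes the filter.
        matches = nearest(query, [d for d in docs if predicate(d)], k)
    # Step 3: return the top-k filtered results.
    return matches[:k]


docs = [
    {"id": i, "vector": [1.0, i * 0.1], "category": "Budget" if i % 4 == 0 else "Luxury"}
    for i in range(20)
]
hits = filtered_vector_search(
    [1.0, 0.0], docs, k=2,
    predicate=lambda d: d["category"] == "Budget",
    oversample_multiplier=2,
)
print([d["id"] for d in hits])
```

With a selective filter and a small multiplier, the first pass finds only one match, so the brute-force fallback supplies the second.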
Score Calculation:
Vector search results include both distance and score:
- Distance: Raw distance from HNSW (lower = closer)
- Score: Similarity score computed as 1 / (1 + distance) (0-1 range, higher = more similar)
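A short worked example of the score formula: a distance of 0 maps to a score of 1, and the score falls toward 0 as the distance grows. The function name is illustrative only.

```python
def distance_to_score(distance):
    """Convert a non-negative distance into a similarity score: 1 / (1 + distance)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


for d in (0.0, 1.0, 3.0):
    print(d, distance_to_score(d))
```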
Performance Considerations:
| Use Case | Recommended Settings |
|---|---|
| Development | M=16, efConstruction=100, efSearch=50 |
| Balanced | M=16, efConstruction=200, efSearch=100 |
| High Recall | M=32, efConstruction=400, efSearch=200 |
Disabling HNSW:
To use brute-force cosine similarity instead of HNSW:
{
"VectorSearchSettings": {
"UseHnsw": false
}
}

When combining text and vector search results, the simulator supports two fusion methods:
Reciprocal Rank Fusion (RRF) (default)
RRF combines results based on their ranks rather than scores:
RRF_score(d) = Σ 1 / (k + rank(d))
Where:
- k is a constant (default: 60) that controls rank-score distribution
- rank(d) is the document's position in each result list (1-indexed)
- Documents appearing in both text and vector results get higher scores
Benefits of RRF:
- No score normalization needed
- Works well when score distributions differ
- Documents in both result sets are boosted
- Simple and robust
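The RRF formula can be illustrated with a few lines of Python. This sketch fuses ranked lists of document ids using k = 60 and 1-indexed ranks as described above; rrf_fuse is a hypothetical name and the simulator's own code may be organized differently.

```python
from collections import defaultdict


def rrf_fuse(result_lists, k=60):
    """Fuse ranked lists of document ids; returns (doc_id, score), best first."""
    scores = defaultdict(float)
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


text_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = rrf_fuse([text_hits, vector_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Documents "a" and "b" appear in both lists, so they outrank "c" and "d", which each appear in only one.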
Weighted Score Fusion
Alternatively, combine normalized scores with configurable weights:
final_score = (text_weight × norm_text_score) + (vector_weight × norm_vector_score)
Text scores are normalized using min-max normalization. Vector scores are already in 0-1 range.
Default weights:
- vectorWeight: 0.7 (semantic similarity prioritized)
- textWeight: 0.3 (keyword matches)
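A sketch of weighted score fusion under two assumptions that this document does not specify: a document missing from one result set contributes 0 for that side, and a text result set with identical scores normalizes to 1.0. The function names are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} map into 0-1; identical scores map to 1.0 (assumption)."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def weighted_fuse(text_scores, vector_scores, text_weight=0.3, vector_weight=0.7):
    """Combine normalized text scores with vector scores already in the 0-1 range."""
    norm_text = min_max_normalize(text_scores)
    fused = {}
    for doc in set(norm_text) | set(vector_scores):
        fused[doc] = (text_weight * norm_text.get(doc, 0.0)
                      + vector_weight * vector_scores.get(doc, 0.0))
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


ranked = weighted_fuse({"a": 4.0, "b": 2.0}, {"b": 0.9, "c": 0.5})
print(ranked)
```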
Configuration:
{
"VectorSearchSettings": {
"HybridSearchSettings": {
"DefaultFusionMethod": "RRF",
"RrfK": 60,
"DefaultVectorWeight": 0.7,
"DefaultTextWeight": 0.3
}
}
}

Fusion Method Selection:
| Method | Best For | Notes |
|---|---|---|
| RRF | General use | Robust, no tuning needed |
| Weighted | Score transparency | Requires weight tuning |
The similarity algorithm controls how text search relevance scores are computed. Configure it on the index definition.
Supported algorithms:
| Algorithm | @odata.type | Description |
|---|---|---|
| BM25Similarity (default) | #Microsoft.Azure.Search.BM25Similarity | Okapi BM25 with tunable k1 and b parameters. Scores are unbounded. |
| ClassicSimilarity | #Microsoft.Azure.Search.ClassicSimilarity | Legacy TF-IDF scoring. Scores are in the 0–1 range. No tunable parameters. |
BM25 parameters:
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| k1 | double | 1.2 | ≥ 0 (no upper bound) | Controls term frequency saturation. 0 = binary match, higher values increase impact of repeated terms. |
| b | double | 0.75 | 0.0 – 1.0 | Controls document length normalization. 0 = no normalization, 1 = fully normalized. |
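To make the effect of k1 and b concrete, here is one common formulation of the Okapi BM25 score for a single term. It is a textbook sketch rather than the simulator's Lucene-based scoring, which may differ in constants; all parameter names other than k1 and b are illustrative.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Score one term in one document with a common Okapi BM25 formulation."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # b scales how strongly the document length affects the score.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding to the score.
    return idf * ((tf * (k1 + 1)) / (tf + k1 * length_norm))


stats = dict(avg_doc_len=100, doc_count=1000, doc_freq=10)
print(bm25_term_score(1, 100, **stats), bm25_term_score(5, 100, **stats))
print(bm25_term_score(1, 100, k1=0, **stats), bm25_term_score(5, 100, k1=0, **stats))
```

With k1 = 0 the term frequency no longer matters (binary match), and with b = 0 the document length no longer matters.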
Index definition example:
{
"similarity": {
"@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
"k1": 1.5,
"b": 0.5
}
}

ClassicSimilarity example:
{
"similarity": {
"@odata.type": "#Microsoft.Azure.Search.ClassicSimilarity"
}
}

Key rules:
- If similarity is null or omitted, BM25 with default parameters is used.
- The @odata.type is immutable after index creation; attempting to change it returns 400 Bad Request.
- BM25 parameters (k1, b) can be updated via Create-or-Update with allowIndexDowntime=true.
- ClassicSimilarity does not accept k1 or b parameters.
- Invalid parameter ranges (negative k1, b outside 0–1) return 400 Bad Request.
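The parameter rules above can be expressed as a small client-side check, useful for catching a 400 Bad Request before sending the index definition. This is a sketch only: validate_similarity is a hypothetical helper, the error strings are invented for illustration, and the immutability rule is not covered because it needs the existing index.

```python
BM25 = "#Microsoft.Azure.Search.BM25Similarity"
CLASSIC = "#Microsoft.Azure.Search.ClassicSimilarity"


def validate_similarity(similarity):
    """Return a list of error strings; an empty list means the definition looks valid."""
    if similarity is None:
        return []  # omitted: BM25 with default parameters is used
    kind = similarity.get("@odata.type")
    k1, b = similarity.get("k1"), similarity.get("b")
    errors = []
    if kind == CLASSIC:
        if k1 is not None or b is not None:
            errors.append("ClassicSimilarity does not accept k1 or b")
    elif kind == BM25:
        if k1 is not None and k1 < 0:
            errors.append("k1 must be >= 0")
        if b is not None and not 0.0 <= b <= 1.0:
            errors.append("b must be between 0 and 1")
    else:
        errors.append(f"unknown similarity type: {kind}")
    return errors


print(validate_similarity({"@odata.type": BM25, "k1": 1.5, "b": 0.5}))
print(validate_similarity({"@odata.type": BM25, "k1": -1, "b": 1.5}))
```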
Scoring profiles boost document relevance based on field values. Define profiles in the index, then activate them via defaultScoringProfile or the scoringProfile search parameter.
Index definition example:
{
"scoringProfiles": [
{
"name": "boostByRating",
"text": {
"weights": {
"hotelName": 3,
"description": 1
}
},
"functions": [
{
"type": "magnitude",
"fieldName": "rating",
"boost": 5,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 0,
"boostingRangeEnd": 5,
"constantBoostBeyondRange": true
}
}
],
"functionAggregation": "sum"
}
],
"defaultScoringProfile": "boostByRating"
}

Supported scoring function types:
| Function | Field Type | Description |
|---|---|---|
| freshness | Edm.DateTimeOffset | Boost based on recency; decays over boostingDuration (ISO 8601) |
| magnitude | Edm.Double, Edm.Int32, Edm.Int64 | Boost within a numeric range |
| distance | Edm.GeographyPoint | Boost by proximity to a reference point (Haversine) |
| tag | Collection(Edm.String), Edm.String | Boost when field values match scoring parameter tags |
Interpolation modes: linear (default), constant, quadratic, logarithmic
Note: Tag functions only support linear and constant interpolation.
Aggregation modes: sum (default), average, minimum, maximum, firstMatching
Search request with scoring profile:
{
"search": "luxury hotel",
"scoringProfile": "boostByRating",
"scoringParameters": [
"tagParam-luxury,boutique"
]
}

The scoringParameters array provides values for tag and distance functions. Format for tag: paramName-value1,value2. Format for distance: paramName--longitude,latitude (note the double dash separator).
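For the tag format, a parameter string splits at the first dash into a name and a comma-separated value list. The sketch below covers only that simple case; it does not handle the distance format or values that themselves contain commas, and parse_tag_parameter is a hypothetical name.

```python
def parse_tag_parameter(raw):
    """Split 'name-value1,value2' into (name, [values]) at the first dash."""
    name, sep, values = raw.partition("-")
    if not sep or not name or not values:
        raise ValueError(f"malformed scoring parameter: {raw!r}")
    return name, values.split(",")


print(parse_tag_parameter("tagParam-luxury,boutique"))
```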
When debug is enabled, the @search.documentDebugInfo includes a documentBoost value reflecting the combined scoring profile boost applied to each document.
Validation:
- Profiles with invalid field references, unsupported field types for functions, or non-filterable fields are rejected at index creation time.
- Function boost must be non-zero and not equal to 1.0. Negative values are allowed (to demote documents).
- Tag functions only accept linear or constant interpolation.
- Maximum 100 scoring profiles per index.
- Requesting a non-existent scoringProfile in a search returns 400 Bad Request.
Response:
{
"@odata.context": "...",
"@odata.count": 42,
"@search.facets": {
"category": [
{ "value": "Luxury", "count": 15 },
{ "value": "Budget", "count": 27 }
],
"rating": [
{ "from": 4, "to": 5, "count": 30 }
]
},
"value": [
{
"@search.score": 1.234,
"@search.highlights": {
"description": ["A <em>luxury</em> <em>hotel</em> with <em>pool</em>"]
},
"hotelId": "1",
"hotelName": "Grand Hotel",
"rating": 4.8,
"category": "Luxury"
}
]
}

GET /indexes/{indexName}/docs?api-version=2024-07-01&search={text}&$filter={filter}&$select={fields}&$orderby={sort}&$top={n}&$skip={n}&$count={bool}&highlight={fields}&searchMode={mode}&queryType={type}&scoringProfile={name}&scoringParameter={param}&scoringStatistics={scope}&debug={mode}
api-key: <query-key>

All search parameters can be passed as query string parameters. Use scoringProfile for the profile name and scoringParameter (repeated) for each scoring parameter value. The debug parameter accepts the same values as in the POST body.
Example:
GET /indexes/hotels/docs?api-version=2024-07-01&search=luxury&debug=all
api-key: <query-key>

POST /indexes/{indexName}/docs/suggest?api-version=2024-07-01
Content-Type: application/json
api-key: <query-key>

Request Body:
{
"search": "sea",
"suggesterName": "sg",
"select": "hotelId,hotelName",
"top": 5,
"fuzzy": true
}

Response:
{
"value": [
{
"@search.text": "Seaside Resort",
"hotelId": "5",
"hotelName": "Seaside Resort"
}
]
}

POST /indexes/{indexName}/docs/autocomplete?api-version=2024-07-01
Content-Type: application/json
api-key: <query-key>

Request Body:
{
"search": "sea",
"suggesterName": "sg",
"autocompleteMode": "twoTerms",
"fuzzy": true
}

POST /indexers?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>

Request Body:
{
"name": "hotel-indexer",
"dataSourceName": "hotel-datasource",
"targetIndexName": "hotels",
"skillsetName": "hotel-skillset",
"schedule": {
"interval": "PT1H",
"startTime": "2024-01-01T00:00:00Z"
},
"parameters": {
"configuration": {
"parsingMode": "default",
"dataToExtract": "contentAndMetadata"
}
},
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "hotelId",
"mappingFunction": {
"name": "base64Encode"
}
}
],
"outputFieldMappings": [
{
"sourceFieldName": "/document/content",
"targetFieldName": "description"
}
]
}

POST /indexers/{indexerName}/run?api-version=2024-07-01
api-key: <admin-key>

Response: 202 Accepted
GET /indexers/{indexerName}/status?api-version=2024-07-01
api-key: <admin-key>

Response:
{
"name": "hotel-indexer",
"status": "running",
"lastResult": {
"status": "success",
"itemsProcessed": 100,
"itemsFailed": 0,
"startTime": "2024-01-15T10:00:00Z",
"endTime": "2024-01-15T10:05:00Z"
},
"executionHistory": [...]
}

Resets the change tracking state, causing a full re-index on next run.
POST /indexers/{indexerName}/reset?api-version=2024-07-01
api-key: <admin-key>

POST /datasources?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>

Request Body:
{
"name": "hotel-datasource",
"type": "azureblob",
"credentials": {
"connectionString": "DefaultEndpointsProtocol=file;LocalPath=./data/hotels"
},
"container": {
"name": "documents",
"query": "pdfs/"
},
"dataDeletionDetectionPolicy": {
"@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
"softDeleteColumnName": "IsDeleted",
"softDeleteMarkerValue": "true"
}
}

Note: The simulator uses a special connection string format for local files:
- DefaultEndpointsProtocol=file;LocalPath=<path> - Maps to local file system
The simulator includes built-in document cracking capabilities to extract text and metadata from various file formats. This functionality is automatically invoked by indexers when processing documents from data sources.
| Format | Extension(s) | Library | Features |
|---|---|---|---|
| Plain Text | .txt, .md | Built-in | UTF-8/UTF-16 encoding detection |
| JSON | .json | Built-in | Extracts all string values, metadata fields |
| CSV/TSV | .csv, .tsv | Built-in | Auto delimiter detection, row/column extraction |
| HTML | .html, .htm | HtmlAgilityPack | Tag stripping, meta tag extraction |
| PDF | .pdf | PdfPig | Page-by-page extraction, document properties |
| Word | .docx | OpenXML | Paragraphs, tables, document properties |
| Excel | .xlsx | OpenXML | All sheets, shared strings |
When cracking documents, the following metadata is automatically extracted when available:
{
"content": "The extracted text content...",
"metadata_title": "Document Title",
"metadata_author": "Author Name",
"metadata_creation_date": "2024-01-15T10:30:00Z",
"metadata_last_modified": "2024-01-20T14:45:00Z",
"metadata_page_count": 5,
"metadata_word_count": 1250,
"metadata_character_count": 7500,
"metadata_language": "en"
}

The cracker is selected based on:
- Content-Type header (when available from the data source)
- File extension (fallback)
| Content Type | Cracker Used |
|---|---|
| text/plain | PlainTextCracker |
| text/markdown | PlainTextCracker |
| application/json | JsonCracker |
| text/csv | CsvCracker |
| text/tab-separated-values | CsvCracker |
| text/html | HtmlCracker |
| application/pdf | PdfCracker |
| application/vnd.openxmlformats-officedocument.wordprocessingml.document | WordDocCracker |
| application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | ExcelCracker |
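The selection order (Content-Type first, file extension as fallback) can be sketched as a pair of lookups. The content-type map mirrors the table above; the extension map is inferred by pairing the supported-format extensions with the cracker names and is an assumption, as is the select_cracker helper itself.

```python
import os

CRACKERS_BY_CONTENT_TYPE = {
    "text/plain": "PlainTextCracker",
    "text/markdown": "PlainTextCracker",
    "application/json": "JsonCracker",
    "text/csv": "CsvCracker",
    "text/tab-separated-values": "CsvCracker",
    "text/html": "HtmlCracker",
    "application/pdf": "PdfCracker",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "WordDocCracker",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "ExcelCracker",
}

# Assumed pairing of extensions to crackers, inferred from the format table.
CRACKERS_BY_EXTENSION = {
    ".txt": "PlainTextCracker", ".md": "PlainTextCracker",
    ".json": "JsonCracker",
    ".csv": "CsvCracker", ".tsv": "CsvCracker",
    ".html": "HtmlCracker", ".htm": "HtmlCracker",
    ".pdf": "PdfCracker",
    ".docx": "WordDocCracker",
    ".xlsx": "ExcelCracker",
}


def select_cracker(file_name, content_type=None):
    """Pick a cracker name by Content-Type, falling back to the file extension."""
    if content_type:
        media_type = content_type.split(";")[0].strip().lower()
        cracker = CRACKERS_BY_CONTENT_TYPE.get(media_type)
        if cracker:
            return cracker
    extension = os.path.splitext(file_name)[1].lower()
    return CRACKERS_BY_EXTENSION.get(extension)


print(select_cracker("report.PDF"))
print(select_cracker("page.bin", "text/html; charset=utf-8"))
```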
- PDF: Text extraction quality depends on PDF structure; scanned PDFs without OCR layer won't extract text
- Word/Excel: Only .docx/.xlsx (Open XML) formats supported, not legacy .doc/.xls
- Encoding: Plain text files default to UTF-8 if BOM not present
- Large files: No streaming; entire file is loaded into memory
Skillsets define a sequence of skills that transform and enrich documents during indexing.
POST /skillsets?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>

Request Body:
{
"name": "hotel-skillset",
"description": "Extract content and split into chunks",
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.SplitSkill",
"name": "split-skill",
"description": "Split text into pages",
"context": "/document",
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "textItems",
"targetName": "pages"
}
],
"textSplitMode": "pages",
"maximumPageLength": 2000
},
{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"name": "merge-skill",
"context": "/document",
"inputs": [
{
"name": "text",
"source": "/document/content"
},
{
"name": "itemsToInsert",
"source": "/document/metadata_title"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "fullContent"
}
],
"insertPreTag": " ",
"insertPostTag": " "
},
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"name": "shaper-skill",
"context": "/document",
"inputs": [
{
"name": "title",
"source": "/document/metadata_title"
},
{
"name": "content",
"source": "/document/content"
}
],
"outputs": [
{
"name": "output",
"targetName": "documentInfo"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"name": "custom-skill",
"context": "/document",
"uri": "https://my-function.azurewebsites.net/api/Translate",
"httpMethod": "POST",
"timeout": "PT30S",
"batchSize": 10,
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "translatedText",
"targetName": "translatedContent"
}
],
"httpHeaders": {
"x-functions-key": "your-function-key"
}
},
{
"@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
"name": "embedding-skill",
"context": "/document",
"resourceUri": "https://your-openai.openai.azure.com",
"deploymentId": "text-embedding-ada-002",
"modelName": "text-embedding-ada-002",
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "embedding",
"targetName": "contentVector"
}
]
}
]
}

Response: 201 Created

GET /skillsets/{skillsetName}?api-version=2024-07-01
api-key: <admin-key>

Response: 200 OK with skillset definition
GET /skillsets?api-version=2024-07-01
api-key: <admin-key>

Response:
{
"value": [
{ "name": "skillset-1", ... },
{ "name": "skillset-2", ... }
]
}

PUT /skillsets/{skillsetName}?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>

DELETE /skillsets/{skillsetName}?api-version=2024-07-01
api-key: <admin-key>

Response: 204 No Content
| Skill Type | Description | Status |
|---|---|---|
| #Microsoft.Skills.Text.SplitSkill | Split text into pages or sentences | ✅ |
| #Microsoft.Skills.Text.MergeSkill | Merge text fragments | ✅ |
| #Microsoft.Skills.Util.ShaperSkill | Restructure data | ✅ |
| #Microsoft.Skills.Util.ConditionalSkill | Conditional output | ✅ |
| #Microsoft.Skills.Custom.WebApiSkill | Call external REST API | ✅ |
| #Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill | Generate embeddings | ✅ |
To use a skillset with an indexer, specify the skillsetName and outputFieldMappings:
{
"name": "my-indexer",
"dataSourceName": "my-datasource",
"targetIndexName": "my-index",
"skillsetName": "my-skillset",
"outputFieldMappings": [
{
"sourceFieldName": "/document/contentVector",
"targetFieldName": "embedding"
},
{
"sourceFieldName": "/document/pages",
"targetFieldName": "chunks"
}
]
}

Skillsets support an indexProjections property that enables one-to-many indexing: fanning out enriched child elements (e.g., chunks) into separate search documents in a secondary index.
This is useful when a skill such as SplitSkill produces an array of chunks and each chunk should become its own searchable document.
{
"name": "chunking-skillset",
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.SplitSkill",
"name": "split-into-chunks",
"context": "/document",
"textSplitMode": "pages",
"maximumPageLength": 500,
"inputs": [
{ "name": "text", "source": "/document/content" }
],
"outputs": [
{ "name": "textItems", "targetName": "chunks" }
]
}
],
"indexProjections": {
"selectors": [
{
"targetIndexName": "chunks-index",
"parentKeyFieldName": "parent_id",
"sourceContext": "/document/chunks/*",
"mappings": [
{ "name": "chunk_content", "source": "/document/chunks/*" },
{ "name": "title", "source": "/document/metadata_storage_name" }
]
}
],
"parameters": {
"projectionMode": "skipIndexingParentDocuments"
}
}
}

Key properties:
| Property | Description |
|---|---|
| selectors[].targetIndexName | The secondary index to receive projected documents |
| selectors[].parentKeyFieldName | Field in the child document that stores the parent document key |
| selectors[].sourceContext | Enrichment path with wildcard (e.g., /document/chunks/*) that determines fan-out |
| selectors[].mappings[] | Field mappings using name (target field) and source (enrichment path) |
| parameters.projectionMode | "skipIndexingParentDocuments" or "includeIndexingParentDocuments" (default) |

Projection modes:
- skipIndexingParentDocuments: Only projected child documents are indexed. The parent document is not sent to any index.
- includeIndexingParentDocuments (default): Both the parent document (to the indexer's targetIndexName) and the child documents (to each selector's targetIndexName) are indexed.
Projected key format: Each child document receives a key in the format {parentKey}_{contextSegment}_{index} (e.g., doc1_chunks_0, doc1_chunks_2).
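The key format can be illustrated with a short function. How the simulator derives the context segment is not stated here; this sketch assumes it is the last non-wildcard segment of sourceContext, which is consistent with the doc1_chunks_0 example. projected_keys is a hypothetical name.

```python
def projected_keys(parent_key, source_context, count):
    """Build child keys as {parentKey}_{contextSegment}_{index}, with a 0-based index."""
    # Assumption: the context segment is the last path segment before the wildcard.
    segments = [s for s in source_context.split("/") if s and s != "*"]
    context_segment = segments[-1]
    return [f"{parent_key}_{context_segment}_{i}" for i in range(count)]


print(projected_keys("doc1", "/document/chunks/*", 3))
```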
See index-projection-sample.http for a complete walkthrough.
To use the Azure OpenAI Embedding Skill, configure the API key in appsettings.json:
{
"AzureOpenAI": {
"ApiKey": "your-azure-openai-api-key"
}
}

The skill's resourceUri and deploymentId are specified in the skill definition itself.
Synonym maps define synonym rules that expand search queries at query time. Fields can reference synonym maps via the synonymMaps property. Only the Apache Solr synonym format is supported.
POST /synonymmaps?api-version=2024-07-01
Content-Type: application/json
api-key: your-admin-key
{
"name": "my-synonym-map",
"format": "solr",
"synonyms": "usa, united states, america\nautomobile => car, vehicle"
}

Response: 201 Created with the synonym map definition including @odata.etag.

GET /synonymmaps/{synonymMapName}?api-version=2024-07-01
api-key: your-admin-key

Response: 200 OK with the synonym map definition.

GET /synonymmaps?api-version=2024-07-01
api-key: your-admin-key

Response: 200 OK with { "value": [...] }.
PUT /synonymmaps/{synonymMapName}?api-version=2024-07-01
Content-Type: application/json
api-key: your-admin-key
{
"name": "my-synonym-map",
"format": "solr",
"synonyms": "usa, united states, america\nautomobile => car, vehicle"
}

Response: 200 OK (updated) or 201 Created (new).

DELETE /synonymmaps/{synonymMapName}?api-version=2024-07-01
api-key: your-admin-key

Response: 204 No Content.
| Format | Example | Behavior |
|---|---|---|
| Equivalent | usa, united states, america | Bidirectional: searching any term finds documents with any of the others |
| Explicit mapping | automobile => car, vehicle | Unidirectional: searching "automobile" also finds "car" and "vehicle", but not vice versa |
Lines starting with # are treated as comments. Each rule is on a separate line.
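The two rule types and the comment handling can be sketched as a small parser that returns, for each term, the extra terms a query would be expanded with. This is an illustration only: lowercasing the terms is an assumption, and parse_solr_synonyms is not a simulator function.

```python
def parse_solr_synonyms(text):
    """Return {term: set of expansion terms} from Solr-format synonym rules."""
    expansions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rules
        if "=>" in line:
            # Explicit mapping: only the left-hand terms are expanded.
            left, right = line.split("=>", 1)
            sources = [t.strip().lower() for t in left.split(",") if t.strip()]
            targets = {t.strip().lower() for t in right.split(",") if t.strip()}
            for term in sources:
                expansions.setdefault(term, set()).update(targets)
        else:
            # Equivalent rule: every term expands to all the others.
            terms = {t.strip().lower() for t in line.split(",") if t.strip()}
            for term in terms:
                expansions.setdefault(term, set()).update(terms - {term})
    return expansions


rules = "# countries\nusa, united states, america\nautomobile => car, vehicle"
parsed = parse_solr_synonyms(rules)
print(sorted(parsed["usa"]), sorted(parsed["automobile"]))
```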
To enable synonym expansion on a field, reference the synonym map in the field's synonymMaps property when creating or updating an index:
{
"name": "my-index",
"fields": [
{ "name": "id", "type": "Edm.String", "key": true },
{
"name": "description",
"type": "Edm.String",
"searchable": true,
"synonymMaps": ["my-synonym-map"]
}
]
}
When a search query matches a term in the synonym map on a field with synonymMaps configured, the query is automatically expanded with the synonym terms.
| Type | Description | Example |
|---|---|---|
| Edm.String | Text/string | "hello world" |
| Edm.Int32 | 32-bit integer | 42 |
| Edm.Int64 | 64-bit integer | 9223372036854775807 |
| Edm.Double | Double-precision float | 3.14159 |
| Edm.Boolean | True/false | true |
| Edm.DateTimeOffset | Date and time | "2024-01-15T10:30:00Z" |
| Edm.GeographyPoint | Lat/long coordinates | {"type":"Point","coordinates":[-122.131577,47.678581]} |
| Edm.ComplexType | Nested object | {"street":"123 Main St","city":"Seattle"} |
| Collection(Edm.String) | Array of strings | ["tag1","tag2"] |
| Collection(Edm.Single) | Vector embeddings | [0.01, 0.02, ..., 0.99] |
| Collection(Edm.*) | Array of any type | [1, 2, 3] |
| Operator | Description | Example |
|---|---|---|
| eq | Equal | rating eq 5 |
| ne | Not equal | category ne 'Budget' |
| gt | Greater than | rating gt 4 |
| ge | Greater than or equal | rating ge 4 |
| lt | Less than | rating lt 3 |
| le | Less than or equal | rating le 3 |
| Operator | Example |
|---|---|
| and | rating ge 4 and category eq 'Luxury' |
| or | category eq 'Budget' or category eq 'Economy' |
| not | not (rating lt 4) |
| Function | Example |
|---|---|
| search.ismatch() | search.ismatch('pool', 'description') |
| search.in() | search.in(category, 'Budget,Economy') |
| geo.distance() | geo.distance(location, geography'POINT(-122.13 47.67)') le 10 |
| Function | Example |
|---|---|
| any() | tags/any(t: t eq 'wifi') |
| all() | tags/all(t: t ne 'casino') |
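When filter strings are assembled from user input, string literals need quoting. The sketch below builds a few of the expressions shown in the tables above. The helper names are hypothetical. It relies on the OData convention that a single quote inside a string literal is escaped by doubling it.

```python
def odata_string(value):
    """Quote a string literal for an OData filter, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def eq(field, value):
    """Build `field eq literal` for str, bool, or numeric values."""
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, str):
        literal = odata_string(value)
    else:
        literal = str(value)
    return f"{field} eq {literal}"


def any_of(collection, value):
    """Build a lambda expression matching collections that contain `value`."""
    return f"{collection}/any(t: t eq {odata_string(value)})"


def search_in(field, values):
    """Build a search.in() call from a list of comma-free values."""
    return f"search.in({field}, {odata_string(','.join(values))})"


def combine_and(*clauses):
    """Join clauses with `and`, parenthesizing each to keep precedence explicit."""
    return " and ".join(f"({clause})" for clause in clauses)


print(combine_and(eq("rating", 5), any_of("tags", "wifi")))
print(eq("name", "O'Hare"))
```

The resulting string would typically be passed as the filter value of a search request.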
Creates a new data source.
POST /datasources?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>
Request Body:
{
"name": "local-files",
"type": "filesystem",
"credentials": {
"connectionString": "c:\\data\\documents"
},
"container": {
"name": "subfolder",
"query": "*.txt"
}
}
Response: 201 Created with the created data source.
PUT /datasources/{dataSourceName}?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>
Response: 200 OK (update) or 201 Created (create).
GET /datasources/{dataSourceName}?api-version=2024-07-01
api-key: <admin-key>
Response:
{
"name": "local-files",
"type": "filesystem",
"credentials": {
"connectionString": "c:\\data\\documents"
},
"container": {
"name": "subfolder"
},
"@odata.etag": "\"abc123\""
}
GET /datasources?api-version=2024-07-01
api-key: <admin-key>
Response:
{
"value": [
{
"name": "local-files",
"type": "filesystem",
"container": {
"name": "documents"
}
}
]
}
DELETE /datasources/{dataSourceName}?api-version=2024-07-01
api-key: <admin-key>
Response: 204 No Content
| Type | Description | Authentication |
|---|---|---|
| filesystem | Local file system (simulator-only) | Local path |
| azureblob | Azure Blob Storage | Connection string, Account Key, SAS, Managed Identity |
| adlsgen2 | Azure Data Lake Storage Gen2 | Connection string, Account Key, SAS, Managed Identity |
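A data source definition for any of the types in the table can be assembled programmatically. The following is a minimal sketch. The helper `build_data_source` is hypothetical, and the client-side type check is an assumption made for convenience, not a description of the simulator's own validation.

```python
SUPPORTED_TYPES = {"filesystem", "azureblob", "adlsgen2"}


def build_data_source(name, ds_type, connection_string, container, query=None):
    """Return a data source definition dict ready to be serialized as JSON."""
    if ds_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported data source type: {ds_type!r}")
    definition = {
        "name": name,
        "type": ds_type,
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    if query is not None:
        # The optional query narrows the container (file pattern or folder prefix).
        definition["container"]["query"] = query
    return definition


local = build_data_source("local-files", "filesystem", "C:/data", "documents", "*.pdf")
print(local)
```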
Local File System:
{
"name": "local-files",
"type": "filesystem",
"credentials": {
"connectionString": "C:/data"
},
"container": {
"name": "documents",
"query": "*.pdf"
}
}
Azure Blob Storage (Connection String):
{
"name": "blob-datasource",
"type": "azureblob",
"credentials": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=...;EndpointSuffix=core.windows.net"
},
"container": {
"name": "documents",
"query": "folder1/"
}
}
Azure Blob Storage (Managed Identity):
{
"name": "blob-managed-identity",
"type": "azureblob",
"credentials": {
"connectionString": "https://mystorageaccount.blob.core.windows.net"
},
"container": {
"name": "documents"
}
}
ADLS Gen2 (Connection String):
{
"name": "adls-datasource",
"type": "adlsgen2",
"credentials": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=mydatalake;AccountKey=...;EndpointSuffix=core.windows.net"
},
"container": {
"name": "filesystem1",
"query": "data/raw/"
}
}
ADLS Gen2 (Managed Identity with DFS endpoint):
{
"name": "adls-managed-identity",
"type": "adlsgen2",
"credentials": {
"connectionString": "https://mydatalake.dfs.core.windows.net"
},
"container": {
"name": "filesystem1"
}
}
Creates a new indexer.
POST /indexers?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>
Request Body:
{
"name": "my-indexer",
"dataSourceName": "local-files",
"targetIndexName": "documents-index",
"schedule": {
"interval": "PT1H"
},
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "id",
"mappingFunction": {
"name": "base64Encode"
}
}
],
"parameters": {
"batchSize": 100,
"maxFailedItems": 10,
"configuration": {
"parsingMode": "default",
"dataToExtract": "contentAndMetadata"
}
}
}
Response: 201 Created with the created indexer.
PUT /indexers/{indexerName}?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>
GET /indexers/{indexerName}?api-version=2024-07-01
api-key: <admin-key>
GET /indexers?api-version=2024-07-01
api-key: <admin-key>
DELETE /indexers/{indexerName}?api-version=2024-07-01
api-key: <admin-key>
Response: 204 No Content
Triggers an immediate indexer run.
POST /indexers/{indexerName}/run?api-version=2024-07-01
api-key: <admin-key>
Response: 202 Accepted
Resets the indexer tracking state, causing a full reindex on next run.
POST /indexers/{indexerName}/reset?api-version=2024-07-01
api-key: <admin-key>
Response: 204 No Content
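Because a reset only clears tracking state, forcing a full reindex takes two calls: reset, then run. The sketch below prepares both requests in order, paired with the status codes documented above. It does not send anything. The function name, base URL, and key value are assumptions for a default local setup.

```python
import urllib.request

BASE_URL = "https://localhost:7250"  # assumed local simulator address
API_VERSION = "2024-07-01"


def full_reindex_requests(indexer_name, api_key):
    """Return (request, expected_status) pairs: reset first, then run."""
    steps = [("reset", 204), ("run", 202)]
    plan = []
    for action, expected_status in steps:
        url = f"{BASE_URL}/indexers/{indexer_name}/{action}?api-version={API_VERSION}"
        request = urllib.request.Request(
            url, headers={"api-key": api_key}, method="POST"
        )
        plan.append((request, expected_status))
    return plan


for request, expected in full_reindex_requests("my-indexer", "admin-key-12345"):
    print(request.get_method(), request.full_url, "->", expected)
```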
Gets the current status and execution history.
GET /indexers/{indexerName}/status?api-version=2024-07-01
api-key: <admin-key>
Response:
{
"status": "unknown",
"lastResult": {
"status": "success",
"startTime": "2024-01-15T10:00:00Z",
"endTime": "2024-01-15T10:01:30Z",
"itemsProcessed": 150,
"itemsFailed": 2,
"errors": [],
"warnings": []
},
"executionHistory": [
{
"status": "success",
"startTime": "2024-01-15T10:00:00Z",
"endTime": "2024-01-15T10:01:30Z",
"itemsProcessed": 150,
"itemsFailed": 2
}
],
"limits": {
"maxRunTime": "PT2H",
"maxDocumentExtractionSize": 16777216,
"maxDocumentContentCharactersToExtract": 64000
}
}
| Function | Description |
|---|---|
| base64Encode | Encodes string to URL-safe Base64 |
| base64Decode | Decodes Base64 to string |
| urlEncode | URL-encodes string |
| urlDecode | URL-decodes string |
| extractTokenAtPosition | Extracts token at position (params: delimiter, position) |
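To predict what a mapped key will look like, the functions in the table can be approximated locally. The sketch below uses Python's standard library and is only an approximation. Padding handling, the set of characters left unescaped by urlEncode, and the zero-based position are assumptions, so results may differ in detail from the simulator's output.

```python
import base64
from urllib.parse import quote, unquote


def base64_encode(value):
    """URL-safe Base64 of the UTF-8 bytes ('-' and '_' replace '+' and '/')."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def base64_decode(value):
    """Decode URL-safe Base64, restoring any stripped '=' padding first."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def url_encode(value):
    """Percent-encode every reserved character, including '/'."""
    return quote(value, safe="")


def url_decode(value):
    return unquote(value)


def extract_token_at_position(value, delimiter, position):
    """Split on the delimiter and return the token at the zero-based position."""
    tokens = value.split(delimiter)
    return tokens[position] if 0 <= position < len(tokens) else None


path = "c:/data/documents/report.pdf"
print(base64_encode(path))
print(extract_token_at_position("Jane Doe", " ", 1))
```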
| Parameter | Type | Description |
|---|---|---|
| batchSize | int | Documents per batch (default: 1000) |
| maxFailedItems | int | Max failures before stopping (-1 = unlimited) |
| maxFailedItemsPerBatch | int | Max failures per batch |
| configuration.parsingMode | string | default, json, jsonLines, jsonArray, delimitedText |
| configuration.dataToExtract | string | contentAndMetadata, storageMetadata |
| configuration.indexedFileNameExtensions | string | Comma-separated extensions to include |
| configuration.excludedFileNameExtensions | string | Comma-separated extensions to exclude |
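The two extension parameters interact when both are set. The sketch below shows one plausible interpretation: exclusions are checked first, then the include list if present. The precedence order, the leading-dot format (".pdf,.docx"), and the case-insensitive comparison are assumptions for illustration and are not confirmed simulator behavior.

```python
import os


def parse_extensions(csv):
    """Turn '.pdf, .DOCX' into {'.pdf', '.docx'}."""
    return {ext.strip().lower() for ext in csv.split(",") if ext.strip()}


def should_index(file_name, indexed=None, excluded=None):
    """Decide whether a file passes the include/exclude extension filters."""
    extension = os.path.splitext(file_name)[1].lower()
    if excluded and extension in parse_extensions(excluded):
        return False  # assumed: exclusion wins over inclusion
    if indexed:
        return extension in parse_extensions(indexed)
    return True  # no include list means every non-excluded file is indexed


print(should_index("report.PDF", indexed=".pdf,.docx"))
print(should_index("notes.txt", indexed=".pdf,.docx"))
```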
Returns service-level resource counters and limits.
GET /servicestats?api-version=2024-07-01
api-key: <admin-key>
Response:
{
"@odata.context": "https://localhost:7250/$metadata#Microsoft.Azure.Search.V2024_07_01.ServiceStatistics",
"counters": {
"documentCount": { "usage": 153956, "quota": null },
"indexesCount": { "usage": 2, "quota": 15 },
"indexersCount": { "usage": 1, "quota": 15 },
"dataSourcesCount": { "usage": 1, "quota": 15 },
"storageSize": { "usage": 274215358, "quota": 16106127360 },
"synonymMaps": { "usage": 0, "quota": 3 },
"skillsetCount": { "usage": 0, "quota": 15 },
"vectorIndexSize": { "usage": 0, "quota": 5368709120 }
},
"limits": {
"maxStoragePerIndex": 16106127360,
"maxFieldsPerIndex": 1000,
"maxFieldNestingDepthPerIndex": 10,
"maxComplexCollectionFieldsPerIndex": 40,
"maxComplexObjectsInCollectionsPerDocument": 3000
}
}
Counter Details:
| Counter | Usage | Quota |
|---|---|---|
| documentCount | Actual total across all indexes | null (unlimited, same as Azure) |
| indexesCount | Actual count | Hardcoded S1 default (15) |
| indexersCount | Actual count | Hardcoded S1 default (15) |
| dataSourcesCount | Actual count | Hardcoded S1 default (15) |
| storageSize | Actual Lucene index storage in bytes | Hardcoded S1 default (~15 GB) |
| synonymMaps | Actual count | Hardcoded S1 default (3) |
| skillsetCount | Actual count | Hardcoded S1 default (15) |
| vectorIndexSize | Actual HNSW index size in bytes | Hardcoded S1 default (5 GB) |
Note: The simulator does not enforce quotas. All quota values and limits are hardcoded to Azure AI Search Standard (S1) tier defaults. The usage values for documentCount, indexesCount, indexersCount, dataSourcesCount, storageSize, skillsetCount, synonymMaps, and vectorIndexSize reflect actual simulator state.
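Since the quotas are reported but not enforced, a test suite may want to check usage against them itself. The following sketch computes percentage usage per counter from a parsed response. The function name is hypothetical. A null quota, as returned for documentCount, is treated as unlimited.

```python
def quota_usage(counters):
    """Map each counter name to its usage percentage, or None if unlimited."""
    report = {}
    for name, counter in counters.items():
        quota = counter.get("quota")
        if not quota:
            report[name] = None  # null quota: nothing to compare against
        else:
            report[name] = round(100 * counter["usage"] / quota, 2)
    return report


# Values taken from the sample response above.
sample_counters = {
    "documentCount": {"usage": 153956, "quota": None},
    "indexesCount": {"usage": 2, "quota": 15},
    "synonymMaps": {"usage": 0, "quota": 3},
}
print(quota_usage(sample_counters))
```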
Administrative endpoints for token management and diagnostics.
Generates a simulated JWT token for local testing.
POST /admin/token?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>
Request Body:
{
"roles": ["Search Index Data Contributor", "Search Index Data Reader"],
"subject": "test-app",
"identityType": "app",
"expiresInMinutes": 60
}
Response:
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expiresAt": "2024-01-15T11:00:00Z",
"tokenType": "Bearer"
}
Generates a token with a predefined role using shortcuts.
GET /admin/token/quick/{role}?api-version=2024-07-01
api-key: <admin-key>
Available Role Shortcuts:
| Shortcut | Role |
|---|---|
| owner | Owner |
| contributor | Contributor |
| reader | Reader |
| service-contributor | Search Service Contributor |
| data-contributor | Search Index Data Contributor |
| data-reader | Search Index Data Reader |
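The shortcut table translates directly into a lookup that test code can use to build quick-token URLs and the matching Authorization header. This is a minimal sketch. The helper names and the default base URL are assumptions.

```python
ROLE_SHORTCUTS = {
    "owner": "Owner",
    "contributor": "Contributor",
    "reader": "Reader",
    "service-contributor": "Search Service Contributor",
    "data-contributor": "Search Index Data Contributor",
    "data-reader": "Search Index Data Reader",
}


def quick_token_url(shortcut, base_url="https://localhost:7250",
                    api_version="2024-07-01"):
    """Return the quick-token URL for a known role shortcut."""
    if shortcut not in ROLE_SHORTCUTS:
        raise ValueError(f"Unknown role shortcut: {shortcut!r}")
    return f"{base_url}/admin/token/quick/{shortcut}?api-version={api_version}"


def bearer_header(token):
    """Build the header that carries a token returned by the endpoint."""
    return {"Authorization": f"Bearer {token}"}


print(quick_token_url("data-reader"))
```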
Validates and inspects a JWT token.
POST /admin/token/validate?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>
Request Body:
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
Response:
{
"isValid": true,
"claims": {
"sub": "test-app",
"roles": ["Search Index Data Contributor"],
"exp": 1705316400,
"iss": "https://simulator.local/"
},
"accessLevel": "IndexDataContributor"
}
Returns current authentication configuration (non-sensitive).
GET /admin/token/info?api-version=2024-07-01
api-key: <admin-key>
Tests the configured outbound credential settings.
GET /admin/diagnostics/credentials/test?api-version=2024-07-01
api-key: <admin-key>
Returns authentication configuration status.
GET /admin/diagnostics/auth?api-version=2024-07-01
api-key: <admin-key>
Acquires a token for an external Azure resource.
POST /admin/diagnostics/credentials/token?api-version=2024-07-01
Content-Type: application/json
api-key: <admin-key>
Request Body:
{
"scope": "https://storage.azure.com/.default"
}
All errors follow the OData error format:
{
"error": {
"code": "InvalidRequest",
"message": "The request is invalid.",
"details": [
{
"code": "FieldNotFound",
"message": "Field 'unknownField' is not defined in the index schema."
}
]
}
}
| Code | HTTP Status | Description |
|---|---|---|
| InvalidApiKey | 401 | Missing or invalid API key |
| Forbidden | 403 | Key doesn't have permission |
| IndexNotFound | 404 | Index doesn't exist |
| DocumentNotFound | 404 | Document not found |
| InvalidRequest | 400 | Malformed request |
| ValidationError | 400 | Schema validation failed |
| IndexerExecutionError | 500 | Indexer run failed |
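Client code can turn the error envelope into a typed exception so that tests can assert on the code rather than on message text. The sketch below parses the format shown above. The class and function names are hypothetical.

```python
import json


class SearchApiError(Exception):
    """Carries the HTTP status plus the OData error code and detail messages."""

    def __init__(self, status, code, message, details):
        super().__init__(f"HTTP {status} {code}: {message}")
        self.status = status
        self.code = code
        self.details = details


def parse_error(status, body_text):
    """Convert an error response body into a SearchApiError."""
    error = json.loads(body_text).get("error", {})
    details = [d.get("message", "") for d in error.get("details", [])]
    return SearchApiError(
        status, error.get("code", "Unknown"), error.get("message", ""), details
    )


body = json.dumps({
    "error": {
        "code": "InvalidRequest",
        "message": "The request is invalid.",
        "details": [
            {"code": "FieldNotFound",
             "message": "Field 'unknownField' is not defined in the index schema."}
        ],
    }
})
print(parse_error(400, body))
```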
The simulator implements soft rate limits for testing purposes:
| Limit | Value |
|---|---|
| Max requests/second | 100 |
| Max batch size | 1000 documents |
| Max query results | 1000 |
| Max facet values | 100 |
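To stay within the batch size limit, a client can split large uploads before sending them. The sketch below chunks an iterable of documents into index-batch payloads of at most 1000 items each. The function name is hypothetical, and the default action of mergeOrUpload is an arbitrary choice for the example.

```python
from itertools import islice

MAX_BATCH_SIZE = 1000  # "Max batch size" from the limits table


def to_batches(documents, action="mergeOrUpload", size=MAX_BATCH_SIZE):
    """Yield index-batch payloads holding at most `size` documents each."""
    iterator = iter(documents)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        # Each document is tagged with the action to perform on it.
        yield {"value": [{"@search.action": action, **doc} for doc in chunk]}


docs = ({"id": str(i)} for i in range(2500))
print([len(batch["value"]) for batch in to_batches(docs)])
```

Because the helper consumes an iterator lazily, it also works for document streams that do not fit in memory.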
API Reference Version: 1.0