A complete reference for KnowFlow's RESTful API. This documentation is based on KnowFlow v2.1.5, which extends RAGFlow with enhanced features including MinerU layout parser and parent-child chunking strategies.
Version: KnowFlow v2.1.5
Last Updated: January 2025
Based on: RAGFlow v0.20.1
- Getting Started
- Error Codes
- OpenAI-Compatible API
- Dataset Management
- Document Management
- Chunk Management
- Chat Assistant Management
- Session Management
- Agent Management
- System APIs
All API requests require authentication using an API key in the Authorization header:
Authorization: Bearer <YOUR_API_KEY>

To obtain your API key:
- Log in to KnowFlow web interface
- Navigate to Settings > API Key
- Copy your API key
Base URL: http://<your-server>:9380

Default development URL: http://localhost:9380
Content-Type: application/json
Authorization: Bearer <YOUR_API_KEY>

| Code | Message | Description |
|---|---|---|
| 0 | Success | Request successful |
| 102 | Invalid Parameter | Required parameter missing or invalid |
| 103 | Authorization Failed | Permission denied |
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Unauthorized access |
| 403 | Forbidden | Access denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server internal error |
| 1001 | Invalid Chunk ID | The specified chunk ID does not exist |
| 1002 | Chunk Update Failed | The chunk could not be updated |
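Because the HTTP status can be 200 while the response body still carries a non-zero `code`, a client should always check the body-level code. A minimal sketch (the helper and exception names are illustrative, not part of the API):

```python
# Body-level error codes documented above (HTTP status may still be 200).
ERROR_MESSAGES = {
    0: "Success",
    102: "Invalid Parameter",
    103: "Authorization Failed",
    1001: "Invalid Chunk ID",
    1002: "Chunk Update Failed",
}

class KnowFlowError(Exception):
    """Raised when a response body carries a non-zero code."""

def check_response(body):
    """Return body['data'] on success; raise KnowFlowError otherwise."""
    code = body.get("code", -1)
    if code != 0:
        message = body.get("message") or ERROR_MESSAGES.get(code, "Unknown error")
        raise KnowFlowError(f"code {code}: {message}")
    return body.get("data", {})
```

Wrapping every call in a check like this keeps endpoint-specific code free of error plumbing.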
POST /api/v1/chats_openai/{chat_id}/chat/completions
Creates a model response for a given chat conversation using OpenAI-compatible format.
- Method: POST
- URL:
/api/v1/chats_openai/{chat_id}/chat/completions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/chats_openai/{chat_id}/chat/completions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"model": "model",
"messages": [{"role": "user", "content": "What is machine learning?"}],
"stream": true
}'

- chat_id (Path parameter) string, Required - The chat assistant ID
- model (Body parameter) string, Required - The model to use (the server resolves it automatically)
- messages (Body parameter) array<object>, Required - Chat message history
  - Must contain at least one message with the user role
  - Format: [{"role": "user", "content": "text"}]
- stream (Body parameter) boolean - Whether to stream the response
  - Default: false
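The curl call above can be reproduced in Python with only the standard library. A minimal non-streaming sketch; BASE_URL, API_KEY, and the helper names are placeholders/illustrative:

```python
import json
import urllib.request

BASE_URL = "http://localhost:9380"   # adjust to your server
API_KEY = "<YOUR_API_KEY>"           # placeholder

def build_payload(question, stream=False):
    """messages must contain at least one message with the 'user' role."""
    return {
        "model": "model",
        "messages": [{"role": "user", "content": question}],
        "stream": stream,
    }

def chat_completion(chat_id, question):
    """POST a non-streaming completion and return the assistant's text."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chats_openai/{chat_id}/chat/completions",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For streaming (`"stream": true`), read the response line by line and parse each `data:` chunk until `data:[DONE]` arrives.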
Stream Response:
data:{"id": "chatcmpl-xxx", "choices": [{"delta": {"content": "Machine learning is...", "role": "assistant"}, "finish_reason": null, "index": 0}], "created": 1755084508, "model": "model", "object": "chat.completion.chunk"}
data:[DONE]

Non-stream Response:
{
"choices": [{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "Machine learning is a subset of artificial intelligence...",
"role": "assistant"
}
}],
"created": 1755084403,
"id": "chatcmpl-xxx",
"model": "model",
"object": "chat.completion",
"usage": {
"completion_tokens": 55,
"prompt_tokens": 5,
"total_tokens": 60
}
}

POST /api/v1/agents_openai/{agent_id}/chat/completions
Creates a model response for a given agent conversation using OpenAI-compatible format.
- Method: POST
- URL:
/api/v1/agents_openai/{agent_id}/chat/completions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/agents_openai/{agent_id}/chat/completions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"model": "model",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true,
"session_id": "optional_session_id"
}'

- agent_id (Path parameter) string, Required - The agent ID
- model (Body parameter) string, Required - The model to use
- messages (Body parameter) array<object>, Required - Chat message history
- stream (Body parameter) boolean - Whether to stream the response
- session_id (Body parameter) string - Agent session ID (optional)
The response format matches the Chat Completion API, with an additional reference field containing the retrieved chunks.
POST /api/v1/datasets
Creates a new dataset (knowledge base) with specified configuration.
- Method: POST
- URL:
/api/v1/datasets
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/datasets \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "My Knowledge Base",
"description": "A test knowledge base",
"embedding_model": "BAAI/bge-m3@SILICONFLOW",
"chunk_method": "smart",
"parser_config": {
"layout_recognize": "mineru",
"chunk_token_num": 256
}
}'

- name (Body parameter) string, Required - The unique name of the dataset to create
  - Maximum 128 characters
  - Case-insensitive
- description (Body parameter) string - A brief description of the dataset
  - Maximum 65535 characters
- avatar (Body parameter) string - Base64 encoding of the avatar
  - Maximum 65535 characters
- embedding_model (Body parameter) string - The embedding model identifier
  - Format: <model_name>@<provider>
  - Example: "BAAI/bge-m3@SILICONFLOW"
  - Maximum 255 characters
  - Important: Must include both model name and provider separated by @
- permission (Body parameter) enum<string> - Access control for the dataset
  - Options:
    - "me": (Default) Only you can manage
    - "team": All team members can manage
- chunk_method (Body parameter) enum<string> - The chunking method for document parsing
  - Available options:
    - "naive": General chunking (default)
    - "smart": Smart chunking with structure awareness
    - "book": Optimized for books
    - "paper": Optimized for academic papers
    - "presentation": Optimized for slides
    - "qa": Question & Answer format
    - "table": Table extraction
    - "manual": Manual chunking
    - "one": Single chunk per document
    - "email": Email format
    - "laws": Legal documents
    - "picture": Image-focused
    - "tag": Tag-based chunking
- parser_config (Body parameter) object - Configuration for the document parser
  - Attributes:
    - layout_recognize string: Layout parser to use
      - "deepdoc": DeepDOC parser (default)
      - "mineru": MinerU parser (recommended for complex layouts)
      - "dots": DOTS parser
      - Important: Must be a string, not a boolean
    - chunk_token_num integer: Target token count per chunk
      - Default: 512
      - Range: 1-2048
    - delimiter string: Delimiter for chunking
      - Default: "\n"
    - html4excel boolean: Convert Excel to HTML
      - Default: false
    - auto_keywords integer: Number of keywords to auto-generate
      - Default: 0
      - Range: 0-32
    - auto_questions integer: Number of questions to auto-generate
      - Default: 0
      - Range: 0-10
    - task_page_size integer: Pages per processing task (PDF only)
      - Default: 12
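Since a malformed embedding_model identifier is the most common cause of the HTTP 400 shown below, it is worth validating the `<model_name>@<provider>` format client-side before sending. A sketch (helper names are illustrative):

```python
def validate_embedding_model(identifier):
    """Ensure the identifier follows <model_name>@<provider>."""
    name, sep, provider = identifier.rpartition("@")
    if not sep or not name or not provider:
        raise ValueError(
            "Embedding model identifier must follow <model_name>@<provider> format"
        )
    return identifier

def build_dataset_payload(name, embedding_model):
    """Assemble a create-dataset body using MinerU layout parsing."""
    if len(name) > 128:
        raise ValueError("Dataset name is limited to 128 characters")
    return {
        "name": name,
        "embedding_model": validate_embedding_model(embedding_model),
        "chunk_method": "smart",
        "parser_config": {
            # layout_recognize must be a string, not a boolean
            "layout_recognize": "mineru",
            "chunk_token_num": 256,
        },
    }
```

`rpartition("@")` is used so model names that themselves contain `/` or `@`-free prefixes (e.g. "BAAI/bge-m3") still split on the last separator.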
Success (HTTP 200):
{
"code": 0,
"message": "success",
"data": {
"id": "4345aa0ea1a311f0b45566fc51ac58df",
"name": "My Knowledge Base",
"description": "A test knowledge base",
"embedding_model": "BAAI/bge-m3@SILICONFLOW",
"chunk_method": "smart",
"parser_config": {
"layout_recognize": "mineru",
"chunk_token_num": 256
},
"created_at": "2025-01-15T10:30:00Z",
"updated_at": "2025-01-15T10:30:00Z",
"tenant_id": "user123",
"status": "1",
"document_count": 0,
"chunk_count": 0
}
}

Failure (HTTP 400):
{
"code": 102,
"message": "Embedding model identifier must follow <model_name>@<provider> format"
}

GET /api/v1/datasets
Lists all datasets for the authenticated user with optional filtering and pagination.
- Method: GET
- URL:
/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}
- Headers:
Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url 'http://localhost:9380/api/v1/datasets?page=1&page_size=10' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- page (Query parameter) integer - Page number for pagination
  - Default: 1
  - Minimum: 1
- page_size (Query parameter) integer - Number of items per page
  - Default: 30
  - Range: 1-100
- orderby (Query parameter) string - Sort by attribute
  - Options: create_time (default), update_time
- desc (Query parameter) boolean - Sort in descending order
  - Default: true
- id (Query parameter) string - Filter by specific dataset ID
  - When provided, returns only that dataset
- name (Query parameter) string - Filter by dataset name (partial match)
Success (HTTP 200):
{
"code": 0,
"data": {
"datasets": [
{
"id": "4345aa0ea1a311f0b45566fc51ac58df",
"name": "My Knowledge Base",
"description": "A test knowledge base",
"embedding_model": "BAAI/bge-m3@SILICONFLOW",
"chunk_method": "smart",
"parser_config": {
"layout_recognize": "mineru",
"chunk_token_num": 256
},
"created_at": "2025-01-15T10:30:00Z",
"document_count": 5,
"chunk_count": 245
}
],
"total": 1,
"page": 1,
"page_size": 10
}
}

PUT /api/v1/datasets/{dataset_id}
Updates an existing dataset's properties.
- Method: PUT
- URL:
/api/v1/datasets/{dataset_id}
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request PUT \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "Updated Knowledge Base",
"description": "Updated description"
}'

- dataset_id (Path parameter) string, Required - The dataset ID to update
- name (Body parameter) string - New name for the dataset
  - Maximum 128 characters
- description (Body parameter) string - New description for the dataset
- embedding_model (Body parameter) string - New embedding model
- chunk_method (Body parameter) enum<string> - New chunking method
- parser_config (Body parameter) object - New parser configuration
- permission (Body parameter) enum<string> - New permission setting
Success (HTTP 200):
{
"code": 0,
"data": {
"id": "4345aa0ea1a311f0b45566fc51ac58df",
"name": "Updated Knowledge Base",
"description": "Updated description",
"updated_at": "2025-01-15T11:00:00Z"
}
}

DELETE /api/v1/datasets
Deletes one or more datasets.
- Method: DELETE
- URL:
/api/v1/datasets
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/datasets \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"ids": ["4345aa0ea1a311f0b45566fc51ac58df"]
}'

- ids (Body parameter) array<string>, Required - List of dataset IDs to delete
  - Minimum: 1 ID
  - If empty, all datasets will be deleted (use with caution!)
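Because an empty ids array deletes every dataset, a defensive client can refuse an empty list unless the caller explicitly opts in. A sketch (the guard is a client-side convention, not an API feature):

```python
def build_delete_body(ids, allow_delete_all=False):
    """Build the DELETE body, refusing an empty list unless explicitly allowed.

    An empty "ids" array tells the server to delete ALL datasets.
    """
    if not ids and not allow_delete_all:
        raise ValueError(
            "Refusing to delete all datasets; pass allow_delete_all=True"
        )
    return {"ids": ids}
```

The same guard applies to the other bulk DELETE endpoints below (documents, chunks, chat assistants), which share the empty-list-deletes-everything behavior.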
Success (HTTP 200):
{
"code": 0
}

GET /api/v1/datasets/{dataset_id}/knowledge_graph
Retrieves the knowledge graph for a specified dataset.
- Method: GET
- URL:
/api/v1/datasets/{dataset_id}/knowledge_graph
- Headers:
Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/knowledge_graph \
--header 'Authorization: Bearer <YOUR_API_KEY>'

Success (HTTP 200):
{
"code": 0,
"data": {
"nodes": [
{"id": "entity1", "label": "Entity 1", "type": "concept"},
{"id": "entity2", "label": "Entity 2", "type": "concept"}
],
"edges": [
{"source": "entity1", "target": "entity2", "relation": "related_to"}
]
}
}

DELETE /api/v1/datasets/{dataset_id}/knowledge_graph
Deletes the knowledge graph for a specified dataset.
- Method: DELETE
- URL:
/api/v1/datasets/{dataset_id}/knowledge_graph
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/knowledge_graph \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

Success (HTTP 200):
{
"code": 0
}

Important Note on Document Parsing Workflow
RAGFlow uses a 3-step workflow for document processing:
1. Upload the document (POST /api/v1/datasets/{id}/documents)
2. Trigger parsing explicitly (POST /api/v1/datasets/{id}/chunks)
3. Monitor parsing progress (GET /api/v1/datasets/{id}/documents?id={doc_id})

Uploaded documents start with "run": "UNSTART" status and will NOT be parsed automatically. You must explicitly call the parsing trigger endpoint to start processing.
POST /api/v1/datasets/{dataset_id}/documents
Uploads a document to a dataset. Note: This only uploads the file; parsing must be triggered separately.
- Method: POST
- URL:
/api/v1/datasets/{dataset_id}/documents
- Headers:
  - Authorization: Bearer <YOUR_API_KEY>
  - Note: Do NOT set Content-Type for file uploads (multipart/form-data is set automatically)
curl --request POST \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--form 'file=@/path/to/document.pdf' \
--form 'parser_id=smart' \
--form 'parser_config={"chunk_token_num":256,"layout_recognize":"mineru"}'

- dataset_id (Path parameter) string, Required - The dataset ID to upload to
- file (Form parameter) file, Required - The document file to upload
  - Supported formats: PDF, DOCX, TXT, MD, HTML, XLSX, PPTX, PNG, JPG, etc.
  - Maximum size: 1GB (configurable via MAX_CONTENT_LENGTH)
- parser_id (Form parameter) string - Override the dataset's default chunk method for this document
  - Same options as chunk_method in Create Dataset
  - Defaults to the dataset's chunk_method
- parser_config (Form parameter) string (JSON) - Override the dataset's parser config for this document
  - Must be a JSON string
  - Example: '{"chunk_token_num":256,"layout_recognize":"mineru"}'
Success (HTTP 200):
{
"code": 0,
"message": "success",
"data": [
{
"id": "c6db195ea4b811f097ee66fc51ac58df",
"name": "document.pdf",
"size": 1024567,
"type": "application/pdf",
"parser_id": "smart",
"parser_config": {
"chunk_token_num": 256,
"layout_recognize": "mineru"
},
"status": "0",
"progress": 0,
"created_at": "2025-01-15T12:00:00Z",
"updated_at": "2025-01-15T12:00:00Z"
}
]
}

Status Codes:
- "0": Parsing (in progress)
- "1": Completed (parsing successful)
- "2": Failed (parsing error)
- "UNSTART": Uploaded but parsing not triggered
POST /api/v1/datasets/{dataset_id}/chunks
Triggers parsing for one or more uploaded documents. Important: Documents must be explicitly triggered for parsing after upload.
- Method: POST
- URL:
/api/v1/datasets/{dataset_id}/chunks
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/chunks \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"document_ids": ["c6db195ea4b811f097ee66fc51ac58df"]
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- document_ids (Body parameter) array<string>, Required - List of document IDs to trigger parsing for
  - Documents must already be uploaded to the dataset
Success (HTTP 200):
{
"code": 0,
"data": {
"message": "Parsing triggered successfully"
}
}

Failure (HTTP 400):
{
"code": 102,
"message": "Document not found or already parsing"
}

Notes:
- This endpoint initiates asynchronous parsing
- Use the List Documents endpoint with ID filter to check parsing progress
- Parsing typically completes in 3-10 seconds for small documents
- Large documents may take longer depending on size and complexity
GET /api/v1/datasets/{dataset_id}/documents
Lists all documents in a dataset with optional filtering and pagination.
- Method: GET
- URL:
/api/v1/datasets/{dataset_id}/documents?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&keywords={keywords}&id={document_id}&name={document_name}&create_time_from={timestamp}&create_time_to={timestamp}
- Headers:
Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url 'http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents?page=1&page_size=10' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- dataset_id (Path parameter) string, Required - The dataset ID
- page (Query parameter) integer - Page number
  - Default: 1
- page_size (Query parameter) integer - Items per page
  - Default: 30
  - Range: 1-100
- orderby (Query parameter) string - Sort by attribute
  - Options: create_time (default), update_time
- desc (Query parameter) boolean - Sort in descending order
  - Default: true
- keywords (Query parameter) string - Search keywords in document name
- id (Query parameter) string - Filter by document ID
- name (Query parameter) string - Filter by document name
- create_time_from (Query parameter) integer - Lower bound on creation time (Unix timestamp)
- create_time_to (Query parameter) integer - Upper bound on creation time (Unix timestamp)
Success (HTTP 200):
{
"code": 0,
"data": {
"documents": [
{
"id": "c6db195ea4b811f097ee66fc51ac58df",
"name": "document.pdf",
"size": 1024567,
"type": "application/pdf",
"parser_id": "smart",
"status": "1",
"progress": 100,
"chunk_count": 45,
"created_at": "2025-01-15T12:00:00Z",
"updated_at": "2025-01-15T12:05:00Z"
}
],
"total": 1,
"page": 1,
"page_size": 10
}
}

GET /api/v1/datasets/{dataset_id}/documents/{document_id}
Downloads the original document file. Note: This endpoint returns the file content (binary), not JSON metadata.
- Method: GET
- URL:
/api/v1/datasets/{dataset_id}/documents/{document_id}
- Headers:
Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents/c6db195ea4b811f097ee66fc51ac58df \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--output document.pdf

Success (HTTP 200):
Returns the raw file content (binary data) with appropriate Content-Type header:
- PDF files: application/pdf
- Text files: text/plain
- Word documents: application/vnd.openxmlformats-officedocument.wordprocessingml.document
- etc.
To get document metadata instead, use the List Documents endpoint with ID filter:
curl --request GET \
--url 'http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents?id=c6db195ea4b811f097ee66fc51ac58df' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

This returns JSON metadata including status, progress, chunk count, etc.
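A download sketch in Python (standard library only; guess_extension is an illustrative helper covering just the Content-Type values listed above):

```python
import urllib.request

BASE_URL = "http://localhost:9380"   # adjust to your server
API_KEY = "<YOUR_API_KEY>"           # placeholder

def guess_extension(content_type):
    """Map the documented Content-Type values to a file extension."""
    known = {
        "application/pdf": ".pdf",
        "text/plain": ".txt",
        "application/vnd.openxmlformats-officedocument"
        ".wordprocessingml.document": ".docx",
    }
    # Strip any "; charset=..." suffix before looking up.
    return known.get(content_type.split(";")[0].strip(), ".bin")

def download_document(dataset_id, document_id, out_path):
    """GET the raw file bytes (not JSON) and write them to disk."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/datasets/{dataset_id}/documents/{document_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())
```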
PUT /api/v1/datasets/{dataset_id}/documents/{document_id}
Updates document properties (name, parser settings).
- Method: PUT
- URL:
/api/v1/datasets/{dataset_id}/documents/{document_id}
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request PUT \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents/c6db195ea4b811f097ee66fc51ac58df \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "updated_document.pdf",
"parser_id": "smart",
"parser_config": {
"chunk_token_num": 512,
"layout_recognize": "mineru"
}
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- document_id (Path parameter) string, Required - The document ID
- name (Body parameter) string - New document name
- parser_id (Body parameter) string - New chunking method
- parser_config (Body parameter) object - New parser configuration
Success (HTTP 200):
{
"code": 0
}

DELETE /api/v1/datasets/{dataset_id}/documents
Deletes one or more documents from a dataset.
- Method: DELETE
- URL:
/api/v1/datasets/{dataset_id}/documents
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"ids": ["c6db195ea4b811f097ee66fc51ac58df"]
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- ids (Body parameter) array<string>, Required - List of document IDs to delete
  - If empty, all documents in the dataset will be deleted
Success (HTTP 200):
{
"code": 0
}

POST /api/v1/datasets/{dataset_id}/chunks
Creates chunks at the dataset level (not tied to a specific document).
- Method: POST
- URL:
/api/v1/datasets/{dataset_id}/chunks
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/chunks \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"document_id": "c6db195ea4b811f097ee66fc51ac58df",
"content": "This is a chunk content",
"important_keywords": ["keyword1", "keyword2"]
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- document_id (Body parameter) string, Required - The document ID this chunk belongs to
- content (Body parameter) string, Required - The text content of the chunk
- important_keywords (Body parameter) array<string> - Key terms or phrases to tag the chunk with
- questions (Body parameter) array<string> - Questions that this chunk can answer
Success (HTTP 200):
{
"code": 0,
"data": {
"chunk_id": "8c204dcbb8955158"
}
}

POST /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks
Adds a chunk to a specified document.
- Method: POST
- URL:
/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents/c6db195ea4b811f097ee66fc51ac58df/chunks \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"content": "This is a chunk content",
"important_keywords": ["keyword1", "keyword2"],
"questions": ["What is this about?"]
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- document_id (Path parameter) string, Required - The document ID
- content (Body parameter) string, Required - The text content of the chunk
- important_keywords (Body parameter) array<string> - Key terms or phrases to tag the chunk with
- questions (Body parameter) array<string> - Questions that this chunk can answer
Success (HTTP 200):
{
"code": 0,
"data": {
"chunk": {
"id": "8c204dcbb8955158",
"content": "This is a chunk content",
"important_keywords": ["keyword1", "keyword2"],
"questions": ["What is this about?"],
"create_time": "2025-01-15T12:10:00Z",
"dataset_id": "4345aa0ea1a311f0b45566fc51ac58df",
"document_id": "c6db195ea4b811f097ee66fc51ac58df"
}
}
}

GET /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks
Lists all chunks of a document with optional filtering.
- Method: GET
- URL:
/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks?keywords={keywords}&page={page}&page_size={page_size}&id={id}
- Headers:
Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url 'http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents/c6db195ea4b811f097ee66fc51ac58df/chunks?page=1&page_size=10' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- dataset_id (Path parameter) string, Required - The dataset ID
- document_id (Path parameter) string, Required - The document ID
- keywords (Query parameter) string - Filter chunks by keywords in content
- page (Query parameter) integer - Page number
  - Default: 1
- page_size (Query parameter) integer - Items per page
  - Default: 1024
  - Range: 1-1024
- id (Query parameter) string - Filter by specific chunk ID
Success (HTTP 200):
{
"code": 0,
"data": {
"chunks": [
{
"id": "8c204dcbb8955158",
"content": "This is the chunk content extracted from the document...",
"important_keywords": ["keyword1", "keyword2"],
"positions": [[1, 100, 200, 300, 400]],
"page_number": 1,
"available": true,
"doc_id": "c6db195ea4b811f097ee66fc51ac58df",
"dataset_id": "4345aa0ea1a311f0b45566fc51ac58df",
"created_at": "2025-01-15T12:05:00Z"
}
],
"doc": {
"id": "c6db195ea4b811f097ee66fc51ac58df",
"name": "document.pdf",
"chunk_count": 45,
"chunk_method": "smart",
"parser_config": {
"chunk_token_num": 256,
"layout_recognize": "mineru"
}
},
"total": 45,
"page": 1,
"page_size": 10
}
}

Chunk Position Format:
- For MinerU parser: [page_idx, x1, x2, y1, y2] (72 DPI PDF coordinates)
- For DOTS parser: [x1, y1, x2, y2] (200 DPI image coordinates)
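Since the two parsers report positions in different layouts and resolutions, a client that renders highlights can normalize both to 72 DPI PDF points. A sketch, under the assumption that scaling DOTS pixel coordinates by 72/200 maps them onto PDF points (the helper name and the target representation are illustrative):

```python
def normalize_position(position, parser):
    """Normalize a chunk position to a page index plus an (x1, y1, x2, y2) box
    in 72 DPI PDF points.

    MinerU: [page_idx, x1, x2, y1, y2], already in 72 DPI PDF coordinates.
    DOTS:   [x1, y1, x2, y2] in 200 DPI image pixels (no page index).
    """
    if parser == "mineru":
        page_idx, x1, x2, y1, y2 = position          # note the x1,x2,y1,y2 order
        return {"page": page_idx, "box": (x1, y1, x2, y2)}
    if parser == "dots":
        scale = 72.0 / 200.0                         # assumed pixel-to-point ratio
        x1, y1, x2, y2 = position
        return {"page": None,
                "box": tuple(v * scale for v in (x1, y1, x2, y2))}
    raise ValueError(f"unknown parser: {parser}")
```

The main trap is the MinerU ordering: coordinates arrive as x1, x2, y1, y2, not the more common x1, y1, x2, y2.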
PUT /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks/{chunk_id}
Updates content or configurations for a specified chunk.
- Method: PUT
- URL:
/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks/{chunk_id}
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request PUT \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents/c6db195ea4b811f097ee66fc51ac58df/chunks/8c204dcbb8955158 \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"content": "Updated chunk content",
"important_keywords": ["new_keyword"],
"available": true
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- document_id (Path parameter) string, Required - The document ID
- chunk_id (Path parameter) string, Required - The chunk ID to update
- content (Body parameter) string - New text content of the chunk
- important_keywords (Body parameter) array<string> - New list of key terms or phrases
- available (Body parameter) boolean - The chunk's availability status in the dataset
  - true: Available (default)
  - false: Unavailable (excluded from retrieval)
Success (HTTP 200):
{
"code": 0
}

DELETE /api/v1/datasets/{dataset_id}/chunks
Deletes chunks at the dataset level.
- Method: DELETE
- URL:
/api/v1/datasets/{dataset_id}/chunks
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/chunks \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"chunk_ids": ["8c204dcbb8955158", "9d305eccc9066269"]
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- chunk_ids (Body parameter) array<string> - List of chunk IDs to delete
  - If empty, all chunks in the dataset will be deleted
Success (HTTP 200):
{
"code": 0
}

DELETE /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks
Deletes chunks from a specified document.
- Method: DELETE
- URL:
/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/datasets/4345aa0ea1a311f0b45566fc51ac58df/documents/c6db195ea4b811f097ee66fc51ac58df/chunks \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"chunk_ids": ["8c204dcbb8955158"]
}'

- dataset_id (Path parameter) string, Required - The dataset ID
- document_id (Path parameter) string, Required - The document ID
- chunk_ids (Body parameter) array<string> - List of chunk IDs to delete
  - If empty, all chunks of the specified document will be deleted
Success (HTTP 200):
{
"code": 0
}

POST /api/v1/retrieval
Performs semantic search across one or more datasets to retrieve relevant chunks.
- Method: POST
- URL:
/api/v1/retrieval
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/retrieval \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"question": "What is machine learning?",
"dataset_ids": ["4345aa0ea1a311f0b45566fc51ac58df"],
"page": 1,
"page_size": 5,
"similarity_threshold": 0.2,
"vector_similarity_weight": 0.3,
"keyword": true,
"highlight": true
}'

- question (Body parameter) string, Required - The search query text
  - Minimum 1 character
- dataset_ids (Body parameter) array<string> - List of dataset IDs to search in
  - Either dataset_ids or document_ids must be provided
- document_ids (Body parameter) array<string> - List of document IDs to search in
  - Either dataset_ids or document_ids must be provided
  - All documents must use the same embedding model
- page (Body parameter) integer - Page number for results
  - Default: 1
- page_size (Body parameter) integer - Number of chunks to return per page
  - Default: 30
  - Range: 1-100
- similarity_threshold (Body parameter) float - Minimum similarity score (0.0-1.0)
  - Default: 0.2
  - Chunks below this threshold are filtered out
- vector_similarity_weight (Body parameter) float - Weight of vector similarity vs. keyword matching
  - Default: 0.3
  - Range: 0.0-1.0
  - Higher value = more weight on semantic similarity
  - If x is the vector weight, then (1 - x) is the keyword weight
- top_k (Body parameter) integer - Maximum number of chunks to retrieve before reranking
  - Default: 1024
- rerank_id (Body parameter) string - ID of the rerank model to use
  - If not specified, vector cosine similarity is used
- keyword (Body parameter) boolean - Enable keyword-based matching
  - Default: false
- highlight (Body parameter) boolean - Enable highlighting of matched terms in results
  - Default: false
- cross_languages (Body parameter) array<string> - Languages to translate the query into for cross-language retrieval
  - Example: ["en", "zh", "ja"]
- metadata_condition (Body parameter) object - Metadata filtering conditions
  - Example: {"author": "John", "year": 2024}
Success (HTTP 200):
{
"code": 0,
"data": {
"chunks": [
{
"id": "8c204dcbb8955158",
"content": "Machine learning is a subset of artificial intelligence...",
"document_id": "c6db195ea4b811f097ee66fc51ac58df",
"document_name": "document.pdf",
"dataset_id": "4345aa0ea1a311f0b45566fc51ac58df",
"positions": [[1, 100, 200, 300, 400]],
"page_number": 1,
"similarity": 0.856,
"vector_similarity": 0.892,
"term_similarity": 0.745,
"important_keywords": ["machine learning", "artificial intelligence"],
"image_id": ""
}
],
"total": 45,
"page": 1,
"page_size": 5
}
}

Similarity Scores:
- similarity: Overall combined score
- vector_similarity: Semantic embedding similarity (0.0-1.0)
- term_similarity: Keyword/BM25 similarity (0.0-1.0)
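If the combined score mixes the two components with the vector weight x as the vector_similarity_weight description suggests (combined = x·vector + (1−x)·term), it can be reproduced client-side. Note this weighted-sum formula is an assumption inferred from the parameter description, not a documented server formula:

```python
def combined_similarity(vector_similarity, term_similarity, vector_weight=0.3):
    """Assumed mixing rule: x * vector + (1 - x) * term, with x in [0, 1]."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be in [0.0, 1.0]")
    return (vector_weight * vector_similarity
            + (1.0 - vector_weight) * term_similarity)
```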
POST /api/v1/chats
Creates a chat assistant with specified configuration.
- Method: POST
- URL:
/api/v1/chats
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/chats \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "My Assistant",
"dataset_ids": ["4345aa0ea1a311f0b45566fc51ac58df"],
"llm": {
"model_name": "qwen-plus@Tongyi-Qianwen",
"temperature": 0.1,
"top_p": 0.3
},
"prompt": {
"similarity_threshold": 0.2,
"top_n": 6,
"opener": "Hi! I am your assistant. How can I help you?"
}
}'

- name (Body parameter) string, Required - The name of the chat assistant
  - Must be unique
- avatar (Body parameter) string - Base64 encoding of the avatar
- dataset_ids (Body parameter) array<string> - The IDs of the associated datasets (knowledge bases)
- llm (Body parameter) object - LLM settings for the chat assistant
  - Attributes:
    - model_name string: The chat model name
      - If not set, the user's default chat model is used
    - temperature float: Randomness of predictions
      - Default: 0.1
      - Range: 0.0-1.0
    - top_p float: Nucleus sampling threshold
      - Default: 0.3
      - Range: 0.0-1.0
    - presence_penalty float: Penalty for repeating information
      - Default: 0.4
      - Range: 0.0-2.0
    - frequency_penalty float: Penalty for repeating words
      - Default: 0.7
      - Range: 0.0-2.0
- prompt (Body parameter) object - Instructions for the LLM
  - Attributes:
    - similarity_threshold float: Minimum similarity score
      - Default: 0.2
      - Range: 0.0-1.0
    - keywords_similarity_weight float: Weight of keyword similarity
      - Default: 0.7
      - Range: 0.0-1.0
    - top_n integer: Number of top chunks to feed to the LLM
      - Default: 6
    - variables array<object>: Variables for the system prompt
      - Default: [{"key": "knowledge", "optional": true}]
      - knowledge is reserved for retrieved chunks
    - rerank_model string: ID of the rerank model to use
    - top_k integer: Top-k for reranking
      - Default: 1024
    - empty_response string: Response when nothing is retrieved
    - opener string: Opening greeting
      - Default: "Hi! I am your assistant, can I help you?"
    - show_quote boolean: Show the source of text
      - Default: true
    - prompt string: The actual prompt content
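The nested llm/prompt structure is easy to get wrong, so a payload builder helps. A sketch whose defaults mirror the attribute list above (the builder itself is illustrative, not part of the API):

```python
def build_chat_assistant_payload(name, dataset_ids, model_name=None,
                                 temperature=0.1, top_n=6):
    """Assemble a create-chat-assistant body with nested llm/prompt objects."""
    if not name:
        raise ValueError("name is required and must be unique")
    llm = {"temperature": temperature, "top_p": 0.3}
    if model_name:
        # If omitted, the server falls back to the user's default chat model.
        llm["model_name"] = model_name
    return {
        "name": name,
        "dataset_ids": dataset_ids,
        "llm": llm,
        "prompt": {
            "similarity_threshold": 0.2,
            "top_n": top_n,
            # "knowledge" is reserved for retrieved chunks.
            "variables": [{"key": "knowledge", "optional": True}],
        },
    }
```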
Success (HTTP 200):
{
"code": 0,
"data": {
"id": "b1f2f15691f911ef81180242ac120003",
"name": "My Assistant",
"avatar": "",
"dataset_ids": ["4345aa0ea1a311f0b45566fc51ac58df"],
"description": "A helpful Assistant",
"language": "English",
"llm": {
"model_name": "qwen-plus@Tongyi-Qianwen",
"temperature": 0.1,
"top_p": 0.3,
"presence_penalty": 0.4,
"frequency_penalty": 0.7
},
"prompt": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"opener": "Hi! I am your assistant. How can I help you?",
"show_quote": true,
"empty_response": "Sorry! No relevant content was found in the knowledge base!",
"variables": [
{"key": "knowledge", "optional": false}
]
},
"status": "1",
"create_time": "2025-01-15T14:00:00Z",
"update_time": "2025-01-15T14:00:00Z"
}
}

Failure (HTTP 400):
{
"code": 102,
"message": "Duplicated chat name in creating dataset."
}

PUT /api/v1/chats/{chat_id}

Updates configurations for a specified chat assistant.

- Method: PUT
- URL: /api/v1/chats/{chat_id}
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request PUT \
--url http://localhost:9380/api/v1/chats/b1f2f15691f911ef81180242ac120003 \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "Updated Assistant",
"llm": {
"temperature": 0.5
}
}'

- `chat_id` (Path parameter), string, Required - The ID of the chat assistant to update
- All other parameters are the same as for Create Chat Assistant
- Only specified parameters will be updated
Success (HTTP 200):
{
"code": 0
}

DELETE /api/v1/chats

Deletes chat assistants by ID.

- Method: DELETE
- URL: /api/v1/chats
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/chats \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"ids": ["b1f2f15691f911ef81180242ac120003"]
}'

- `ids` (Body parameter), array<string>, Required - List of chat assistant IDs to delete. If empty, all chat assistants will be deleted
Success (HTTP 200):
{
"code": 0
}

GET /api/v1/chats

Lists chat assistants with optional filtering and pagination.

- Method: GET
- URL: /api/v1/chats?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={chat_name}&id={chat_id}
- Headers:
  - Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url 'http://localhost:9380/api/v1/chats?page=1&page_size=10' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- `page` (Query parameter), integer - Page number. Default: 1
- `page_size` (Query parameter), integer - Items per page. Default: 30
- `orderby` (Query parameter), string - Attribute to sort by. Options: `create_time` (default), `update_time`
- `desc` (Query parameter), boolean - Sort in descending order. Default: `true`
- `id` (Query parameter), string - Filter by chat assistant ID
- `name` (Query parameter), string - Filter by chat assistant name
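Since results come back a page at a time, a small helper can walk all pages. A sketch (the page-walking logic is generic; `fetch_chats_page` targets the list endpoint above, with `BASE_URL` and `<YOUR_API_KEY>` as placeholders):

```python
import requests

BASE_URL = "http://localhost:9380"
API_KEY = "<YOUR_API_KEY>"  # placeholder


def iter_items(fetch_page, page_size=30):
    """Walk pages until an empty or short page; fetch_page(page, page_size) -> list."""
    page = 1
    while True:
        items = fetch_page(page, page_size)
        if not items:
            break
        yield from items
        if len(items) < page_size:  # a short page means we hit the end
            break
        page += 1


def fetch_chats_page(page, page_size):
    """Fetch one page of chat assistants from GET /api/v1/chats (requires a live server)."""
    resp = requests.get(
        f"{BASE_URL}/api/v1/chats",
        params={"page": page, "page_size": page_size},
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    return resp.json().get("data", [])
```

Usage: `for chat in iter_items(fetch_chats_page): print(chat["id"], chat["name"])`. Passing the fetcher as a function keeps the pagination logic reusable for the session and agent list endpoints as well.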
Success (HTTP 200):
{
"code": 0,
"data": [
{
"id": "b1f2f15691f911ef81180242ac120003",
"name": "My Assistant",
"avatar": "",
"dataset_ids": ["4345aa0ea1a311f0b45566fc51ac58df"],
"description": "A helpful Assistant",
"language": "English",
"llm": {
"model_name": "qwen-plus@Tongyi-Qianwen",
"temperature": 0.1,
"top_p": 0.3,
"presence_penalty": 0.4,
"frequency_penalty": 0.7
},
"status": "1",
"create_time": "2025-01-15T14:00:00Z",
"update_time": "2025-01-15T14:00:00Z"
}
]
}

POST /api/v1/chats/{chat_id}/completions

Sends a message to a chat assistant and receives a response.

- Method: POST
- URL: /api/v1/chats/{chat_id}/completions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/chats/b1f2f15691f911ef81180242ac120003/completions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"question": "What is machine learning?",
"session_id": "optional_session_id",
"stream": true
}'

- `chat_id` (Path parameter), string, Required - The chat assistant ID
- `question` (Body parameter), string, Required - The user's question
- `session_id` (Body parameter), string - Session ID to maintain conversation context. If not provided, a new session will be created
- `stream` (Body parameter), boolean - Whether to stream the response. Default: `false`
Stream Response:
data:{"answer": "Machine learning is...", "reference": {...}}
data:{"answer": "a subset of artificial intelligence...", "reference": null}
data:[DONE]
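The stream consists of `data:` lines terminated by `data:[DONE]`, as shown above. A minimal Python consumer, assuming that line format (`BASE_URL`, `<YOUR_API_KEY>`, and the chat ID are placeholders; whether each event's `answer` is incremental or cumulative is not specified here, so the sketch just yields each decoded event):

```python
import json

import requests

BASE_URL = "http://localhost:9380"
API_KEY = "<YOUR_API_KEY>"  # placeholder


def parse_stream_line(line):
    """Decode one 'data:' line; returns None for [DONE], blanks, or non-data lines."""
    line = (line or "").strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    return json.loads(body)


def stream_answer(chat_id, question):
    """Yield decoded events from a streaming completion (requires a live server)."""
    resp = requests.post(
        f"{BASE_URL}/api/v1/chats/{chat_id}/completions",
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        json={"question": question, "stream": True},
        stream=True,
    )
    for raw in resp.iter_lines(decode_unicode=True):
        event = parse_stream_line(raw)
        if event is not None:
            yield event
```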
Non-stream Response:
{
"code": 0,
"data": {
"answer": "Machine learning is a subset of artificial intelligence...",
"reference": {
"chunks": [...],
"doc_aggs": {...}
}
}
}

POST /api/v1/chats/{chat_id}/sessions

Creates a new session for a chat assistant.

- Method: POST
- URL: /api/v1/chats/{chat_id}/sessions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/chats/b1f2f15691f911ef81180242ac120003/sessions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "My Chat Session"
}'

- `chat_id` (Path parameter), string, Required - The chat assistant ID
- `name` (Body parameter), string - Session name. If not provided, a default name will be generated
Success (HTTP 200):
{
"code": 0,
"data": {
"id": "session123",
"name": "My Chat Session",
"chat_id": "b1f2f15691f911ef81180242ac120003",
"create_time": "2025-01-15T15:00:00Z",
"update_time": "2025-01-15T15:00:00Z"
}
}

PUT /api/v1/chats/{chat_id}/sessions/{session_id}

Updates a session's properties.

- Method: PUT
- URL: /api/v1/chats/{chat_id}/sessions/{session_id}
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request PUT \
--url http://localhost:9380/api/v1/chats/b1f2f15691f911ef81180242ac120003/sessions/session123 \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "Updated Session Name"
}'

- `chat_id` (Path parameter), string, Required - The chat assistant ID
- `session_id` (Path parameter), string, Required - The session ID to update
- `name` (Body parameter), string - New session name
Success (HTTP 200):
{
"code": 0
}

GET /api/v1/chats/{chat_id}/sessions

Lists sessions for a chat assistant.

- Method: GET
- URL: /api/v1/chats/{chat_id}/sessions?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={session_name}&id={session_id}
- Headers:
  - Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url 'http://localhost:9380/api/v1/chats/b1f2f15691f911ef81180242ac120003/sessions?page=1&page_size=10' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- `chat_id` (Path parameter), string, Required - The chat assistant ID
- `page` (Query parameter), integer - Page number. Default: 1
- `page_size` (Query parameter), integer - Items per page. Default: 30
- `orderby` (Query parameter), string - Attribute to sort by. Options: `create_time` (default), `update_time`
- `desc` (Query parameter), boolean - Sort in descending order. Default: `true`
- `id` (Query parameter), string - Filter by session ID
- `name` (Query parameter), string - Filter by session name
Success (HTTP 200):
{
"code": 0,
"data": [
{
"id": "session123",
"name": "My Chat Session",
"chat_id": "b1f2f15691f911ef81180242ac120003",
"message_count": 15,
"create_time": "2025-01-15T15:00:00Z",
"update_time": "2025-01-15T16:30:00Z"
}
]
}

DELETE /api/v1/chats/{chat_id}/sessions

Deletes one or more sessions.

- Method: DELETE
- URL: /api/v1/chats/{chat_id}/sessions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/chats/b1f2f15691f911ef81180242ac120003/sessions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"ids": ["session123"]
}'

- `chat_id` (Path parameter), string, Required - The chat assistant ID
- `ids` (Body parameter), array<string>, Required - List of session IDs to delete. If empty, all sessions for the chat assistant will be deleted
Success (HTTP 200):
{
"code": 0
}

POST /api/v1/sessions/related_questions

Retrieves related questions based on the current conversation context.

- Method: POST
- URL: /api/v1/sessions/related_questions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/sessions/related_questions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"session_id": "session123",
"question": "What is machine learning?"
}'

- `session_id` (Body parameter), string, Required - The session ID
- `question` (Body parameter), string, Required - The current question
Success (HTTP 200):
{
"code": 0,
"data": {
"related_questions": [
"What are the types of machine learning?",
"How does machine learning differ from deep learning?",
"What are common machine learning algorithms?"
]
}
}

GET /api/v1/agents

Lists agents with optional filtering and pagination.

- Method: GET
- URL: /api/v1/agents?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={agent_name}&id={agent_id}
- Headers:
  - Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url 'http://localhost:9380/api/v1/agents?page=1&page_size=10' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- `page` (Query parameter), integer - Page number. Default: 1
- `page_size` (Query parameter), integer - Items per page. Default: 30
- `orderby` (Query parameter), string - Attribute to sort by. Options: `create_time` (default), `update_time`
- `desc` (Query parameter), boolean - Sort in descending order. Default: `true`
- `id` (Query parameter), string - Filter by agent ID
- `title` (Query parameter), string - Filter by agent title/name
Success (HTTP 200):
{
"code": 0,
"data": [
{
"id": "8d9ca0e2b2f911ef9ca20242ac120006",
"title": "My Agent",
"description": "A helpful agent",
"avatar": null,
"canvas_type": null,
"dsl": {
"components": {...},
"graph": {...}
},
"create_time": "2025-01-15T16:00:00Z",
"update_time": "2025-01-15T16:00:00Z",
"user_id": "user123"
}
]
}

POST /api/v1/agents

Creates a new agent with a specified Canvas DSL configuration.

- Method: POST
- URL: /api/v1/agents
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/agents \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"title": "My Agent",
"description": "A helpful agent",
"dsl": {
"components": {
"begin": {
"obj": {
"component_name": "Begin",
"params": {}
},
"downstream": [],
"upstream": []
}
},
"graph": {
"nodes": [...],
"edges": []
}
}
}'

- `title` (Body parameter), string, Required - The title of the agent. Must be unique
- `description` (Body parameter), string - Description of the agent
- `dsl` (Body parameter), object, Required - The Canvas DSL object defining the agent's workflow. Contains components, graph, and configuration
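The smallest workable DSL is the one in the curl example: a single Begin component and an empty graph. A sketch in Python (`BASE_URL` and `<YOUR_API_KEY>` are placeholders; real agents would add more components and edges):

```python
import requests

BASE_URL = "http://localhost:9380"
API_KEY = "<YOUR_API_KEY>"  # placeholder


def minimal_agent_dsl():
    """Smallest DSL shape from the curl example: a lone Begin component, empty graph."""
    return {
        "components": {
            "begin": {
                "obj": {"component_name": "Begin", "params": {}},
                "downstream": [],
                "upstream": [],
            }
        },
        "graph": {"nodes": [], "edges": []},
    }


def create_agent(title, description=""):
    """POST /api/v1/agents with the minimal DSL (requires a live server; title must be unique)."""
    resp = requests.post(
        f"{BASE_URL}/api/v1/agents",
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        json={"title": title, "description": description, "dsl": minimal_agent_dsl()},
    )
    return resp.json()
```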
Success (HTTP 200):
{
"code": 0,
"data": true,
"message": "success"
}

Failure (HTTP 400):
{
"code": 102,
"message": "Agent with title 'My Agent' already exists."
}

PUT /api/v1/agents/{agent_id}

Updates an existing agent by ID.

- Method: PUT
- URL: /api/v1/agents/{agent_id}
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request PUT \
--url http://localhost:9380/api/v1/agents/8d9ca0e2b2f911ef9ca20242ac120006 \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"title": "Updated Agent",
"description": "Updated description"
}'

- `agent_id` (Path parameter), string, Required - The agent ID to update
- `title` (Body parameter), string - New title for the agent
- `description` (Body parameter), string - New description
- `dsl` (Body parameter), object - New Canvas DSL configuration

Note: Only specify the parameters you want to update. Unspecified parameters won't be changed.
Success (HTTP 200):
{
"code": 0,
"data": true,
"message": "success"
}

Failure (HTTP 403):
{
"code": 103,
"message": "Only owner of canvas authorized for this operation."
}

DELETE /api/v1/agents/{agent_id}

Deletes an agent by ID.

- Method: DELETE
- URL: /api/v1/agents/{agent_id}
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/agents/8d9ca0e2b2f911ef9ca20242ac120006 \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- `agent_id` (Path parameter), string, Required - The agent ID to delete
Success (HTTP 200):
{
"code": 0,
"data": true,
"message": "success"
}

POST /api/v1/agents/{agent_id}/sessions

Creates a new session for an agent.

- Method: POST
- URL: /api/v1/agents/{agent_id}/sessions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/agents/8d9ca0e2b2f911ef9ca20242ac120006/sessions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{}'

Success (HTTP 200):
{
"code": 0,
"data": {
"id": "agent_session123",
"agent_id": "8d9ca0e2b2f911ef9ca20242ac120006",
"create_time": "2025-01-15T17:00:00Z"
}
}

POST /api/v1/agents/{agent_id}/completions

Sends a message to an agent and receives a response.

- Method: POST
- URL: /api/v1/agents/{agent_id}/completions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request POST \
--url http://localhost:9380/api/v1/agents/8d9ca0e2b2f911ef9ca20242ac120006/completions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"question": "Hello, agent!",
"session_id": "agent_session123",
"stream": false
}'

- `agent_id` (Path parameter), string, Required - The agent ID
- `question` (Body parameter), string, Required - The user's question
- `session_id` (Body parameter), string, Required - The agent session ID
- `stream` (Body parameter), boolean - Whether to stream the response. Default: `false`
Success (HTTP 200):
{
"code": 0,
"data": {
"answer": "Hello! How can I assist you today?",
"reference": {...}
}
}

GET /api/v1/agents/{agent_id}/sessions

Lists sessions for an agent.

- Method: GET
- URL: /api/v1/agents/{agent_id}/sessions?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&id={session_id}&user_id={user_id}&dsl={dsl}
- Headers:
  - Authorization: Bearer <YOUR_API_KEY>
curl --request GET \
--url 'http://localhost:9380/api/v1/agents/8d9ca0e2b2f911ef9ca20242ac120006/sessions?page=1&page_size=10' \
--header 'Authorization: Bearer <YOUR_API_KEY>'

- `agent_id` (Path parameter), string, Required - The agent ID
- `page` (Query parameter), integer - Page number. Default: 1
- `page_size` (Query parameter), integer - Items per page. Default: 30
- `orderby` (Query parameter), string - Attribute to sort by. Options: `create_time` (default), `update_time`
- `desc` (Query parameter), boolean - Sort in descending order. Default: `true`
- `id` (Query parameter), string - Filter by session ID
- `user_id` (Query parameter), string - Filter by user ID
- `dsl` (Query parameter), string - Filter by DSL configuration
Success (HTTP 200):
{
"code": 0,
"data": [
{
"id": "agent_session123",
"agent_id": "8d9ca0e2b2f911ef9ca20242ac120006",
"user_id": "user123",
"message_count": 10,
"create_time": "2025-01-15T17:00:00Z",
"update_time": "2025-01-15T17:30:00Z"
}
]
}

DELETE /api/v1/agents/{agent_id}/sessions

Deletes one or more agent sessions.

- Method: DELETE
- URL: /api/v1/agents/{agent_id}/sessions
- Headers:
  - Content-Type: application/json
  - Authorization: Bearer <YOUR_API_KEY>
curl --request DELETE \
--url http://localhost:9380/api/v1/agents/8d9ca0e2b2f911ef9ca20242ac120006/sessions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"ids": ["agent_session123"]
}'

- `agent_id` (Path parameter), string, Required - The agent ID
- `ids` (Body parameter), array<string>, Required - List of session IDs to delete. If empty, all sessions for the agent will be deleted
Success (HTTP 200):
{
"code": 0
}

GET /v1/system/healthz

Checks the health status of the KnowFlow system.

- Method: GET
- URL: /v1/system/healthz
curl --request GET \
--url http://localhost:9380/v1/system/healthz

Note: This endpoint does not require authentication.
Success (HTTP 200):
{
"status": "healthy",
"version": "v2.1.5",
"timestamp": "2025-01-15T18:00:00Z"
}

Failure (HTTP 503):
{
"status": "unhealthy",
"error": "Database connection failed"
}

KnowFlow integrates MinerU, a powerful PDF parsing engine optimized for complex layouts, tables, and multi-column documents.
Benefits:
- High-accuracy OCR and layout recognition
- Preserves document structure (headings, paragraphs, tables)
- Extracts precise bounding box coordinates for each chunk
- Supports parent-child chunking for better context preservation
Usage:
{
"parser_config": {
"layout_recognize": "mineru",
"chunk_token_num": 256
}
}

For documents parsed with MinerU or DOTS, KnowFlow supports a two-tier chunking strategy:
- Child Chunks: Small, granular chunks (256 tokens) used for semantic search
- Parent Chunks: Larger contextual chunks that contain multiple child chunks
How it works:
- Documents are parsed and chunked into small child chunks
- Child chunks are grouped into parent chunks based on document structure
- During retrieval, child chunks are searched first
- Parent chunks are returned to provide broader context
Benefits:
- More precise semantic matching (via small child chunks)
- Richer context for LLM generation (via parent chunks)
- Better handling of cross-chunk references
Configuration: Parent-child chunking is automatically enabled when using MinerU or DOTS parsers with smart chunking method.
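The grouping step can be illustrated with a toy sketch. This is conceptual only, not KnowFlow's internal code: it assumes each child chunk carries the document section it came from, and forms one parent chunk per section by concatenating its children.

```python
from collections import OrderedDict


def group_into_parents(child_chunks):
    """Toy illustration: merge child chunks sharing a section into one parent chunk."""
    parents = OrderedDict()
    for chunk in child_chunks:
        parents.setdefault(chunk["section"], []).append(chunk["text"])
    return [
        {"section": section, "text": " ".join(texts), "children": len(texts)}
        for section, texts in parents.items()
    ]


# Hypothetical child chunks produced by a structure-aware parser
children = [
    {"section": "Intro", "text": "ML is a field of AI."},
    {"section": "Intro", "text": "It learns from data."},
    {"section": "Methods", "text": "We used gradient descent."},
]
for parent in group_into_parents(children):
    print(parent["section"], "->", parent["children"], "children")
```

Retrieval then matches against the small child texts but returns the larger parent text to the LLM, which is the trade-off the benefits above describe.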
Layout parsers:

- MinerU ("mineru"): Best for complex PDFs with tables, multi-column layouts, academic papers
- DOTS ("dots"): Fast parser with good accuracy
- DeepDOC ("deepdoc"): Default parser, good for general documents

Chunk methods:

- Smart ("smart"): Recommended for most use cases, structure-aware chunking
- Paper ("paper"): For academic papers with abstract, sections, references
- Book ("book"): For books with chapters and sections
- General ("naive"): Simple token-based chunking
- Small chunks (128-256 tokens): Better for precise retrieval, more chunks to search
- Medium chunks (256-512 tokens): Balanced approach (recommended)
- Large chunks (512-1024 tokens): More context per chunk, fewer total chunks
Choose an embedding model based on your language and use case:
- Chinese + English: BAAI/bge-m3@SILICONFLOW
- English only: BAAI/bge-large-en-v1.5@BAAI
- Multilingual: BAAI/bge-m3@SILICONFLOW
- Start with default similarity threshold (0.2) and adjust based on results
- Increase vector_similarity_weight (0.5-0.7) for more semantic matching
- Decrease it (0.1-0.3) for more keyword-based matching
- Use top_k to control the search space (higher = more comprehensive but slower)
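These knobs can be bundled into a single retrieval request body; a sketch against `POST /api/v1/retrieval` as used in the workflow example below (`BASE_URL`, `<YOUR_API_KEY>`, and the dataset ID are placeholders; the default values mirror the guidance above):

```python
import requests

BASE_URL = "http://localhost:9380"
API_KEY = "<YOUR_API_KEY>"  # placeholder


def build_retrieval_payload(question, dataset_ids,
                            similarity_threshold=0.2,
                            vector_similarity_weight=0.5,
                            top_k=1024):
    """Bundle the tuning knobs above into one retrieval request body."""
    return {
        "question": question,
        "dataset_ids": dataset_ids,
        "similarity_threshold": similarity_threshold,
        "vector_similarity_weight": vector_similarity_weight,
        "top_k": top_k,
    }


def retrieve(question, dataset_ids, **knobs):
    """Run a retrieval query and return the matched chunks (requires a live server)."""
    resp = requests.post(
        f"{BASE_URL}/api/v1/retrieval",
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        json=build_retrieval_payload(question, dataset_ids, **knobs),
    )
    return resp.json().get("data", {}).get("chunks", [])
```

Start from the defaults, inspect the returned similarity scores, and adjust one knob at a time.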
import requests
import json
import time
BASE_URL = "http://localhost:9380"
API_KEY = "ragflow-NThkYWEwMTkzODM2NDYwN2ExY2I2MzFh"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {API_KEY}"
}
# 1. Create a dataset
dataset_data = {
"name": "My Knowledge Base",
"description": "Technical documentation",
"embedding_model": "BAAI/bge-m3@SILICONFLOW",
"chunk_method": "smart",
"parser_config": {
"layout_recognize": "mineru",
"chunk_token_num": 256
}
}
response = requests.post(
f"{BASE_URL}/api/v1/datasets",
headers=headers,
json=dataset_data
)
dataset_id = response.json()["data"]["id"]
print(f"Created dataset: {dataset_id}")
# 2. Upload a document
with open("document.pdf", "rb") as f:
files = {"file": ("document.pdf", f, "application/pdf")}
form_data = {
"parser_id": "smart",
"parser_config": json.dumps({
"chunk_token_num": 256,
"layout_recognize": "mineru"
})
}
headers_upload = {"Authorization": f"Bearer {API_KEY}"}
response = requests.post(
f"{BASE_URL}/api/v1/datasets/{dataset_id}/documents",
headers=headers_upload,
data=form_data,
files=files
)
document_id = response.json()["data"][0]["id"]
print(f"Uploaded document: {document_id}")
# 3. Trigger document parsing
trigger_response = requests.post(
f"{BASE_URL}/api/v1/datasets/{dataset_id}/chunks",
headers=headers,
json={"document_ids": [document_id]}
)
print(f"Parsing triggered: {trigger_response.json()}")
# 4. Monitor parsing progress
max_wait = 120 # Maximum wait time in seconds
elapsed = 0
while elapsed < max_wait:
time.sleep(3)
elapsed += 3
# Use list endpoint with ID filter to check status
response = requests.get(
f"{BASE_URL}/api/v1/datasets/{dataset_id}/documents?id={document_id}",
headers=headers
)
data = response.json()["data"]
if data.get("docs") and len(data["docs"]) > 0:
doc = data["docs"][0]
status = doc.get("status")
progress = doc.get("progress", 0)
if status == "1":
print(f"Document parsing completed in {elapsed} seconds!")
break
elif status == "2":
print("Document parsing failed")
break
else:
print(f"Parsing in progress... {progress}% (waited {elapsed}s)")
if elapsed >= max_wait:
print("Parsing timeout after 120 seconds")
# 5. Retrieve relevant chunks
retrieval_data = {
"question": "What is the main topic of this document?",
"dataset_ids": [dataset_id],
"page": 1,
"page_size": 5
}
response = requests.post(
f"{BASE_URL}/api/v1/retrieval",
headers=headers,
json=retrieval_data
)
chunks = response.json()["data"]["chunks"]
for chunk in chunks:
print(f"Chunk: {chunk['content'][:100]}...")
print(f"Similarity: {chunk['similarity']}")
# 6. Create a chat assistant
chat_data = {
"name": "My Assistant",
"dataset_ids": [dataset_id],
"llm": {
"model_name": "qwen-plus@Tongyi-Qianwen",
"temperature": 0.1
},
"prompt": {
"similarity_threshold": 0.2,
"top_n": 6
}
}
response = requests.post(
f"{BASE_URL}/api/v1/chats",
headers=headers,
json=chat_data
)
chat_id = response.json()["data"]["id"]
print(f"Created chat assistant: {chat_id}")
# 7. Chat with the assistant
chat_request = {
"question": "Summarize the document for me",
"stream": False
}
response = requests.post(
f"{BASE_URL}/api/v1/chats/{chat_id}/completions",
headers=headers,
json=chat_request
)
answer = response.json()["data"]["answer"]
print(f"Assistant response: {answer}")

Solution: Ensure embedding_model includes both model name and provider:
{
"embedding_model": "BAAI/bge-m3@SILICONFLOW" // Correct
// NOT: "BAAI/bge-m3" // Wrong
}

Solution: Use a string value, not a boolean:
{
"parser_config": {
"layout_recognize": "mineru" // Correct
// NOT: "layout_recognize": true // Wrong
}
}

Solution: Use dataset_ids (not kb_id) for the SDK API:
{
"dataset_ids": ["4345aa0ea1a311f0b45566fc51ac58df"] // Correct
// NOT: "kb_id": ["..."] // Wrong for SDK API
}

Causes:
- Parsing was never triggered (status "UNSTART")
- MinerU service not running (stuck at "0")
- Document format not supported
- File corrupted
Solution:
For "UNSTART" status:
- You must explicitly trigger parsing: POST /api/v1/datasets/{id}/chunks
- Documents do NOT auto-parse after upload
For stuck at "0" status:
- Check MinerU service status: docker ps | grep mineru
- Verify document format is supported
- Try with a different document
Example workflow:
# 1. Upload
response = requests.post(f"{BASE_URL}/api/v1/datasets/{dataset_id}/documents", ...)
doc_id = response.json()["data"][0]["id"]
# 2. Trigger parsing (REQUIRED!)
requests.post(
    f"{BASE_URL}/api/v1/datasets/{dataset_id}/chunks",
    headers=headers,
    json={"document_ids": [doc_id]}
)
# 3. Check status
response = requests.get(
    f"{BASE_URL}/api/v1/datasets/{dataset_id}/documents?id={doc_id}",
    headers=headers
)

Cause:
- Trying to create a chat assistant before document parsing is complete
- No documents in the dataset have finished parsing (status "1")
Solution:
- Wait for document parsing to complete before creating chat assistant
- Check document status: GET /api/v1/datasets/{id}/documents?id={doc_id}
- Ensure at least one document has "status": "1" (completed)
# Wait for parsing to complete (bail out if parsing failed)
while True:
    response = requests.get(
        f"{BASE_URL}/api/v1/datasets/{dataset_id}/documents?id={doc_id}",
        headers=headers
    )
    docs = response.json()["data"].get("docs", [])
    if docs and docs[0].get("status") == "1":
        break
    if docs and docs[0].get("status") == "2":
        raise RuntimeError("Document parsing failed")
    time.sleep(3)

# Now safe to create chat assistant
requests.post(f"{BASE_URL}/api/v1/chats", headers=headers, json=chat_data)

Causes:
- Invalid API key
- API key expired
- Missing Authorization header
Solution:
- Verify API key is correct
- Check Authorization header format:
Bearer <API_KEY> - Regenerate API key if needed
- Dataset Creation:
  - Added support for "mineru" and "dots" layout parsers
  - embedding_model validation now enforces the @provider suffix
  - New "smart" chunk method available
- Document Upload & Parsing (BREAKING CHANGE):
  - parser_config must be a JSON string in form data (not an object)
  - Documents no longer auto-parse after upload
  - New required step: parsing must be triggered explicitly via POST /api/v1/datasets/{id}/chunks
  - New status code: "UNSTART" indicates a document was uploaded but parsing has not been triggered
  - Enhanced status codes: "0" (parsing), "1" (completed), "2" (failed)
- Document Endpoints:
  - GET /documents/{id} now returns file content (binary), not JSON metadata
  - To get metadata, use GET /documents?id={id} (the list endpoint with an ID filter)
  - New workflow: Upload → Trigger → Monitor (3 steps required)
- Retrieval API:
  - SDK version uses the dataset_ids parameter (legacy uses kb_id)
  - Added parent-child chunk support (automatic for MinerU/DOTS)
  - Enhanced similarity scoring with vector_similarity and term_similarity
- MinerU Integration: High-accuracy PDF parsing with structure preservation
- Parent-Child Chunking: Two-tier chunking strategy for better context
- Coordinate Mapping: Precise bounding boxes for chunk highlighting
- Dev Mode Logging: Debug output for parent-child relationships (dev_mode=true)
This documentation covers 37 API endpoints across all major categories:
- OpenAI-Compatible API: 2 endpoints
- Dataset Management: 6 endpoints
- Document Management: 6 endpoints (including trigger parsing & download)
- Chunk Management: 6 endpoints
- Chat Assistant Management: 5 endpoints
- Session Management: 5 endpoints
- Agent Management: 6 endpoints
- System APIs: 1 endpoint
Testing Results (from comprehensive API validation):
- Total APIs tested: 37
- Success rate: 96.8% (30/31 functional endpoints)
- Skipped: 6 (OpenAI-compatible and some agent endpoints)
For issues or questions:
- GitHub Issues: https://github.com/your-repo/knowflow/issues
- Documentation: https://docs.knowflow.ai
- Email: support@knowflow.ai
Version: KnowFlow v2.1.5 Last Updated: January 2025 Based on: RAGFlow v0.20.1 Total APIs Documented: 37 Test Success Rate: 96.8%