Rerank model can't be used: `/v1/rerank` returns HTTP 500 (`KeyError: 'results'`)

### System Info

Windows 11

### Running Xinference with Docker?

- [ ] docker
- [x] pip install
- [ ] installation from source

### Version info

Xinference v2.2.0

### The command used to start Xinference

`xinference-local --host 10.16.13.11 --port 9997`

### Error returned to the client

```
Error: Error invoking remote method 'knowledge-base:rerank': Error: Rerank request failed: HTTP 500: Internal Server Error
Request details: {
  "url": "http://10.16.13.11:9997/v1/rerank",
  "message": "HTTP 500: Internal Server Error",
  "status": 500,
  "statusText": "Internal Server Error",
  "responseBody": {
    "detail": "[address=10.16.13.11:53707, pid=1964] 'results'"
  },
  "requestBody": {
    "model": "Qwen3-Reranker-8B",
    "query": "枸橼酸他莫昔芬片\n",
    "documents": [
      "1 / 2 一致性评价企业研究报告及生物等效性试验数据 信息公开 1. 基本情况汇总 表 通用名 枸橼酸他莫昔芬片 英文名 Tamoxifen Citrate Tablets 剂型及规格 片剂 ； 10mg( 按他莫昔芬计 ) 生产企业名称 上海复旦复华药业有限公司 生产企业地址 上海市闵行区曙光路 1399 号 上市许可持有人 上海复旦复华药业有限公司 最新批准文号 国药准字 H31021545 其它上市国家及上市时间 不适用 附加申请 ■有工艺变更 □无工艺变更 □其它 BE 供试样品批号 9230801 检验机构 上海复旦复华药业有限公司 检验结果 符合要求 完成的临床研究内容 ■ PK 终点生物等效性研究 □ PD 终点生物等效性研究 □临床研究 □其它 BE 备案号 / 临床试验批件号 B202300161 - 01 临床研究机构 东莞康华医院I期临床试验研究中心 数据统计分析机构 南京英锋医药科技有限公司 生物样本检测机构 南京科利泰医药科技有限公司 试验设计 单中心、随机、开放、单剂量、两周期、两序 列、自身交叉 空腹 和餐后试验 检测物质 他莫昔芬 检测方法 HPLC - MS/MS 法 临床 研究 豁免 情况 不适用 2 / 2 2. 生物等效性研究结果 空腹 BE n=30 药代动力学参数 （单位） 几何均值及比值 90% 置信区间 (%) 受试制剂（ T ） 参比制剂（ R ） T/R (%) C max (ng/mL) 19.61 20.06 97.74 93.07 - 102.64 AUC 0 - 72h (h*ng/mL) 482.09 475.69 101.35 98.30 - 104.49 餐后 BE n=30 药代动力学参数 （单位） 几何均值及比值 90% 置信区间 (%) 受试制剂（ T ） 参比制剂（ R ） T/R (%) C max (ng/mL) 24.09 24.21 99.51 95.23 - 103.98 AUC 0 - 72h (h*ng/mL) 600.55 600.90 99.94 97.83 - 102.10 3. 审评结论 建议 批准 。"
    ],
    "top_n": 6
  }
}
```

### Server log

```
slot get_availabl: id  7 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  7 | task 0 | processing task, is_child = 0
slot update_slots: id  7 | task 0 | new prompt, n_ctx_slot = 5120, n_keep = 0, task.n_tokens = 699
srv    send_error: task id = 0, error: input (699 tokens) is too large to process. increase the physical batch size (current batch size: 512)
slot      release: id  7 | task 0 | stop processing: n_tokens = 0, truncated = 0
srv          stop: cancel task, id_task = 0
srv  update_slots: no tokens to decode
srv  update_slots: all slots are idle
```
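The llama.cpp log shows the underlying failure: the query plus document came to 699 tokens, but the current physical batch size is 512, so llama.cpp rejected the input before scoring anything. The log's own suggestion is to raise the batch size when the model is loaded; how that value is passed through Xinference for llama.cpp rerank models is not confirmed here. As a client-side stopgap, long documents could be split so each piece stays under the limit. The sketch below is a minimal greedy splitter under stated assumptions: `split_to_token_budget` and `approx_token_count` are hypothetical helpers, and the whitespace word count is only a placeholder for the model's real tokenizer (it badly undercounts Chinese text, where one space-delimited run can be many tokens).

```python
from typing import Callable, List


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in for the model's tokenizer: counts whitespace-separated
    words. It undercounts Chinese text badly; swap in the real tokenizer."""
    return len(text.split())


def split_to_token_budget(
    text: str,
    budget: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks whose size, as
    measured by count_tokens, stays within budget.

    A single word that alone exceeds the budget becomes its own (oversized) chunk.
    Re-counting the growing candidate on every step is quadratic, which is
    acceptable for documents of a few thousand words.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > budget:
            # Adding this word would overflow the budget: close the current chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    BATCH_SIZE = 512  # "current batch size" reported in the llama.cpp log
    RESERVED = 112    # hypothetical allowance for the query and the reranker prompt template
    sample = "word " * 1000
    pieces = split_to_token_budget(sample, BATCH_SIZE - RESERVED)
    print([approx_token_count(p) for p in pieces])
```

The budget has to leave room for the query and for whatever prompt template the reranker wraps around each query/document pair; the `RESERVED` value is a guess, not a measured figure, so the real tokenizer is what decides whether a chunk actually fits.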
```
2026-03-10 18:56:52,846 xinference.core.model 1964 ERROR    [request d8b75b9c-1c6f-11f1-89f8-b5fed4915bdf] Leave rerank, error: 'results', elapsed time: 0 s
Traceback (most recent call last):
  File "D:\Python\Python312\Lib\site-packages\xinference\core\utils.py", line 95, in wrapped
    ret = await func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 775, in rerank
    return await self._call_wrapper_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 569, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 140, in _async_wrapper
    return await fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 594, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\asyncio\threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\concurrent\futures\thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\model\rerank\llama_cpp\core.py", line 208, in rerank
    result["results"] = result["results"][:top_n]
                        ~~~~~~^^^^^^^^^^^
KeyError: 'results'
```
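The worker traceback ends at `xinference/model/rerank/llama_cpp/core.py`, line 208, which indexes `result["results"]` unconditionally. When llama.cpp rejects the input, the returned result has no `"results"` key, so the real cause (the batch-size error in the server log) reaches the client only as `KeyError: 'results'`. Below is a minimal sketch of a more defensive version of that one step. `truncate_rerank_result` is a hypothetical function, not Xinference's code, and the assumption that the backend's error text sits under an `"error"` key is unverified.

```python
from typing import Any, Dict, Optional


def truncate_rerank_result(result: Dict[str, Any], top_n: Optional[int]) -> Dict[str, Any]:
    """Hypothetical defensive variant of the failing line
    `result["results"] = result["results"][:top_n]`.

    Instead of a bare KeyError, raise an error that carries whatever the
    backend returned, so the client sees why reranking failed.
    """
    if "results" not in result:
        # Assumption: an error payload, if any, is stored under "error";
        # otherwise report the whole dict.
        detail = result.get("error", result)
        raise RuntimeError(f"rerank backend returned no results: {detail}")
    if top_n is not None:
        # Keep only the top_n entries, as the original line does.
        result["results"] = result["results"][:top_n]
    return result
```

With this shape, the HTTP 500 body would say that the input was rejected rather than just `'results'`, which is the main thing missing from the current error.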
```
2026-03-10 18:56:52,880 xinference.api.restful_api 4932 ERROR    [address=10.16.13.11:53707, pid=1964] 'results'
Traceback (most recent call last):
  File "D:\Python\Python312\Lib\site-packages\xinference\api\restful_api.py", line 1093, in rerank
    scores = await model.rerank(
             ^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\context.py", line 262, in send
    return self._process_result_message(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\pool.py", line 389, in _run_coro
    return await coro
  File "D:\Python\Python312\Lib\site-packages\xoscar\api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 105, in wrapped_func
    ret = await fn(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\utils.py", line 95, in wrapped
    ret = await func(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 775, in rerank
    return await self._call_wrapper_json(
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 569, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 140, in _async_wrapper
    return await fn(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 594, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\asyncio\threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
      ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\concurrent\futures\thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\model\rerank\llama_cpp\core.py", line 208, in rerank
    result["results"] = result["results"][:top_n]
    ^^^^^^^^^^^^^^^^^
KeyError: [address=10.16.13.11:53707, pid=1964] 'results'
```

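### Reproduction

Configure Qwen3-Reranker-8B as the knowledge-base reranker in the client, then ask a question. The rerank request fails with HTTP 500.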
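A minimal standalone reproduction, assuming only the endpoint and request body captured in the client error above. It uses the standard library's `urllib.request`, so nothing needs installing. `build_rerank_payload`, `post_rerank`, and `reproduce` are hypothetical helper names, and the long document is left as a placeholder to paste in.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, List

# Endpoint and model name taken from the failing request in this report.
XINFERENCE_URL = "http://10.16.13.11:9997/v1/rerank"
MODEL = "Qwen3-Reranker-8B"


def build_rerank_payload(query: str, documents: List[str], top_n: int,
                         model: str = MODEL) -> Dict[str, Any]:
    """Build the same JSON body the client sent (model, query, documents, top_n)."""
    return {"model": model, "query": query, "documents": documents, "top_n": top_n}


def post_rerank(url: str, payload: Dict[str, Any], timeout: float = 60.0) -> Dict[str, Any]:
    """POST the payload and return the decoded JSON response.

    HTTP errors are re-raised with the response body attached, so the
    server's "detail" field (here: "... 'results'") is visible.
    """
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"HTTP {err.code}: {body}") from err


def reproduce(url: str = XINFERENCE_URL) -> None:
    # Placeholder: paste the long document from the request body above.
    long_document = "..."
    payload = build_rerank_payload("枸橼酸他莫昔芬片\n", [long_document], top_n=6)
    print(post_rerank(url, payload))
```

Calling `reproduce()` from a machine that can reach the server should surface the same HTTP 500. Sending a short document (well under 512 tokens) in the same request is a quick way to check whether the batch-size limit is the trigger.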

### Expected behavior

No error is raised; the request runs normally and the documents are reranked.

rerank model can't use #4679

System Info / 系統信息

Running Xinference with Docker? / 是否使用 Docker 运行 Xinfernece？

Version info / 版本信息

The command used to start Xinference / 用以启动 xinference 的命令

Reproduction / 复现过程

Expected behavior / 期待表现

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

rerank model can't use #4679

Description

System Info / 系統信息

Running Xinference with Docker? / 是否使用 Docker 运行 Xinfernece？

Version info / 版本信息

The command used to start Xinference / 用以启动 xinference 的命令

Reproduction / 复现过程

Expected behavior / 期待表现

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions