Rerank model can't be used #4679

@justime

System Info

windows 11
Xinference v2.2.0
xinference-local --host 10.16.13.11 --port 9997

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

Error: Error invoking remote method 'knowledge-base:rerank': Error: Rerank request failed: HTTP 500: Internal Server Error
Request details: {
"url": "http://10.16.13.11:9997/v1/rerank",
"message": "HTTP 500: Internal Server Error",
"status": 500,
"statusText": "Internal Server Error",
"responseBody": {
"detail": "[address=10.16.13.11:53707, pid=1964] 'results'"
},
"requestBody": {
"model": "Qwen3-Reranker-8B",
"query": "枸橼酸他莫昔芬片\n",
"documents": [
"1 / 2 一致性评价企业研究报告及生物等效性试验数据 信息公开 1. 基本情况汇总 表 通用名 枸橼酸他莫昔芬片 英文名 Tamoxifen Citrate Tablets 剂型及规格 片剂 ; 10mg( 按他莫昔芬计 ) 生产企业名称 上海复旦复华药业有限公司 生产企业地址 上海市闵行区曙光路 1399 号 上市许可持有人 上海复旦复华药业有限公司 最新批准文号 国药准字 H31021545 其它上市国家及上市时间 不适用 附加申请 ■有工艺变更 □无工艺变更 □其它 BE 供试样品批号 9230801 检验机构 上海复旦复华药业有限公司 检验结果 符合要求 完成的临床研究内容 ■ PK 终点生物等效性研究 □ PD 终点生物等效性研究 □临床研究 □其它 BE 备案号 / 临床试验批件号 B202300161 - 01 临床研究机构 东莞康华医院I期临床试验研究中心 数据统计分析机构 南京英锋医药科技有限公司 生物样本检测机构 南京科利泰医药科技有限公司 试验设计 单中心、随机、开放、单剂量、两周期、两序 列、自身交叉 空腹 和餐后试验 检测物质 他莫昔芬 检测方法 HPLC - MS/MS 法 临床 研究 豁免 情况 不适用 2 / 2 2. 生物等效性研究结果 空腹 BE n=30 药代动力学参数 (单位) 几何均值及比值 90% 置信区间 (%) 受试制剂( T ) 参比制剂( R ) T/R (%) C max (ng/mL) 19.61 20.06 97.74 93.07 - 102.64 AUC 0 - 72h (hng/mL) 482.09 475.69 101.35 98.30 - 104.49 餐后 BE n=30 药代动力学参数 (单位) 几何均值及比值 90% 置信区间 (%) 受试制剂( T ) 参比制剂( R ) T/R (%) C max (ng/mL) 24.09 24.21 99.51 95.23 - 103.98 AUC 0 - 72h (hng/mL) 600.55 600.90 99.94 97.83 - 102.10 3. 审评结论 建议 批准 。"
],
"top_n": 6
}
}
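
For anyone trying to reproduce this outside the client application, here is a minimal sketch that rebuilds the same request with the Python standard library. The URL path and the body fields are taken from the request details above; the helper name `build_rerank_request` is hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request


def build_rerank_request(base_url, model, query, documents, top_n):
    """Build (but do not send) a POST request for the /v1/rerank endpoint."""
    body = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/rerank",
        # ensure_ascii=False keeps the Chinese text readable in the payload.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. With the setup described here the server answers HTTP 500, which `urlopen` raises as `urllib.error.HTTPError`; calling `.read()` on that exception should return the `detail` body shown above.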

The command used to start Xinference

slot get_availabl: id 7 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 7 | task 0 | processing task, is_child = 0
slot update_slots: id 7 | task 0 | new prompt, n_ctx_slot = 5120, n_keep = 0, task.n_tokens = 699
srv send_error: task id = 0, error: input (699 tokens) is too large to process. increase the physical batch size (current batch size: 512)
slot release: id 7 | task 0 | stop processing: n_tokens = 0, truncated = 0 | 673M/4.90G [24:35<2:33:00, 497kB/s]
srv stop: cancel task, id_task = 0
srv update_slots: no tokens to decode
srv update_slots: all slots are idle
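
The `send_error` line in the log above appears to be the real cause: the prompt is 699 tokens while the physical batch size is 512, so the backend rejects the task before any result is produced. As a rough sketch, a hypothetical helper (`parse_batch_overflow`, not part of Xinference or llama.cpp) can pull both numbers out of such a line so the mismatch is easy to spot in longer logs:

```python
import re

# Pattern for the llama.cpp server error line quoted in this issue.
_TOO_LARGE = re.compile(
    r"input \((?P<tokens>\d+) tokens\) is too large to process\. "
    r"increase the physical batch size \(current batch size: (?P<batch>\d+)\)"
)


def parse_batch_overflow(log_line):
    """Return (n_tokens, batch_size) if the line reports a batch overflow, else None."""
    match = _TOO_LARGE.search(log_line)
    if match is None:
        return None
    return int(match["tokens"]), int(match["batch"])


line = (
    "srv send_error: task id = 0, error: input (699 tokens) is too large to "
    "process. increase the physical batch size (current batch size: 512)"
)
print(parse_batch_overflow(line))
```

In llama.cpp itself the physical batch size corresponds to the `--ubatch-size` option. How (or whether) Xinference lets you set it when launching a rerank model is not shown in this issue, so treat that mapping as something to verify.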
2026-03-10 18:56:52,846 xinference.core.model 1964 ERROR [request d8b75b9c-1c6f-11f1-89f8-b5fed4915bdf] Leave rerank, error: 'results', elapsed time: 0 s
Traceback (most recent call last):
  File "D:\Python\Python312\Lib\site-packages\xinference\core\utils.py", line 95, in wrapped
    ret = await func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 775, in rerank
    return await self._call_wrapper_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 569, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 140, in _async_wrapper
    return await fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 594, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\asyncio\threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\concurrent\futures\thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\model\rerank\llama_cpp\core.py", line 208, in rerank
    result["results"] = result["results"][:top_n]
                        ~~~~~~^^^^^^^^^^^
KeyError: 'results'
2026-03-10 18:56:52,880 xinference.api.restful_api 4932 ERROR [address=10.16.13.11:53707, pid=1964] 'results'
Traceback (most recent call last):
  File "D:\Python\Python312\Lib\site-packages\xinference\api\restful_api.py", line 1093, in rerank
    scores = await model.rerank(
    ^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\context.py", line 262, in send
    return self._process_result_message(result)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xoscar\backends\pool.py", line 389, in _run_coro
    return await coro
  File "D:\Python\Python312\Lib\site-packages\xoscar\api.py", line 418, in on_receive
    return await super().on_receive(message) # type: ignore
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 564, in on_receive
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    async with self._lock:
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
    result = await result
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 105, in wrapped_func
    ret = await fn(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\utils.py", line 95, in wrapped
    ret = await func(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 775, in rerank
    return await self._call_wrapper_json(
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 569, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 140, in _async_wrapper
    return await fn(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\core\model.py", line 594, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\asyncio\threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\concurrent\futures\thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
    ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python312\Lib\site-packages\xinference\model\rerank\llama_cpp\core.py", line 208, in rerank
    result["results"] = result["results"][:top_n]
    ^^^^^^^^^^^^^^^^^
KeyError: [address=10.16.13.11:53707, pid=1964] 'results'
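
The traceback ends at `result["results"][:top_n]`, which suggests the backend response had no `results` key when the task was rejected, so the client only sees a bare `'results'`. A minimal sketch of a more defensive version of that step follows. The function name `truncate_results` is hypothetical, and the assumption that a failed call carries its reason under an `"error"` key is mine, not something confirmed by this issue or by the Xinference source.

```python
def truncate_results(result, top_n=None):
    """Return result with "results" cut to top_n, or raise a descriptive error."""
    if not isinstance(result, dict) or "results" not in result:
        # Assumption: a failed backend call reports its reason under "error".
        reason = result.get("error") if isinstance(result, dict) else result
        raise ValueError(f"rerank backend returned no 'results': {reason!r}")
    if top_n is not None:
        # Copy instead of mutating the caller's dict.
        result = {**result, "results": result["results"][:top_n]}
    return result
```

With a check like this, the HTTP 500 body would carry the backend's own message (for example the batch-size complaint) instead of only the missing key name.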

Reproduction

Configure the reranker model, then ask a question.

Expected behavior

No error is raised, and the documents are reranked normally.
