feat: Browser action recording and intelligent replay (with @-references, auto-naming, and element index consistency) by mango766 · Pull Request #286 · alibaba/page-agent

mango766 · 2026-03-17T13:34:46Z

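Overview

Adds a complete browser action recording and intelligent replay feature to the Page Agent extension, and introduces three core enhancements in this update.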

Closes #285
Closes #298

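Core features

Basic recording and replay (existing)

Recording: captures the user's clicks, input, scrolling, navigation, and other actions on a page and produces a structured sequence of semantic steps
Replay: LLM-guided adaptive execution that matches page elements intelligently to complete the replay
Parameterization: values in a recording can be turned into parameters, so one recording can be reused with different inputs
Natural-language modification: a recording's behavior can be adjusted in natural language before replay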
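
The parameterization idea can be illustrated with a minimal sketch. The `RecordedStep` shape, its `param` field, and `applyParams` are hypothetical names chosen for illustration, not the PR's actual schema, and the PR may resolve parameters differently (for example, inside the LLM-guided replay).

```typescript
// Hypothetical step shape: not the PR's actual Recording schema.
interface RecordedStep {
  action: "click" | "type" | "press";
  target: string;
  value?: string;
  param?: string; // name of the parameter that may override `value`
}

// Return a copy of the steps with parameter overrides applied.
// The original recording is left untouched so it can be reused.
function applyParams(
  steps: RecordedStep[],
  overrides: Record<string, string>,
): RecordedStep[] {
  return steps.map((step) => {
    if (step.param !== undefined && step.param in overrides) {
      return { ...step, value: overrides[step.param] };
    }
    return step;
  });
}

const recorded: RecordedStep[] = [
  { action: "click", target: "search button" },
  { action: "type", target: "search input", value: "TypeScript tutorial", param: "searchKeyword" },
  { action: "press", target: "search input", value: "Enter" },
];

// Reuse the same recording with a different search keyword.
const replayed = applyParams(recorded, { searchKeyword: "artificial intelligence" });
```

Steps without a `param` pass through unchanged, so only values that were explicitly marked as parameters can vary between replays.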

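New feature 1: @-reference recording replay

Typing `@` in the sidebar input box pops up the list of all saved recordings
Fuzzy search filtering is supported: typing a keyword narrows the list quickly
After selecting a recording, a natural-language description can be appended to change how it executes
Example: `@SearchVideos this time search for "artificial intelligence"` → the Agent follows the recorded flow but replaces the search term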
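
A rough sketch of the two pieces described above: fuzzy filtering of recording names and splitting an `@` input into a recording name plus a natural-language instruction. The helper names and the subsequence-matching rule are assumptions for illustration; the PR's `useRecordingMention` hook may work differently.

```typescript
// Hypothetical helpers: not the PR's useRecordingMention implementation.

// Subsequence match: every character of `query` appears in `name`, in order.
// An empty query matches everything, so typing just "@" lists all recordings.
function fuzzyMatch(name: string, query: string): boolean {
  const haystack = name.toLowerCase();
  const needle = query.toLowerCase();
  let pos = 0;
  for (let i = 0; i < needle.length; i++) {
    const found = haystack.indexOf(needle.charAt(i), pos);
    if (found === -1) return false;
    pos = found + 1;
  }
  return true;
}

function filterRecordings(names: string[], query: string): string[] {
  return names.filter((name) => fuzzyMatch(name, query));
}

// Split "@<name><instruction>" using the known recording names.
// The longest matching name wins, so names may contain spaces and the
// instruction may follow the name without a separator.
function parseMention(
  input: string,
  names: string[],
): { name: string; instruction: string } | null {
  if (input.charAt(0) !== "@") return null;
  const body = input.slice(1);
  const candidates = names
    .filter((n) => body.indexOf(n) === 0)
    .sort((a, b) => b.length - a.length);
  if (candidates.length === 0) return null;
  const name = candidates[0];
  return { name, instruction: body.slice(name.length).trim() };
}

const recordingNames = ["Search videos", "Search docs", "Login"];
const filtered = filterRecordings(recordingNames, "svid");
const mention = parseMention('@Search videos this time search for "AI"', recordingNames);
```

Matching against known names, rather than splitting on the first space, is one way to support inputs where the instruction follows the recording name directly.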

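New feature 2: Recording auto-naming

After recording stops, the LLM is called automatically to generate a short name and description
This runs asynchronously in the background and does not block the UI
The recording detail page polls for completion and refreshes the display once naming finishes
Users can edit the name and description manually at any time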
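
Because naming runs in the background while the user can edit at any time, the generated result has to be merged carefully when it arrives. The sketch below shows one possible guard; `RecordingMeta`, `GeneratedName`, and `applyGeneratedName` are hypothetical names, the LLM call itself is omitted, and keeping a user-written description is an assumption rather than documented behavior.

```typescript
// Hypothetical shapes: not the PR's autoNameRecording module.
interface RecordingMeta {
  id: string;
  name: string;
  description: string;
}

interface GeneratedName {
  name: string;
  description: string;
}

// Merge the LLM result into the latest stored metadata.
// If the user has already named the recording, nothing is overwritten.
function applyGeneratedName(
  current: RecordingMeta,
  generated: GeneratedName,
): RecordingMeta {
  if (current.name.trim() !== "") return current;
  return {
    ...current,
    name: generated.name,
    // Assumption: a description typed by the user is kept as-is.
    description: current.description !== "" ? current.description : generated.description,
  };
}

const generated: GeneratedName = {
  name: "Search videos",
  description: "Opens the site and searches for a keyword",
};

const unnamed = applyGeneratedName({ id: "r1", name: "", description: "" }, generated);
const userNamed = applyGeneratedName({ id: "r2", name: "My flow", description: "" }, generated);
```

The guard should be applied to the metadata as re-read at the moment the LLM responds, not to the snapshot taken when recording stopped, so that edits made in between are respected.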

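New feature 3: Recording/replay element index consistency

During recording, a dedicated `PageController` is created and the DOM tree index is refreshed every 3 seconds
`highlightIndex` is cached in the element's dataset and written to `ElementDescriptor.idx`
During replay, step descriptions include `(index:N)`, and the LLM prefers the index when locating elements
If index matching fails, replay falls back automatically to semantic matching (text, aria-label, role, and so on)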

Element location strategy

Multi-signal semantic descriptors, assisted by indices:

```
Priority: index > text > ariaLabel > role > placeholder > name > selector
```
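
The priority order can be read as "try the index first, then walk the semantic signals from most to least specific". The following sketch models that over a plain array of elements. In the PR the matching is done by the LLM following `replay_prompt`, so this deterministic resolver is only an illustration; the descriptor fields other than `idx` and the "text must agree" check on an index hit are assumptions.

```typescript
// Hypothetical shapes: only the `idx` field on ElementDescriptor is named in the PR.
interface ElementDescriptor {
  idx?: number;
  text?: string;
  ariaLabel?: string;
  role?: string;
  placeholder?: string;
  name?: string;
  selector?: string;
}

interface PageElement extends ElementDescriptor {
  idx: number;
}

// Semantic signals in the priority order given above (after the index).
const SEMANTIC_SIGNALS = ["text", "ariaLabel", "role", "placeholder", "name", "selector"] as const;

function resolveElement(
  target: ElementDescriptor,
  page: PageElement[],
): PageElement | undefined {
  // 1. Index first. Assumption: an index hit is rejected if the recorded text disagrees.
  if (target.idx !== undefined) {
    const byIndex = page.filter((el) => el.idx === target.idx)[0];
    if (byIndex !== undefined && (target.text === undefined || byIndex.text === target.text)) {
      return byIndex;
    }
  }
  // 2. Semantic fallback: the first signal that yields a match wins.
  for (const signal of SEMANTIC_SIGNALS) {
    const wanted = target[signal];
    if (wanted === undefined) continue;
    const match = page.filter((el) => el[signal] === wanted)[0];
    if (match !== undefined) return match;
  }
  return undefined;
}

// At replay time an extra link shifted every index by one.
const page: PageElement[] = [
  { idx: 15, role: "link", text: "Sign in" },
  { idx: 16, role: "button", text: "Search" },
  { idx: 17, role: "textbox", placeholder: "Search videos" },
];

const shifted = resolveElement({ idx: 15, role: "button", text: "Search" }, page);
const stable = resolveElement({ idx: 17, placeholder: "Search videos" }, page);
const missing = resolveElement({ idx: 99, role: "textbox" }, page);
```

Here `shifted` resolves to the button now at index 16 through its text, `stable` resolves directly by index, and `missing` falls back to the role because no element carries index 99.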

Compact LLM format (~10 tokens per step):

```
[1] click "Search" button (index:15) in header
[2] type "TypeScript tutorial" → search input (index:16) [PARAM:searchKeyword]
[3] press Enter → search input (index:16)
```
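
A formatter that emits lines in this shape might look like the sketch below. The function name `formatStepForLLM` appears in the PR description, but its signature and the `ReplayStep` type here are assumptions for illustration only.

```typescript
// Hypothetical step type; the real signature of formatStepForLLM is not shown in the PR text.
type ReplayStep = {
  target: string;   // short semantic label, e.g. "search input"
  idx?: number;     // element index captured at recording time
  context?: string; // surrounding region, e.g. "header"
  param?: string;   // parameter name, if the value is parameterized
} & (
  | { action: "click"; text?: string }
  | { action: "type"; text: string }
  | { action: "press"; key: string }
);

function formatStepForLLM(step: ReplayStep, position: number): string {
  let head: string;
  if (step.action === "click") {
    head = step.text !== undefined ? `click "${step.text}" ${step.target}` : `click ${step.target}`;
  } else if (step.action === "type") {
    head = `type "${step.text}" → ${step.target}`;
  } else {
    head = `press ${step.key} → ${step.target}`;
  }
  // Optional suffixes, in the order used by the format above.
  const index = step.idx !== undefined ? ` (index:${step.idx})` : "";
  const context = step.context !== undefined ? ` in ${step.context}` : "";
  const param = step.param !== undefined ? ` [PARAM:${step.param}]` : "";
  return `[${position}] ${head}${index}${context}${param}`;
}

const steps: ReplayStep[] = [
  { action: "click", text: "Search", target: "button", idx: 15, context: "header" },
  { action: "type", text: "TypeScript tutorial", target: "search input", idx: 16, param: "searchKeyword" },
  { action: "press", key: "Enter", target: "search input", idx: 16 },
];

const plan = steps.map((step, i) => formatStepForLLM(step, i + 1)).join("\n");
```

Omitting a suffix when its field is absent keeps each line short, which is what makes the roughly ten-tokens-per-step budget plausible.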

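Test plan

Basic recording and replay

Start recording → perform clicks, input, and navigation → stop → verify the steps were captured correctly
The recording list shows saved recordings with correct metadata
Open recording details → edit name/description → save → verify persistence
Change a parameter in ParamEditor → replay → verify the Agent uses the new value
Add a natural-language modification → replay → verify the Agent adapts its behavior
Export JSON → verify the clipboard contains valid Recording JSON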

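@-reference recording replay (new)

Type `@` in the input box → verify the full recording list pops up immediately
Type `@keyword` → verify fuzzy filtering of matching items
Click a recording to select it → verify the input box is filled with `@<recording name>`
Select a recording and append natural language → the task is built via buildReplayTask and executed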

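Recording auto-naming (new)

Record actions → stop → the LLM is called in the background to generate a name and description
The recording detail page polls every 2 seconds and refreshes automatically once naming completes
Manually edit the name and description → save → verify the persisted values are not overwritten (auto-naming is triggered only when the name is empty)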

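Element index consistency (new)

During recording, EventRecorder obtains highlightIndex through PageController and writes it to ElementDescriptor.idx
During replay, formatStepForLLM includes `(index:N)` in the step description
replay_prompt instructs the LLM to prefer `click_element_by_index(N)` and to fall back to semantic matching on failure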

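Edge cases

Recording Shadow DOM elements (traversal via composedPath + getRootNode)
Recording contenteditable elements (textContent capture + role=textbox marker)
Service Worker restart recovery (state persisted in chrome.storage.session)
Internationalization (zh-CN / en-US, 60+ entries)
Input debouncing (input 500ms, scroll 300ms)
`wxt build` completes without errors
End-to-end multi-tab recording (opening, switching, and closing tabs): implemented in code, pending verification in a real multi-tab environment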
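
To make the debounce windows concrete, the sketch below applies trailing-edge debouncing to an already captured, time-ordered list of events instead of using live timers. This is an offline equivalent chosen so the example stays deterministic; `RawEvent` and `debounceEvents` are hypothetical names, and the PR's recorder presumably debounces live DOM events instead.

```typescript
// Hypothetical event shape: not the PR's EventRecorder types.
interface RawEvent {
  type: "input" | "scroll";
  target: string;
  value: string;
  time: number; // milliseconds, ascending
}

// Windows taken from the PR description: input 500ms, scroll 300ms.
const DEBOUNCE_MS: Record<RawEvent["type"], number> = { input: 500, scroll: 300 };

// Trailing-edge debounce over a recorded list: an event survives only if no
// later event of the same type on the same target arrives within the window.
function debounceEvents(events: RawEvent[]): RawEvent[] {
  return events.filter((ev, i) => {
    for (let j = i + 1; j < events.length; j++) {
      const later = events[j];
      if (later.type === ev.type && later.target === ev.target) {
        // Only the next event on the same target decides the outcome.
        return later.time - ev.time >= DEBOUNCE_MS[ev.type];
      }
    }
    return true; // nothing followed, so the pending timer would have fired
  });
}

const rawEvents: RawEvent[] = [
  { type: "input", target: "#q", value: "T", time: 0 },
  { type: "input", target: "#q", value: "Ty", time: 120 },
  { type: "input", target: "#q", value: "Typ", time: 260 },
  { type: "scroll", target: "window", value: "120", time: 1000 },
  { type: "scroll", target: "window", value: "480", time: 1100 },
  { type: "scroll", target: "window", value: "900", time: 1500 },
  { type: "input", target: "#q", value: "Type", time: 2000 },
];

const recordedEvents = debounceEvents(rawEvents);
```

The three rapid keystrokes collapse into the single value `"Typ"`, while the later `"Type"` is kept as a separate step because it arrives well outside the 500ms window.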

🤖 Generated with Claude Code

CLAassistant · 2026-03-17T13:34:53Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

easonysliu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Implement full recording & replay pipeline:
- Recording engine: content script event capture (click/input/scroll/keypress) with semantic element descriptors, background multi-tab aggregation
- Recording UI: real-time step display, recording list, detail view with parameter editing and natural language modification
- Replay engine: recordings converted to compact LLM plans injected via systemInstruction, zero changes to PageAgentCore
- i18n support (zh-CN/en-US), accessibility (ARIA), error handling
- Edge cases: shadow DOM, contenteditable, file upload, SW restart recovery

Bug fixes applied: message re-entry loop, stale closure in stopRecording, missing scripting permission, refreshTabIndexMap race condition, rapid start/stop guard, clipboard async, ParamEditor/RecordingDetail state reset, replay config-execute race condition.

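1. @-reference recording replay
- Typing @ in the input box pops up the list of saved recordings, with fuzzy search
- After selecting a recording, a natural-language description can be appended to change how it executes
- Example: @SearchVideos this time search for "artificial intelligence"
- Adds the useRecordingMention hook and the MentionSuggestions component

2. Recording auto-naming
- After recording stops, the LLM is called automatically to generate a name and description
- Runs asynchronously in the background without blocking the UI
- The RecordingDetail page polls for completion and refreshes the display once naming finishes
- Users can edit the name and description manually at any time
- Adds the autoNameRecording module

3. Recording/replay element index consistency
- During recording, a dedicated PageController is created and the DOM tree is refreshed every 3 seconds
- highlightIndex is cached in the element's dataset and written to ElementDescriptor.idx
- During replay, step descriptions include (index:N), and the LLM prefers the index when locating elements
- If index matching fails, replay falls back automatically to semantic matching (text, aria-label, and so on)
- Updates replay_prompt with an index-first matching rule

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>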

mango766 force-pushed the feature/recording-replay branch from 3451832 to be80284 on March 17, 2026 13:52

mango766 force-pushed the feature/recording-replay branch from be80284 to 6cfbfd8 on March 18, 2026 02:40

mango766 changed the title ~~feat: add browser action recording and intelligent replay~~ feat: Browser action recording and intelligent replay (with @-references, auto-naming, and element index consistency) Mar 18, 2026

mango766 mentioned this pull request Mar 18, 2026

Feature: Recording and replay enhancements (@-references, auto-naming, element index consistency) #298

Open

mango766 wants to merge 2 commits into alibaba:main from mango766:feature/recording-replay


2 participants
