Skip to content

feat(bedrock): add converse mcp support#9134

Open
jameshiester-oai wants to merge 13 commits intomainfrom
codex/bedrock-converse-mcp-example
Open

feat(bedrock): add converse mcp support#9134
jameshiester-oai wants to merge 13 commits intomainfrom
codex/bedrock-converse-mcp-example

Conversation

@jameshiester-oai
Copy link
Copy Markdown
Contributor

Summary

  • Add MCP server configuration support to the Bedrock Converse provider.
  • Convert discovered MCP tools into Bedrock Converse tool definitions and execute MCP tool calls during Converse responses.
  • Add an Amazon Bedrock Converse MCP example using DeepWiki and document it in the Bedrock examples README.

Test plan

  • git -C promptfoo diff --check origin/main...HEAD
  • YAML parsed locally for examples/amazon-bedrock/models/promptfooconfig.converse-mcp.yaml
  • Pre-commit hook ran Prettier on the example README and YAML config

Note: I did not run the live Bedrock example because it requires AWS Bedrock credentials, model access, and live MCP/network behavior.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Model Context Protocol (MCP) integration to the AWS Bedrock Converse provider so MCP servers can be configured, their tools surfaced as Bedrock tools, and tool-use requests executed (including in streaming). Also updates the Red Team target configuration UI to support Bedrock Invoke vs Converse IDs and configure MCP servers, and adds a Bedrock Converse+MCP example.

Changes:

  • Add MCP client initialization/cleanup and tool discovery → Bedrock Converse toolConfig conversion, plus MCP tool execution for tool_use responses (non-streaming + streaming).
  • Update Red Team provider configuration UI to support Bedrock as a foundation-model provider, including an InvokeModel/Converse selector and MCP server editor.
  • Add tests and a documented example config for Bedrock Converse with MCP (DeepWiki).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/providers/bedrock/converse.test.ts Adds unit tests covering MCP init/cleanup, toolConfig injection, and tool execution (including streaming).
src/providers/fal.ts Type assertion tweak for fal.config call.
src/providers/bedrock/converse.ts Implements MCP integration: init/cleanup, tool discovery → Bedrock tools, and MCP tool execution paths.
src/app/src/pages/redteam/setup/components/Targets/ProviderConfigEditor.tsx Treats Bedrock as a foundation-model provider; removes MCP config when switching away from Converse IDs.
src/app/src/pages/redteam/setup/components/Targets/ProviderConfigEditor.test.tsx Tests Bedrock foundation-model rendering and MCP config removal when switching IDs.
src/app/src/pages/redteam/setup/components/Targets/FoundationModelConfiguration.tsx Adds Bedrock Invoke/Converse selector and MCP servers configuration UI stored under config.mcp.
src/app/src/pages/redteam/setup/components/Targets/FoundationModelConfiguration.test.tsx Adds tests for Bedrock mode switching and MCP server config persistence.
examples/amazon-bedrock/models/README.md Documents a new Converse+MCP example and configuration snippet.
examples/amazon-bedrock/models/promptfooconfig.converse-mcp.yaml Adds a runnable example using a DeepWiki MCP server with Bedrock Converse.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/providers/bedrock/converse.ts Outdated
Comment on lines +1268 to +1272
// Handle MCP tool callbacks before local function callbacks.
if (this.mcpClient && toolUseBlocks.length > 0) {
const results: string[] = [];
let hasSuccessfulCallback = false;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15. parseResponse now dispatches each tool_use block individually: MCP for tools matching an MCP-discovered name, functionToolCallbacks for the remaining blocks, and the default tool_use JSON serialization for anything still unhandled. Mixed responses now surface results from both sources. Added a regression test ("should run MCP and functionToolCallbacks for different tools in the same response").

Comment thread src/providers/bedrock/converse.ts Outdated
Comment on lines +1374 to +1378
@@ -1237,6 +1375,10 @@ export class AwsBedrockConverseProvider extends AwsBedrockGenericProvider implem
prompt: string,
context?: CallApiContextParams,
): Promise<ProviderResponse & { stream?: AsyncIterable<string> }> {
if (this.initializationPromise != null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15. The JSDoc on callApiStreaming now clarifies that MCP tools ARE executed automatically in streaming mode and only functionToolCallbacks are non-streaming-only.

Comment on lines 68 to +75
const updateCustomTarget = (field: string, value: unknown) => {
const updatedTarget = { ...provider } as ProviderOptions;

if (field === 'id') {
updatedTarget.id = value as string;
if (providerType === 'bedrock' && !updatedTarget.id.startsWith('bedrock:converse:')) {
delete updatedTarget.config.mcp;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15. updateCustomTarget now shallow-clones provider.config along with provider so subsequent field assignments and the delete updatedTarget.config.mcp no longer mutate React state by reference.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a625053e6a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/providers/bedrock/converse.ts Outdated
Comment on lines +1300 to +1303
if (hasSuccessfulCallback && results.length > 0) {
return {
output: results.join('\n'),
tokenUsage,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Process all tool-use blocks before returning MCP output

When a response includes both an MCP tool call and a non-MCP functionToolCallbacks call, the early return after MCP handling exits before the function-callback pass. As a result, non-MCP callbacks are skipped and output is incomplete. This regresses mixed-tool workflows where both callback types are expected to run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15 alongside the matching Copilot comment. Each tool_use block is now dispatched independently — MCP for matching MCP tools, functionToolCallbacks for the remainder, default tool_use JSON for anything still unhandled. New regression test should run MCP and functionToolCallbacks for different tools in the same response.

jameshiester-oai and others added 3 commits May 6, 2026 17:26
Address review findings on the new Bedrock Converse MCP integration:

- Wire `shutdown()` into providerRegistry so stdio MCP child processes are
  released at end of eval (previously `cleanup()` was implemented but never
  invoked). `cleanup()` is now safe to call multiple times and resilient to
  a failed init.
- Surface MCP failures via `ProviderResponse.error` instead of embedding the
  error message in `output` with `success: true`. Broken MCP servers no longer
  produce green eval rows.
- Catch `initializeMCP()` rejection in the constructor so a misconfigured
  provider that's never invoked can't trigger an UnhandledPromiseRejection;
  the recorded error is reported on the first `callApi`.
- Gate the streaming MCP execution path on `!toolsDisabled` (matching the
  non-streaming path) so `toolChoice: 'none'` cannot accidentally invoke MCP.
- Make `parseToolInput` total: malformed JSON tool_use input from the model
  yields `{}` with a warning, not a thrown SyntaxError that crashes the row.
- Dedup tool names between `config.tools` and MCP-discovered tools (Bedrock
  rejects duplicates with ValidationException). MCP-provided tools win, with
  a warning logged on conflict.
- UI: don't seed MCP servers with `command: ''` and don't auto-set
  `mcp.enabled: true` until at least one server has a usable transport
  (command/path/url). Empty servers no longer crash the eval at provider init.
- Rename `hasSuccessfulCallback` → `hasMcpInvocation` (the variable is set on
  errors too, so the old name was misleading).
- Log a debug breadcrumb when `normalizeMCPToolContent` falls through to the
  unknown-shape branch.

Docs:
- Add MCP section to `site/docs/providers/aws-bedrock.md` covering config
  shape, single-turn execution semantics, tool-name collisions, lifecycle,
  and `toolChoice: none` behavior.
- Document single-turn semantics in the example README.

Tests:
- 15 new backend tests covering the contract changes (error propagation, init
  failure, cleanup safety, multiple-cleanup, toolChoice=none gating in both
  streaming and non-streaming paths, tool-name dedup, malformed JSON safety,
  $schema stripping, mixed-content normalization, array-input coercion,
  unconfigured-server early-out).
- 2 new frontend tests covering the MCP-enabled toggle behavior.

Verified end-to-end against real AWS Bedrock + DeepWiki MCP:
- Healthy run: 2/2 pass, error=null, $0.0068
- Broken-URL run: 1/1 error with descriptive `error` field
- Drop setTimeout-based microtask flush in the unhandled-rejection test;
  callApi already awaits initializationPromise internally, so observing the
  recorded error through the response is enough.
- Tighten Promise truthy-check in cleanup() (biome noMisusedPromises).
- Reflow toolNames assertion onto one line.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a12baddbdb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/providers/bedrock/converse.ts Outdated
Comment on lines +793 to +797
// Register only after a successful init so we don't keep a zombie reference
// around if initialization failed.
if (!this.registeredForShutdown) {
providerRegistry.register(this);
this.registeredForShutdown = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Register provider before MCP init can fail

Because registration happens only after await this.mcpClient.initialize(), any partial init failure (for example, one MCP server connects and a later server fails) leaves MCP resources allocated while the provider is never added to providerRegistry. In that failure path, callApi/callApiStreaming return an error immediately, so evaluator shutdown via providerRegistry.shutdownAll() cannot invoke cleanup(), which can leak stdio child processes or open transports for the rest of the run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15. providerRegistry.register(this) now runs BEFORE await this.mcpClient.initialize(), so a partial init failure (one server connects, a later one fails) still gets cleaned up by providerRegistry.shutdownAll(). cleanup() is already resilient to a rejected initializationPromise. Regression test added: should register with providerRegistry before awaiting MCP init.

return true;
});

const converseTools = convertToolsToConverseFormat([...mcpTools, ...dedupedConfigTools]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Deduplicate MCP tools across multiple servers

This merge only filters collisions between config.tools and MCP tools, but it never deduplicates mcpTools themselves. If two configured MCP servers expose the same tool name, both are sent in toolConfig.tools, and Bedrock rejects duplicate tool names with ValidationException, causing every request to fail for that multi-server setup.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15. transformMCPToolsToBedrockConverse now dedupes by tool name (first server wins) with a logger.warn on collision. Multi-server setups exposing the same tool name no longer trip Bedrock's ValidationException. Regression test: should dedupe MCP tool names exposed by multiple servers.

@use-tusk
Copy link
Copy Markdown
Contributor

use-tusk Bot commented May 7, 2026

Generated 17 tests - 17 passed

Commit tests View tests

Tip

New to Tusk Unit Tests? Learn more here.

Test Summary

  • FoundationModelConfiguration - 5 ✓
  • buildBedrockProviderId - 2 ✓
  • getBedrockApiModeFromId - 1 ✓
  • getBedrockModelFromId - 5 ✓
  • updateCustomTarget - 4 ✓

Results

Tusk's tests are all passing and cover the critical paths for the new Bedrock Converse MCP support. The test suite validates that getBedrockModelFromId correctly parses both bedrock:invoke and bedrock:converse prefixes, ensuring the UI properly extracts model IDs across both API modes. The FoundationModelConfiguration component tests confirm that MCP configuration UI only appears for Bedrock Converse (not invoke mode) and that non-Bedrock providers don't accidentally trigger Bedrock-specific parsing logic. The updateCustomTarget tests are particularly important—they verify that switching between invoke and converse modes correctly handles MCP config cleanup (removing it only when switching to invoke) while preserving other provider settings like temperature and max_tokens. The buildBedrockProviderId tests ensure provider IDs maintain their correct prefixes during model updates in both modes. Overall, Tusk's tests cover the hot paths for mode switching, MCP configuration persistence, and provider ID construction—all critical to preventing regressions in the red team targeting UI.

Avg +1% line coverage gain across 1 file
Source file Line Branch
src/app/src/pages/redteam/setup/components/Targets/FoundationModelConfiguration.tsx 93% (+1%) 94% (+1%)

Coverage is calculated by running tests directly associated with each source file, learn more here.

Last run on 316c9f5. Comment @use-tusk generate to create a test suite on the latest commit. Configure here.

View check history

Commit Unit Tests Created (UTC)
a12badd No tests generated · Output May 7, 2026 1:41AM
316c9f5 17 ✓, 0 ✗ · Tests May 7, 2026 1:53AM

Was Tusk helpful? Give feedback by reacting with 👍 or 👎

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21d057c9d6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/providers/bedrock/converse.ts Outdated
Comment on lines +1152 to +1153
if (this.mcpInitError) {
return { error: `Bedrock Converse MCP initialization failed: ${this.mcpInitError.message}` };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip MCP init errors when tool choice disables tools

When mcp.enabled is set but the MCP server fails to initialize, this unconditional early return makes every callApi fail before tool choice is evaluated. That breaks valid configurations where a test case sets toolChoice/tool_choice to none (or otherwise avoids tools), because those requests should still execute without MCP. The same guard exists in callApiStreaming, so both paths are blocked by an MCP outage even when tools are intentionally disabled.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15. callApi and callApiStreaming only return the MCP-init-failure error when tools are still active for the request. When the test case sets toolChoice: 'none' (or tool_choice: 'none'), the request proceeds without MCP. Regression test: should still serve toolChoice=none requests when MCP init failed.

mldangelo added 5 commits May 6, 2026 21:53
Biome's lint/nursery/useReduceTypeParameter rule was flagging StrategyTable's
reduce initializer. Pass GroupedStrategies as the reduce type parameter so the
empty-object initial value doesn't need a type assertion.
Reduces cognitive complexity of parseResponse and callApiStreaming by hoisting
the MCP-dispatch loop into dedicated helpers (executeMcpToolCalls method,
formatStreamingToolUseBlocks method) and the JSON parse + error aggregation
into module-scope helpers. Behavior unchanged; all 165 unit tests pass.
The reduce<T> generic version didn't actually clear the Style Check error.
Reverting to match origin/main exactly so we can isolate the real source.
Three formatting deltas biome ci wanted:
- Wrap isMCPServerConfigured signature across lines (was over 100-col).
- Inline the constructor's MCP init failure logger.error onto a single line.
- Inline formatStreamingToolUseBlocks' MCP catch logger.error onto a single
  line.

Behavior unchanged; all 165 unit tests still pass.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3c0debff1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

'groq',
'deepseek',
'azure',
'bedrock',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore Bedrock custom-config editing path

Routing bedrock through FoundationModelConfiguration removes the raw CustomTargetConfiguration editor that previously allowed arbitrary Bedrock settings. The new Bedrock form only exposes a small subset of fields, so users can no longer set required per-target options like config.region (or other Bedrock-specific fields such as endpoint/guardrail settings); those targets then fall back to defaults (e.g., us-east-1) and can fail in non-default-region deployments.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 228ae15. FoundationModelConfiguration now renders a "Bedrock Settings" collapsible section (when providerType === 'bedrock') exposing AWS Region, AWS Profile, and Inference Model Type — the most common Bedrock-specific fields that disappeared with the route change. Region defaults to us-east-1 (unchanged) but is now editable from the UI, fixing non-default-region deployments.

Comments addressed (Copilot + Codex):

P1 — Process all tool_use blocks in mixed dispatch (Copilot, Codex):
  parseResponse no longer early-returns after MCP. It now dispatches each
  tool_use block to MCP if a matching MCP tool exists, otherwise to a
  configured functionToolCallback, otherwise to the default tool_use JSON
  serialization. Mixed responses (one MCP tool + one local callback) now
  surface both outputs.

P1 — Register provider before MCP init can fail (Codex):
  providerRegistry.register(this) now runs BEFORE awaiting
  mcpClient.initialize(). A partial init failure (e.g., one server
  connects, a later one fails) is still cleaned up by shutdownAll().

P1 — Skip MCP init error when tool choice is none (Codex):
  callApi/callApiStreaming no longer block all requests on a failed MCP
  init. When the test case sets toolChoice/tool_choice to 'none', the
  request executes without MCP — matching the documented contract that
  disabling tools opts out of the entire tool path.

P1 — Restore Bedrock custom-config editing path (Codex):
  FoundationModelConfiguration now exposes a Bedrock Settings section with
  AWS Region, AWS Profile, and Inference Model Type — the
  Bedrock-specific fields that disappeared when we routed bedrock through
  the foundation-model form. Defaults to us-east-1 when unset (unchanged
  from prior behavior).

P1 — ProviderConfigEditor shallow-clone (Copilot):
  updateCustomTarget now clones provider.config along with provider so
  `delete updatedTarget.config.mcp` and other field assignments don't
  mutate React state by reference.

P2 — Deduplicate MCP tools across servers (Codex):
  transformMCPToolsToBedrockConverse now drops duplicate tool names with a
  warning. Bedrock rejects duplicate tool names with ValidationException;
  multi-server setups exposing the same tool name no longer fail every
  request.

Doc — callApiStreaming doc comment (Copilot):
  Updated the JSDoc to clarify that MCP tool execution does happen in
  streaming mode, and only functionToolCallbacks are non-streaming-only.

Tests:
  - 4 new backend tests covering: multi-server MCP dedup, mixed
    MCP+functionToolCallbacks dispatch, registry registration before
    init, and toolChoice=none surviving a failed MCP init.
  - All 169 backend tests + 42 frontend tests still pass.
  - Real eval against AWS Bedrock + DeepWiki MCP: 2/2 pass.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 228ae158ae

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/providers/bedrock/converse.ts Outdated
Comment on lines +1190 to +1192
if (this.initializationPromise != null) {
await this.initializationPromise;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Bypass MCP init wait when tools are disabled

Fresh evidence relative to the earlier MCP-init discussion: callApi still awaits MCP initialization before checking whether this request disables tools. In configs where mcp.enabled is true but a test sets toolChoice/tool_choice to none, this path can still stall on MCP connect timeouts (or a hung transport) even though no tool calls should run for that request; callApiStreaming has the same ordering. The tool-choice check should happen before awaiting MCP init so tool-disabled requests can proceed immediately without MCP availability.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d6608ec. The tool-choice resolution now happens BEFORE awaiting initializationPromise, so tool-disabled requests skip both the wait and the init-error check. A hung MCP transport can no longer stall a toolChoice: 'none' request.

Verification:

  • New regression test should not await MCP init for toolChoice=none requests races a 200ms timeout against a never-resolving MCP init.
  • Real-eval against AWS Bedrock with a deliberately broken MCP URL + tool_choice: none: passes with the model's response ("hello"), confirming the bypass works end-to-end.

The new helper awaitMcpReadyForRequest() resolves the effective tool choice first via isRequestToolsDisabled(); both callApi and callApiStreaming now share it.

Address fresh Codex P1 feedback: a hung MCP transport should never stall a
request that has opted out of tools entirely. Previously `callApi` and
`callApiStreaming` always awaited `initializationPromise` before checking
the effective tool choice — a slow or never-resolving MCP connect could
block tool-disabled requests indefinitely. Now the tool-choice resolution
runs first, and the await only happens when the request actually needs MCP.

Also folds in code-simplifier improvements applied alongside this fix:
- Centralize repeated `err instanceof Error ? err.message : String(err)` as
  `errorMessage(err)`.
- Centralize the four `MCP Tool Result/Error (${name}): …` strings as
  `formatMcpToolResult` / `formatMcpToolError`.
- Extract `dispatchMcpToolCall` shared by `executeMcpToolCalls` and
  `formatStreamingToolUseBlocks` so the try/catch + parse + format logic
  exists in one place.
- Fold `parseStreamingToolInput` to return `Record<string, unknown>` so
  callers no longer need a redundant second `parseToolInput` pass.
- Hoist `isServerConfigured` to module scope in
  `FoundationModelConfiguration` (was reallocated per render).
- Drop unused `invoked` field from `executeMcpToolCalls`'s return value.

Tests:
- New regression test `should not await MCP init for toolChoice=none
  requests` races a 200ms timeout against a never-resolving MCP init —
  fails fast if the await regresses.
- All 170 backend + 42 frontend tests still pass.
- Real eval verification:
  - Healthy run: 2/2 pass, error=null
  - Broken-MCP run: 1/1 error with "MCP initialization failed"
  - Broken-MCP + tool_choice=none: 1/1 pass, error=null, output="hello"
    (confirms the bypass works end-to-end against real Bedrock)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6608ec745

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1462 to +1472
const { value: parsedInput, failed: parseFailed } = parseStreamingToolInput(toolBlock.input);
const matchedMcpTool = mcpTools.find((tool) => tool.name === toolBlock.name);

if (toolsDisabled || !this.mcpClient || !matchedMcpTool) {
toolUseParts.push(
JSON.stringify({
type: 'tool_use',
id: toolBlock.toolUseId,
name: toolBlock.name,
input: parsedInput,
}),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve raw tool_use input in streaming fallback output

Move JSON parsing behind the MCP-dispatch branch so fallback serialization keeps the model-emitted payload intact. Right now parseStreamingToolInput() runs before checking toolsDisabled/MCP match, so malformed JSON is coerced to {} and then returned as a normal fallback tool_use block for non-MCP tools (including toolChoice: none), which hides the model error and can invalidate assertions/debugging that rely on the actual emitted arguments.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

@mldangelo mldangelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, approving. Thanks @jameshiester-oai for the great work on this — the Bedrock Converse MCP integration fills a real gap, and the design (MCP tools merged with config.tools, single-turn execution semantics, clean separation of streaming vs. non-streaming dispatch) holds up well under stress.

What's been verified end-to-end:

  • Real eval against bedrock:converse:us.anthropic.claude-sonnet-4-6 in us-east-1 with the DeepWiki MCP server: 2/2 pass, $0.0068, stopReason=tool_use, MCP tool results correctly returned.
  • Broken-MCP path: a deliberately bad MCP URL surfaces as success: false with a descriptive ProviderResponse.error (instead of green-lighting on an embedded error string).
  • Broken-MCP + tool_choice: none: bypasses MCP entirely and returns the model's text response — confirming hung/failed MCP transports don't block tool-disabled requests.
  • 170 backend + 42 frontend tests green; full CI matrix (36 checks) green; biome format + tsc clean.

Hardening that landed during review (mostly in response to Copilot/Codex feedback):

  • Lifecycle: shutdown() wired into providerRegistry; provider registers BEFORE awaiting mcpClient.initialize() so partial init failures still get cleaned up; cleanup() is safe to call multiple times and resilient to a rejected init promise.
  • Error contract: MCP errors (both mcpResult.error and thrown exceptions) propagate into ProviderResponse.error so downstream consumers (assertions, exit codes, redteam grader) treat broken MCP calls as failures.
  • Mixed dispatch: parseResponse now routes each tool_use block independently (MCP for matching tools, functionToolCallbacks for the remainder, default JSON for anything still unhandled). A response with both MCP and local-callback tools surfaces both outputs.
  • Tool-name dedup: collisions between config.tools and MCP-discovered tools are dropped with a warning; same dedup applied across multiple MCP servers (avoids Bedrock's ValidationException).
  • Tools-disabled fast path: tool_choice: 'none' requests skip the MCP init wait AND the init-error check entirely — a hung MCP transport can no longer stall them.
  • Streaming: now honors toolChoice: 'none' (gates on !toolsDisabled matching the non-streaming path); JSDoc updated to reflect that MCP runs in streaming, but functionToolCallbacks don't.
  • UI: Bedrock InvokeModel/Converse selector + MCP server editor + Bedrock Settings section (region, profile, inferenceModelType — fixes the regression where users couldn't set non-default-region from the form). MCP enabled flips to true only after a server has a usable transport, so empty servers don't crash provider init.
  • Docs: new MCP section in site/docs/providers/aws-bedrock.md; example README documents single-turn execution semantics.

Nice work — shipping it. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants