Contributor

Copilot AI commented Nov 7, 2025

Extend MarkItDownReader with MCP Support ✅

  • Add ModelContextProtocol.Core package reference (version 0.4.0-preview.3)
  • Create MarkItDownMcpReader class for MCP transport
  • Implement MCP client integration for calling convert_to_markdown tool
  • Update documentation (README.md) with MCP usage examples
  • Add tests for MCP transport mode
  • Verify build and tests pass
  • Code review completed
  • Address all PR feedback
  • Fix package downgrade issue

Changes Made Based on Feedback:

  • Upgraded from ModelContextProtocol 0.4.0-preview.2 to ModelContextProtocol.Core 0.4.0-preview.3
  • Changed URI examples from /sse to /mcp throughout
  • Added optional McpClientOptions parameter to constructor
  • Used async file IO with FileOptions.Asynchronous for .NET Framework targets
  • Refactored data URI creation to helper method to eliminate duplication
  • Removed unnecessary pragma warnings
  • Added Docker args to examples (--http --host 0.0.0.0 --port 3001)
  • Added Aspire integration section to README with sample code
  • Fixed package downgrade by adding explicit Microsoft.Bcl.AsyncInterfaces reference for net10.0

Testing:

  • All 4 unit tests pass successfully across all target frameworks
  • Build is clean with no errors or package downgrade warnings
Original prompt

This section details the original issue you should resolve

<issue_title>Provide MarkItDown MCP reader</issue_title>
<issue_description>We currently provide a MarkItDownReader that allows users to parse any file using the markitdown executable. It requires our users to install Python to be able to parse files locally.

To avoid that, we should extend the MarkItDownReader class with the ability to use MarkItDown MCP. This would allow our Aspire users, and the template, to just spawn this Docker image and treat it as a service.</issue_description>

<agent_instructions>Extend the current MarkItDownReader with an ability to use MarkItDown MCP. Use our C# MCP SDK for talking to the service.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Extend MarkItDownReader to support MarkItDown MCP" to "Add MarkItDownMcpReader for MCP server support" Nov 7, 2025
Copilot AI requested a review from adamsitnik November 7, 2025 10:29
Member

@adamsitnik adamsitnik left a comment

@copilot please apply my suggestions and address the remaining feedback.

…ew.3, add McpClientOptions, refactor code

Co-authored-by: adamsitnik <[email protected]>
Contributor Author

Copilot AI commented Nov 7, 2025

@copilot please apply my suggestions and address the remaining feedback.

All feedback has been addressed in commit ab68e3a.

@adamsitnik
Member

@copilot please try to solve the following build failure:

test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj(0,0): error NU1605: (NETCORE_ENGINEERING_TELEMETRY=Restore) Warning As Error: Detected package downgrade: Microsoft.Bcl.AsyncInterfaces from 10.0.0-rc.2.25502.107 to 9.0.10. Reference the package directly from the project to select a different version. 
 Microsoft.Extensions.DataIngestion.Tests -> Microsoft.Extensions.DataIngestion.MarkItDown -> ModelContextProtocol.Core 0.4.0-preview.3 -> System.Net.ServerSentEvents 10.0.0-rc.2.25502.107 -> Microsoft.Bcl.AsyncInterfaces (>= 10.0.0-rc.2.25502.107) 
 Microsoft.Extensions.DataIngestion.Tests -> Microsoft.Bcl.AsyncInterfaces (>= 9.0.10)

Contributor Author

Copilot AI commented Nov 7, 2025

@copilot please try to solve the following build failure:

...

Fixed the package downgrade issue by adding an explicit reference to Microsoft.Bcl.AsyncInterfaces with version $(MicrosoftBclAsyncInterfacesNet10Version) for the net10.0 target framework in the test project. This ensures that the higher version (10.0.0-rc.2.25502.107) required by ModelContextProtocol.Core is used, avoiding the downgrade from 10.0.0-rc.2.25502.107 to 9.0.10. Commit b8e70dc.
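
For readers who have not run into NU1605 before, the fix described above can be sketched as a conditional item group in the test project. This is only an illustration, not the contents of commit b8e70dc: the `VersionOverride` attribute assumes the repository uses NuGet central package management (with plain `PackageReference` versions it would be `Version` instead), and the exact condition used in the real project file is an assumption.

```xml
<!-- Hypothetical sketch: pin the higher version only for the net10.0 target so that
     the transitive requirement from ModelContextProtocol.Core is not downgraded. -->
<ItemGroup Condition="'$(TargetFramework)' == 'net10.0'">
  <PackageReference Include="Microsoft.Bcl.AsyncInterfaces"
                    VersionOverride="$(MicrosoftBclAsyncInterfacesNet10Version)" />
</ItemGroup>
```

Referencing the package directly is what the NU1605 message itself recommends ("Reference the package directly from the project to select a different version").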

@adamsitnik adamsitnik marked this pull request as ready for review November 7, 2025 19:17
@adamsitnik adamsitnik requested a review from a team as a code owner November 7, 2025 19:17
Copilot AI review requested due to automatic review settings November 7, 2025 19:17
Member

@adamsitnik adamsitnik left a comment

LGTM, I am impressed that @copilot was able to get it working so quickly (while I was writing my Connect ;) )!

@adamsitnik adamsitnik enabled auto-merge (squash) November 7, 2025 19:18
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for converting documents to markdown using a MarkItDown MCP (Model Context Protocol) server, providing an alternative to the existing local process-based MarkItDownReader. The new MarkItDownMcpReader enables document ingestion through a remote/containerized MarkItDown service.

Key changes:

  • New MarkItDownMcpReader class that communicates with MCP servers via HTTP transport
  • Comprehensive README updates with usage examples for both local and MCP-based document conversion, including Aspire integration patterns
  • Added ModelContextProtocol.Core package dependency (version 0.4.0-preview.3)
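
As a rough illustration of the first bullet, the following sketch shows how a client could call the `convert_to_markdown` tool over HTTP with the C# MCP SDK. It is not the merged `MarkItDownMcpReader` code: the class and method names are hypothetical, the `"uri"` argument name is an assumption about the MarkItDown MCP server's tool schema, and the SDK calls reflect the 0.4.0 preview API as best understood here, so they may differ in other versions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using ModelContextProtocol.Client;
using ModelContextProtocol.Protocol;

// Hypothetical helper for illustration only; not the implementation merged in this PR.
public static class MarkItDownMcpSketch
{
    public static async Task<string> ConvertAsync(
        Uri endpoint, string dataUri, CancellationToken cancellationToken = default)
    {
        // HTTP transport pointed at the server endpoint (for example, one ending in /mcp).
        await using HttpClientTransport transport = new(
            new HttpClientTransportOptions { Endpoint = endpoint });

        await using McpClient client = await McpClient.CreateAsync(
            transport, cancellationToken: cancellationToken).ConfigureAwait(false);

        // Assumption: the tool accepts a single "uri" argument (here, a data URI).
        Dictionary<string, object?> arguments = new() { ["uri"] = dataUri };

        CallToolResult result = await client.CallToolAsync(
            "convert_to_markdown", arguments, cancellationToken: cancellationToken)
            .ConfigureAwait(false);

        if (result.IsError == true)
        {
            throw new InvalidOperationException("The convert_to_markdown tool reported an error.");
        }

        // Concatenate every text block returned by the tool.
        StringBuilder markdown = new();
        foreach (ContentBlock block in result.Content)
        {
            if (block is TextContentBlock text)
            {
                markdown.Append(text.Text);
            }
        }

        return markdown.ToString();
    }
}
```

Assuming a server started with the Docker arguments mentioned earlier in this conversation (`--http --host 0.0.0.0 --port 3001`), a call might look like `await MarkItDownMcpSketch.ConvertAsync(new Uri("http://localhost:3001/mcp"), dataUri)`.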

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/MarkItDownMcpReader.cs | New reader implementation using MCP protocol to convert documents to markdown via remote server |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Readers/MarkItDownMcpReaderTests.cs | Unit tests covering constructor validation, null argument checks, and file existence validation |
| src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/README.md | Added documentation sections for MCP reader usage, Docker/pip installation instructions, and Aspire integration examples |
| src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/Microsoft.Extensions.DataIngestion.MarkItDown.csproj | Added ModelContextProtocol.Core package reference |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Microsoft.Extensions.DataIngestion.Tests.csproj | Removed .NET Framework-specific workaround for Microsoft.Bcl.AsyncInterfaces |
| eng/packages/General.props | Added ModelContextProtocol.Core version specification (0.4.0-preview.3) |

@adamsitnik adamsitnik merged commit a8aad1a into main Nov 7, 2025
11 of 12 checks passed
@adamsitnik adamsitnik deleted the copilot/extend-markitdown-reader-mcp branch November 7, 2025 20:44
}

#pragma warning disable S3995 // URI return values should not be strings
private static string CreateDataUri(byte[] fileBytes, string? mediaType)
Member

@stephentoub stephentoub Nov 7, 2025

@copilot, you already have an (indirect) reference to the Microsoft.Extensions.AI.Abstractions package, which provides a DataContent type. DataContent implicitly supports data URIs. You could use it instead, like new DataContent(bytes, mimeType).Uri.
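
A minimal sketch of that suggestion, assuming the `DataContent(ReadOnlyMemory<byte>, string)` constructor and the `Uri` property from Microsoft.Extensions.AI.Abstractions. The helper name mirrors the method under review, but the body is illustrative rather than the follow-up change itself, and it takes a non-nullable media type, unlike the `string?` parameter in the diff.

```csharp
using Microsoft.Extensions.AI;

// Hypothetical replacement for a hand-rolled CreateDataUri helper.
public static class DataUriSketch
{
    public static string CreateDataUri(byte[] fileBytes, string mediaType)
    {
        // byte[] converts implicitly to ReadOnlyMemory<byte>;
        // DataContent builds the base64 data URI from the bytes and media type.
        return new DataContent(fileBytes, mediaType).Uri;
    }
}
```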

Member

@copilot please send a new PR that addresses this feedback

#else
await source.CopyToAsync(ms).ConfigureAwait(false);
#endif
byte[] fileBytes = ms.ToArray();
Member

@copilot, once you switch to using DataContent, it accepts a ReadOnlyMemory<byte>, so rather than using ToArray, you can pass in a ReadOnlyMemory<byte> created from the MemoryStream's GetBuffer and Length.
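
The idea can be sketched as follows, under the assumption that the stream is first copied into a `MemoryStream` created with the parameterless constructor (so `GetBuffer` is permitted). The class and method names are hypothetical, and the media type is assumed to be non-null.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Hypothetical helper showing how to avoid the extra copy made by ToArray.
public static class StreamDataUriSketch
{
    public static async Task<string> ToDataUriAsync(
        Stream source, string mediaType, CancellationToken cancellationToken = default)
    {
        using MemoryStream ms = new();
        await source.CopyToAsync(ms, 81920, cancellationToken).ConfigureAwait(false);

        // GetBuffer exposes the underlying array, which may be larger than the data,
        // so the slice is limited to the number of bytes actually written.
        ReadOnlyMemory<byte> data = new(ms.GetBuffer(), 0, (int)ms.Length);

        return new DataContent(data, mediaType).Uri;
    }
}
```

The trade-off is that the `ReadOnlyMemory<byte>` aliases the stream's internal buffer, so it should only be used while that buffer is no longer being written to.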

private async Task<string> ConvertToMarkdownAsync(string dataUri, CancellationToken cancellationToken)
{
// Create HTTP client transport for MCP
HttpClientTransport transport = new(new HttpClientTransportOptions
Member

@halter73, just fyi

joperezr pushed a commit to joperezr/extensions that referenced this pull request Nov 11, 2025
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Adam Sitnik <[email protected]>
joperezr pushed a commit to joperezr/extensions that referenced this pull request Nov 11, 2025
The following PRs are included in this backport:

- [MEDI] start producing NuGet packages (dotnet/extensions/dotnet#7016)
- Update version numbers in AI changelogs (dotnet/extensions/dotnet#7008)
- [MEDI] Don't stop document processing on enricher error (dotnet/extensions/dotnet#7005)
- [MEDI] add PackageTags (dotnet/extensions/dotnet#7022)
- Add MarkItDownMcpReader for MCP server support (dotnet/extensions/dotnet#7025)
- Image generation tool (dotnet/extensions/dotnet#6749)
- Make MEAI packages use 10.0 runtime packages (dotnet/extensions/dotnet#7028)

----
#### AI description  (iteration 1)
#### PR Classification
This pull request backports multiple MEAI library updates, including new image generation features, refactoring of data ingestion enrichers, removal of legacy exporter code, and updated OpenTelemetry instrumentation.

#### PR Summary
The changes integrate new image generation tool support into chat clients with corresponding types and integration tests, refactor data ingestion enrichers to use a unified `EnricherOptions` abstraction with batching, and remove outdated JSON schema exporter and nullability helper files while updating OpenTelemetry metrics and project metadata.
- `src/Libraries/Microsoft.Extensions.AI`: Added new types (`HostedImageGenerationTool.cs`, `ImageGenerationToolCallContent.cs`, `ImageGenerationToolResultContent.cs`) and integration tests to enable hosted image generation across AI providers.
- `src/Libraries/Microsoft.Extensions.DataIngestion`: Refactored enrichers (Sentiment, Keyword, Classification, Summary) to use the new `EnricherOptions` and batching via the `Batching.cs` utility, with updated tests.
- Removed legacy schema exporter files (e.g. files under `src/Shared/JsonSchemaExporter/` and `NullabilityInfoContext/`) to clean up unused functionality.
- Updated OpenTelemetry instrumentation in OpenAI, Azure AI, Embedding, and SpeechToText clients to align with the latest semantic conventions.
- Revised project and package configuration files with updated metadata, preview stage tags, and code quality settings.
joperezr added a commit that referenced this pull request Nov 14, 2025
* Merged PR 54952: Getting ready for the 10.0 stable release. Flowing .NET Servicing

#### AI description  (iteration 1)
#### PR Classification
This PR updates dependency versions and build pipeline configurations to prepare for the .NET 10.0 stable release.

#### PR Summary
The changes update dependency and LTS versions (upgrading many from 9.0.10 to 9.0.11), enable release-specific flags, and streamline the build pipelines for servicing.
- **`eng/Version.Details.xml` and `eng/Versions.props`**: Upgraded various dependency versions and LTS numbers and set stabilization flags (e.g., `StabilizePackageVersion` to true, `DotNetFinalVersionKind` to release).
- **`azure-pipelines.yml`**: Removed the code coverage stage to simplify the CI pipeline.
- **`eng/pipelines/templates/BuildAndTest.yml`**: Added tasks to set up private feed credentials and commented out integration tests that require authentication.
- **`NuGet.config`**: Revised package source configuration by removing package source mapping and adding new internal feed URLs.
- **`Directory.Build.props`**: Suppressed NU1507 warnings to accommodate internal branch configuration.

* [MEDI] start producing NuGet packages (#7016)

* remove IsPackable=false, provide all mandatory properties for each package we want to ship

* add basic READMEs

* Update version numbers in AI changelogs (#7008)

* [MEDI] Don't stop document processing on enricher error (#7005)

* introduce EnricherOptions option bag

* implement batching

* don't validate results returned by IChatClient

* don't expose FileInfo as source via IngestionResult, as it could be Stream in the future. Just expose the document id

* Enricher failures should not fail the whole ingestion pipeline, as they are best-effort enhancements

* [MEDI] add PackageTags (#7022)

* Add MarkItDownMcpReader for MCP server support (#7025)

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Adam Sitnik <[email protected]>

* Image generation tool (#6749)

* Prototype of using ImageGenerationTool

* Handle DataContent returned from ImageGen

* React to rename and improve metadata

* Handle image_generation tool content from streaming

* Add handling for combining updates with images

* Add tests for new ChatResponseUpdateExtensions

* Rename ImageGenerationTool to HostedImageGenerationTool

* Remove ChatResponseUpdateCoalescingOptions

* Add ImageGeneratingChatClient

* Fix namespace of tool

* Replace traces of function calling

* More namespace fix

* Enable editing

* Update to preview OpenAI with image tool support

* Temporary OpenAI feed

* Fix tests

* Add integration tests for ImageGeneratingChatClient

* Remove ChatRole.Tool -> Assistant workaround

* Remove use of private reflection for Image results

* Add ChatResponseUpdate.Clone

* Move all mutable state into RequestState object

* Adjust prompt to improve integration test reliability

* Refactor tool initialization

I verified that the tool creation is cached by ReflectionAIFunctionDescriptor

This change includes a small optimization to avoid additional allocation
around inserting tools into the options.

* Add integration tests for streaming

Fixes the removal of tool content - this was broken for streaming when
I changed removal to be based on callId.  We don't have the CallId yet
in the streaming case so we have to remove by name.

* React to changes and fix tests

* Address feedback

* Fix SkipTestException from ConditionalTheory

* Fix formatting

* Add back image replacement coalescing (removed in merge)

* Fix template tests and use new OpenAI

* Remove use of temporary staging nuget feed

* Address feedback

* Make ImageGeneratingChatClient use ImageGenerationTool*Content

* Remove ApplyUpdates and Coalesce ImageResults instead of DataContent.

* Workaround OpenAI issue where image data is not read for partial images.

openai/openai-dotnet#809

* Improved workaround

* Return ImageGenerationToolCallContent from OpenAI

* Add OpenAI image tool tests with representation of real traffic

* Correct the event sequence for streaming single image

* Fix some docs and refactor for clarity

* Make MEAI packages use 10.0 runtime packages (#7028)

* Make MEAI packages use 10.0 runtime packages

* Add back MEAI.Abstractions JsonSchemaExporter tests

* Address feedback

* Remove unneeded trimming suppression

* When using latest .NET packages, force System.Numerics.Tensors to 10.0 (for MEAI) (#7031)

* Add a new Microsoft.Agents.AI.Templates package with an aiagents-webapi project template (#7014)

* Initial Microsoft.Agents.AI.Templates structure

* Refine Microsoft.Agents.AI.Templates infrastructure

* Move project template infrastructure utilities into a shared folder

* Add the webapi-agents project template content with GitHub models

* Support parameterized AI Service Provider

* Rename to aiagents-webapi

* Support parameterized chatmodel and update docs with renames

* Add Snapshot tests

* Add aiagents-webapi snapshot tests

* Add aiagents-webapi execution tests (and component governance)

* Improve aiagents-webapi template parameters

* Apply suggestions from copilot code review

Co-authored-by: Copilot <[email protected]>

* Move shared ProjectTemplate infrastructure to not get included in Shared.Tests

* Fix the template sandbox / execution tests after moving infrastructure

* Ignore CA1716 warning about 'Shared' namespace in template tests

* Clean up template sandbox source/output

* Rename to "aiagent-webapi" and favor singular "Agent". Docs cleanup.

* Update templates dev doc to cover Microsoft.Agents.AI.Templates too

* Fix remaining template sandbox references with new paths

* Add a tool call in aiagent-webapi. Update workflow API usage for upcoming change.

Fix snapshots

* Exclude csproj.in file from template package

* Add a survey link to the aiagent-webapi template's generated readme

---------

Co-authored-by: Copilot <[email protected]>

* Use `Microsoft.Extensions.DataIngestion` in AI Chat Web template (#7023)

* Add Markdown support

* Remove PDF support

* Revert "Remove PDF support"

This reverts commit e1d066034962c9686bf8150984b6adf0e25846c8.

* Add 'Example_GPS_Watch.md'

* Add MEDI dependencies

* Revert "[MEDI] Remove collection key type workaround (#7010)"

This reverts commit a369be9.

* MEDI integration into chat template

* Remove PdfPig dependency

* Fix citation + normalize identifier path

* Undo changes to `M.E.DI.csproj`

* Update snapshots

* Update DataIngestion unit tests to handle keys as either strings or guids

* Update SK and fix MEDI version

* Remove SK workaround

* Fix sandbox paths to allow running tests multiple times

* Reliable data ingestion

* Enable MEDI tracing

* Simplify log message

* Add `PdfPigReader` for non-Aspire template

* Invert PdfPigReader exclusion condition

* Use Markitdown MCP

* Update snapshots

* Undo changes to `IngestionPipelineTests.cs`

* Update src/ProjectTemplates/Microsoft.Extensions.AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DocumentReader.cs

Co-authored-by: Jeff Handley <[email protected]>

* Update snapshots

* Improve template execution test failure output

* Support .NET 10 in aichatweb, using it by default

* Show a message when loading documents by loading docs as a separate tool

* disable the incremental ingestion

* map every PDF page to a single section

* drop SK dependency

* Add system prompt instructions for calling the LoadDocuments tool. Fix code formatting.

---------

Co-authored-by: Jeff Handley <[email protected]>
Co-authored-by: Adam Sitnik <[email protected]>

* Add Agent Framework DevUI into the aiagent-webapi template (#7026)

* Integrate DevUI into the aiagent-webapi project template

* Improve aiagent-webapi Program.cs per feedback.

* Remove --no-devui. Fix OpenAI clients. Augment execution test sandbox ignores.

* Rename to Microsoft.Agents.AI.ProjectTemplates

* Set Microsoft.Agents.AI package versions

* Simplify the GitHub and OpenAI key config vars for aiagent-webapi

* Sort package references

* Fix troubleshooting section in READMEs

* Revert MEAI.Templates change. Make launchSettings .gitignore more specific.

---------

Co-authored-by: Mackinnon Buck <[email protected]>

* Fix display of target frameworks in agents template. Hide the chat model textbox from the IDE template UI.

---------

Co-authored-by: Adam Sitnik <[email protected]>
Co-authored-by: Stephen Toub <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Eric StJohn <[email protected]>
Co-authored-by: Jeff Handley <[email protected]>
Co-authored-by: Jeff Handley <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Mackinnon Buck <[email protected]>
@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2025
Development

Successfully merging this pull request may close these issues.

Provide MarkItDown MCP reader

4 participants