From 8e8f547a0a1d9a470ed0040e1779af7df960dc49 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 21 Nov 2025 13:00:01 +0000
Subject: [PATCH 1/4] Initial plan

From cb6dd50b95d77663b9f34c0ca2053477e3c9cd8d Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 21 Nov 2025 13:05:24 +0000
Subject: [PATCH 2/4] Add CHANGELOG.md files for Microsoft.Extensions.DataIngestion* projects

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
---
 .../CHANGELOG.md | 20 ++++++++++++++
 .../CHANGELOG.md | 7 +++++
 .../CHANGELOG.md | 7 +++++
 .../CHANGELOG.md | 26 +++++++++++++++++++
 4 files changed, 60 insertions(+)
 create mode 100644 src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md
 create mode 100644 src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md
 create mode 100644 src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
 create mode 100644 src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md

diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md
new file mode 100644
index 00000000000..41b0a8c60b9
--- /dev/null
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md
@@ -0,0 +1,20 @@
+# Release History
+
+## 10.0.0-preview.1
+
+- Initial preview release of Microsoft.Extensions.DataIngestion.Abstractions
+- Introduced `IngestionDocument` class for representing format-agnostic document containers
+- Introduced `IngestionDocumentElement` abstract base class for document elements
+- Introduced document element types:
+  - `IngestionDocumentSection` - Represents a section or page in a document
+  - `IngestionDocumentParagraph` - Represents a paragraph
+  - `IngestionDocumentHeader` - Represents a header with optional level
+  - `IngestionDocumentFooter` - Represents a footer
+  - `IngestionDocumentTable` - Represents a table with 2D cell array
+  - `IngestionDocumentImage` - Represents an image with optional binary content and alternative text
+- Introduced `IngestionChunk` class for representing content chunks
+- Introduced `IngestionChunker` abstract base class for splitting documents into chunks
+- Introduced `IngestionDocumentReader` abstract base class for reading source content and converting to documents
+- Introduced `IngestionDocumentProcessor` abstract base class for processing documents
+- Introduced `IngestionChunkProcessor` abstract base class for processing chunks
+- Introduced `IngestionChunkWriter` abstract base class for writing chunks to storage
diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md
new file mode 100644
index 00000000000..af952a14a33
--- /dev/null
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md
@@ -0,0 +1,7 @@
+# Release History
+
+## 10.0.0-preview.1
+
+- Initial preview release of Microsoft.Extensions.DataIngestion.MarkItDown
+- Introduced `MarkItDownReader` class for converting documents to markdown using the MarkItDown CLI
+- Introduced `MarkItDownMcpReader` class for converting documents using MarkItDown Model Context Protocol (MCP) server
diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
new file mode 100644
index 00000000000..08d37c245a3
--- /dev/null
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
@@ -0,0 +1,7 @@
+# Release History
+
+## 10.0.0-preview.1
+
+- Initial preview release of Microsoft.Extensions.DataIngestion.Markdig
+- Introduced `MarkdownReader` class for reading markdown documents and converting them to `IngestionDocument`
+- Introduced `MarkdownParser` class for parsing markdown content using Markdig library
diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
new file mode 100644
index 00000000000..6d8b505e559
--- /dev/null
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
@@ -0,0 +1,26 @@
+# Release History
+
+## 10.1.0-preview.1
+
+- Introduced `SectionChunker` class for treating each document section as a separate entity (https://github.com/dotnet/extensions/pull/7015)
+
+## 10.0.0-preview.1
+
+- Initial preview release of Microsoft.Extensions.DataIngestion
+- Introduced `IngestionPipeline` class for orchestrating document ingestion workflows
+- Introduced `IngestionPipelineOptions` class for configuring pipeline behavior
+- Introduced `IngestionResult` class for representing ingestion operation results
+- Introduced chunker implementations:
+  - `HeaderChunker` - Splits documents based on headers and their levels
+  - `ElementsChunker` - Splits documents into chunks of individual elements
+  - `SemanticSimilarityChunker` - Splits documents based on semantic similarity using embeddings
+- Introduced `IngestionChunkerOptions` class for configuring chunker behavior (token limits, overlap, etc.)
+- Introduced document processors/enrichers:
+  - `ClassificationEnricher` - Enriches document metadata with classifications
+  - `KeywordEnricher` - Enriches document metadata with keywords
+  - `SentimentEnricher` - Enriches document metadata with sentiment analysis
+  - `SummaryEnricher` - Enriches document metadata with summaries
+  - `ImageAlternativeTextEnricher` - Enriches images with alternative text descriptions
+- Introduced `EnricherOptions` class for configuring enricher behavior
+- Introduced `VectorStoreWriter` class for writing chunks to vector stores
+- Introduced `VectorStoreWriterOptions` class for configuring vector store writing behavior

From 321b1b4f524f6c1976033dcc2fadaab109429fbe Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 21 Nov 2025 13:24:42 +0000
Subject: [PATCH 3/4] Remove internal types from CHANGELOGs (MarkdownParser and ElementsChunker)

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
---
 .../Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md | 1 -
 src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md | 1 -
 2 files changed, 2 deletions(-)

diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
index 08d37c245a3..243cf45b1c1 100644
--- a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
@@ -4,4 +4,3 @@
 
 - Initial preview release of Microsoft.Extensions.DataIngestion.Markdig
 - Introduced `MarkdownReader` class for reading markdown documents and converting them to `IngestionDocument`
-- Introduced `MarkdownParser` class for parsing markdown content using Markdig library
diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
index 6d8b505e559..b79a4180eca 100644
--- a/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
@@ -12,7 +12,6 @@
 - Introduced `IngestionResult` class for representing ingestion operation results
 - Introduced chunker implementations:
   - `HeaderChunker` - Splits documents based on headers and their levels
-  - `ElementsChunker` - Splits documents into chunks of individual elements
   - `SemanticSimilarityChunker` - Splits documents based on semantic similarity using embeddings
 - Introduced `IngestionChunkerOptions` class for configuring chunker behavior (token limits, overlap, etc.)
 - Introduced document processors/enrichers:

From 4c94b4d866648a08ef9d8d66b0062f2574c379b5 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 24 Nov 2025 13:19:31 +0000
Subject: [PATCH 4/4] Simplify CHANGELOGs to just "Initial preview release" for 10.0.0-preview.1

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
---
 .../CHANGELOG.md | 17 +----------------
 .../CHANGELOG.md | 4 +---
 .../CHANGELOG.md | 3 +--
 .../CHANGELOG.md | 18 +-----------------
 4 files changed, 4 insertions(+), 38 deletions(-)

diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md
index 41b0a8c60b9..7cf926bb99a 100644
--- a/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md
@@ -2,19 +2,4 @@
 
 ## 10.0.0-preview.1
 
-- Initial preview release of Microsoft.Extensions.DataIngestion.Abstractions
-- Introduced `IngestionDocument` class for representing format-agnostic document containers
-- Introduced `IngestionDocumentElement` abstract base class for document elements
-- Introduced document element types:
-  - `IngestionDocumentSection` - Represents a section or page in a document
-  - `IngestionDocumentParagraph` - Represents a paragraph
-  - `IngestionDocumentHeader` - Represents a header with optional level
-  - `IngestionDocumentFooter` - Represents a footer
-  - `IngestionDocumentTable` - Represents a table with 2D cell array
-  - `IngestionDocumentImage` - Represents an image with optional binary content and alternative text
-- Introduced `IngestionChunk` class for representing content chunks
-- Introduced `IngestionChunker` abstract base class for splitting documents into chunks
-- Introduced `IngestionDocumentReader` abstract base class for reading source content and converting to documents
-- Introduced `IngestionDocumentProcessor` abstract base class for processing documents
-- Introduced `IngestionChunkProcessor` abstract base class for processing chunks
-- Introduced `IngestionChunkWriter` abstract base class for writing chunks to storage
+- Initial preview release
diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md
index af952a14a33..7cf926bb99a 100644
--- a/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md
@@ -2,6 +2,4 @@
 
 ## 10.0.0-preview.1
 
-- Initial preview release of Microsoft.Extensions.DataIngestion.MarkItDown
-- Introduced `MarkItDownReader` class for converting documents to markdown using the MarkItDown CLI
-- Introduced `MarkItDownMcpReader` class for converting documents using MarkItDown Model Context Protocol (MCP) server
+- Initial preview release
diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
index 243cf45b1c1..7cf926bb99a 100644
--- a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
@@ -2,5 +2,4 @@
 
 ## 10.0.0-preview.1
 
-- Initial preview release of Microsoft.Extensions.DataIngestion.Markdig
-- Introduced `MarkdownReader` class for reading markdown documents and converting them to `IngestionDocument`
+- Initial preview release
diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md b/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
index b79a4180eca..a88260e298f 100644
--- a/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
+++ b/src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
@@ -6,20 +6,4 @@
 
 ## 10.0.0-preview.1
 
-- Initial preview release of Microsoft.Extensions.DataIngestion
-- Introduced `IngestionPipeline` class for orchestrating document ingestion workflows
-- Introduced `IngestionPipelineOptions` class for configuring pipeline behavior
-- Introduced `IngestionResult` class for representing ingestion operation results
-- Introduced chunker implementations:
-  - `HeaderChunker` - Splits documents based on headers and their levels
-  - `SemanticSimilarityChunker` - Splits documents based on semantic similarity using embeddings
-- Introduced `IngestionChunkerOptions` class for configuring chunker behavior (token limits, overlap, etc.)
-- Introduced document processors/enrichers:
-  - `ClassificationEnricher` - Enriches document metadata with classifications
-  - `KeywordEnricher` - Enriches document metadata with keywords
-  - `SentimentEnricher` - Enriches document metadata with sentiment analysis
-  - `SummaryEnricher` - Enriches document metadata with summaries
-  - `ImageAlternativeTextEnricher` - Enriches images with alternative text descriptions
-- Introduced `EnricherOptions` class for configuring enricher behavior
-- Introduced `VectorStoreWriter` class for writing chunks to vector stores
-- Introduced `VectorStoreWriterOptions` class for configuring vector store writing behavior
+- Initial preview release