Microsoft.Extensions.DataIngestion

.NET developers need to efficiently process, chunk, and retrieve information from diverse document formats while preserving semantic meaning and structural context. The Microsoft.Extensions.DataIngestion libraries provide a unified approach for representing document ingestion components.

The packages

The Microsoft.Extensions.DataIngestion.Abstractions package provides the core exchange types, including IngestionDocument, IngestionChunker<T>, IngestionChunkProcessor<T>, and IngestionChunkWriter<T>. Any .NET library that provides document processing capabilities can implement these abstractions to enable seamless integration with consuming code.

The Microsoft.Extensions.DataIngestion package references Microsoft.Extensions.DataIngestion.Abstractions, so adding it brings in the abstractions transitively. It lets you integrate components such as enrichment processors, vector storage writers, and telemetry into your applications using familiar dependency injection and pipeline patterns. For example, it provides the SentimentEnricher, KeywordEnricher, and SummaryEnricher processors, which can be chained together in ingestion pipelines.

Which package to reference

Libraries that provide implementations of the abstractions typically reference only Microsoft.Extensions.DataIngestion.Abstractions.

To also have access to higher-level utilities for working with document ingestion components, reference the Microsoft.Extensions.DataIngestion package instead (which itself references Microsoft.Extensions.DataIngestion.Abstractions). Most consuming applications and services should reference the Microsoft.Extensions.DataIngestion package along with one or more libraries that provide concrete implementations of the abstractions, such as Microsoft.Extensions.DataIngestion.MarkItDown or Microsoft.Extensions.DataIngestion.Markdig.
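
As a rough sketch of the application case described above, a project file might list the higher-level package next to one reader package. The choice of Markdig is only illustrative, `[CURRENTVERSION]` is the same placeholder used elsewhere in this README rather than a real version number, and the assumption that the reader package shares that version is ours:

```xml
<ItemGroup>
  <!-- Higher-level utilities; references Microsoft.Extensions.DataIngestion.Abstractions itself -->
  <PackageReference Include="Microsoft.Extensions.DataIngestion" Version="[CURRENTVERSION]" />
  <!-- One concrete implementation of the abstractions (illustrative choice; MarkItDown is the other package named above) -->
  <PackageReference Include="Microsoft.Extensions.DataIngestion.Markdig" Version="[CURRENTVERSION]" />
</ItemGroup>
```

A library that only implements the abstractions would instead list a single PackageReference to Microsoft.Extensions.DataIngestion.Abstractions.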

Install the package

From the command line:

dotnet add package Microsoft.Extensions.DataIngestion --prerelease

Or directly in the C# project file:

<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.DataIngestion" Version="[CURRENTVERSION]" />
</ItemGroup>

Feedback & Contributing

We welcome feedback and contributions in our GitHub repo.