Clarify WriteAsync contract: move document from IngestionChunk to WriteAsync parameter #7433

Open
Copilot wants to merge 2 commits into data-ingestion-preview2 from copilot/medi-clarify-ingestionchunkwriter-contract

Contributor

Copilot AI commented Mar 28, 2026

WriteAsync implicitly assumed that all chunks passed in a single call belong to one document: pre-existing keys were fetched only once, for the first chunk's document. That made the contract ambiguous and error-prone. This PR makes the assumption explicit by moving the document association from each chunk to the method signature.

API changes

  • IngestionChunk<T>: Remove Document property and constructor parameter
  • IngestionChunkWriter<T>.WriteAsync: Add IngestionDocument document parameter

// Before
public abstract Task WriteAsync(IAsyncEnumerable<IngestionChunk<T>> chunks, CancellationToken cancellationToken = default);

// After
public abstract Task WriteAsync(IAsyncEnumerable<IngestionChunk<T>> chunks, IngestionDocument document, CancellationToken cancellationToken = default);
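
For illustration, a minimal sketch of a writer implemented against the new signature. `IngestionDocument`, `IngestionChunk<T>` and `IngestionChunkWriter<T>` are redeclared here as reduced, hypothetical stand-ins so the snippet compiles on its own; the `Identifier` and `Content` members and the `InMemoryWriter` class are assumptions for the example, not the library's actual surface.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Every chunk in the sequence is written against the one document passed alongside it.
var writer = new InMemoryWriter();
var document = new IngestionDocument("doc-1");
var chunks = new[] { new IngestionChunk<string>("first"), new IngestionChunk<string>("second") };

await writer.WriteAsync(ToAsync(chunks), document);
Console.WriteLine(string.Join(", ", writer.Written)); // doc-1: first, doc-1: second

// Helper that exposes an in-memory sequence as IAsyncEnumerable<T>.
static async IAsyncEnumerable<T> ToAsync<T>(IEnumerable<T> items)
{
    foreach (var item in items)
    {
        yield return item;
    }

    await Task.CompletedTask;
}

// Hypothetical stand-ins for the library types, reduced to what the sketch needs.
public sealed record IngestionDocument(string Identifier);

// Note: no Document property on the chunk any more.
public sealed record IngestionChunk<T>(T Content);

public abstract class IngestionChunkWriter<T>
{
    public abstract Task WriteAsync(
        IAsyncEnumerable<IngestionChunk<T>> chunks,
        IngestionDocument document,
        CancellationToken cancellationToken = default);
}

// Hypothetical writer that records what it was asked to write.
public sealed class InMemoryWriter : IngestionChunkWriter<string>
{
    public List<string> Written { get; } = new();

    public override async Task WriteAsync(
        IAsyncEnumerable<IngestionChunk<string>> chunks,
        IngestionDocument document,
        CancellationToken cancellationToken = default)
    {
        ArgumentNullException.ThrowIfNull(document);

        await foreach (var chunk in chunks.WithCancellation(cancellationToken))
        {
            // The document comes from the method argument, not from the chunk.
            Written.Add($"{document.Identifier}: {chunk.Content}");
        }
    }
}
```

With the document in the signature, a caller cannot mix chunks from different documents into one call without it being visible at the call site.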

Internal cleanup

  • VectorStoreWriter uses the explicit document param instead of chunk.Document
  • IngestionPipeline.IngestAsync passes its document to WriteAsync
  • Chunkers (ElementsChunker, HeaderChunker, SectionChunker, SemanticSimilarityChunker, DocumentTokenChunker) drop the now-unnecessary document parameter from internal helpers and chunk construction
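
The call-site shape these bullets describe could look roughly like the sketch below: the chunker builds chunks without a document, and the pipeline loop passes each document to `WriteAsync`. All types are hypothetical stand-ins, and `ReplacingWriter` is a simplified in-memory substitute, not `VectorStoreWriter`'s actual logic. It assumes pre-existing records are looked up so that stale ones can be replaced, which the PR text does not spell out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// In-memory "store" keyed by document identifier, seeded with a stale entry.
var store = new Dictionary<string, List<string>>
{
    ["doc-1"] = new() { "stale" },
};
var writer = new ReplacingWriter(store);

foreach (var document in new[] { new IngestionDocument("doc-1"), new IngestionDocument("doc-2") })
{
    // One WriteAsync call per document; the chunker itself never sees the document object.
    await writer.WriteAsync(ChunkAsync($"content of {document.Identifier}"), document);
}

Console.WriteLine(string.Join(", ", store["doc-1"])); // content, of, doc-1
Console.WriteLine(string.Join(", ", store["doc-2"])); // content, of, doc-2

// Hypothetical chunker: splits on spaces and builds chunks without a document argument.
static async IAsyncEnumerable<IngestionChunk<string>> ChunkAsync(string text)
{
    foreach (var word in text.Split(' '))
    {
        yield return new IngestionChunk<string>(word);
    }

    await Task.CompletedTask;
}

// Hypothetical stand-ins for the library types.
public sealed record IngestionDocument(string Identifier);

public sealed record IngestionChunk<T>(T Content);

public abstract class IngestionChunkWriter<T>
{
    public abstract Task WriteAsync(
        IAsyncEnumerable<IngestionChunk<T>> chunks,
        IngestionDocument document,
        CancellationToken cancellationToken = default);
}

// Simplified writer: replaces everything stored for the given document.
public sealed class ReplacingWriter : IngestionChunkWriter<string>
{
    private readonly Dictionary<string, List<string>> _store;

    public ReplacingWriter(Dictionary<string, List<string>> store) => _store = store;

    public override async Task WriteAsync(
        IAsyncEnumerable<IngestionChunk<string>> chunks,
        IngestionDocument document,
        CancellationToken cancellationToken = default)
    {
        var fresh = new List<string>();
        await foreach (var chunk in chunks.WithCancellation(cancellationToken))
        {
            fresh.Add(chunk.Content);
        }

        // Existing records are resolved once per call, for the document named in the signature.
        _store[document.Identifier] = fresh;
    }
}
```

Because the lookup is keyed by the `document` argument, there is no "first chunk's document" to guess at.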

Fixes #6970

…to WriteAsync

- Remove Document property and constructor parameter from IngestionChunk<T>
- Add IngestionDocument document parameter to IngestionChunkWriter<T>.WriteAsync
- Update VectorStoreWriter to use the new document parameter
- Update IngestionPipeline to pass document to WriteAsync
- Update all chunkers (DocumentTokenChunker, ElementsChunker, HeaderChunker,
  SectionChunker, SemanticSimilarityChunker) to not pass document to chunks
- Update all tests to match the new API

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/d041591e-b70e-45f7-9302-c04e4787e92e

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Clarify the IngestionChunkWriter.WriteAsync contract around documents" to "Clarify WriteAsync contract: move document from IngestionChunk to WriteAsync parameter" Mar 28, 2026
Copilot AI requested a review from adamsitnik March 28, 2026 10:29
Member

adamsitnik left a comment

It's not as bad as I thought it would be.

@roji could you PTAL and let me know if it addresses your feedback from #6970?

adamsitnik requested a review from roji March 28, 2026 10:59
adamsitnik marked this pull request as ready for review March 28, 2026 11:03
adamsitnik added this to the Data Ingestion Preview 2 milestone Mar 28, 2026

Development

Successfully merging this pull request may close these issues.

[MEDI] Clarify the IngestionChunkWriter.WriteAsync contract around documents