feat(indexer): introduce incremental indexing engine pipeline by Chethan-Regala · Pull Request #21 · AOSSIE-Org/SmartNotes

Chethan-Regala · 2026-03-08T15:24:26Z

Introduce Incremental Indexing Engine Pipeline

This PR introduces the foundational architecture for the Smart Notes incremental indexing engine, which will power future semantic search and AI-assisted features.

Motivation

Smart Notes aims to support offline-first AI-powered knowledge retrieval.
To enable scalable semantic search across large vaults, we need an indexing pipeline that can:

process notes incrementally
generate semantic embeddings
maintain deterministic chunk identities
support pluggable storage and embedding models

This PR introduces the core indexing pipeline architecture that enables these capabilities.

Architecture Overview

The indexing engine is designed around a modular pipeline: read → chunk → embed → store.

Components Introduced

IndexingEngine

Coordinates the full indexing pipeline.

Responsibilities:

scheduling update/delete jobs
reading notes from the vault
chunking markdown content
generating embeddings
delegating persistence to the storage layer

NoteChunker

Splits markdown notes into deterministic paragraph-based chunks.

Features:

stable chunk hashing
deterministic chunk IDs
predictable ordering
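For illustration, a minimal sketch of paragraph-based chunking with deterministic IDs. The `NoteChunk` shape and the `splitIntoChunks` name are assumptions made for this example and may not match the code in this PR; the SHA-1 hashing follows the walkthrough further down, and the delimiter between hash components follows the review suggestion.

```typescript
import { createHash } from "node:crypto";

// Assumed chunk shape for this sketch; the PR's NoteChunk type may differ.
interface NoteChunk {
  id: string;
  notePath: string;
  index: number;
  text: string;
}

// Split markdown on blank lines and derive a stable ID for each paragraph.
function splitIntoChunks(notePath: string, markdown: string): NoteChunk[] {
  return markdown
    .split(/\n\s*\n/) // paragraphs are separated by one or more blank lines
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((text, index) => ({
      // A NUL delimiter keeps the boundaries between path, index, and text unambiguous.
      id: createHash("sha1")
        .update([notePath, String(index), text].join("\0"))
        .digest("hex"),
      notePath,
      index,
      text,
    }));
}
```

Because the ID depends only on the note path, the paragraph position, and the paragraph text, re-running the chunker on unchanged content yields the same IDs in the same order.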

IndexQueue

A lightweight sequential job queue that guarantees ordered indexing operations and prevents concurrency issues during updates.
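A rough sketch of what such a queue could look like. The class name `SequentialQueue`, the `IndexJob` shape, and the shared in-flight promise are assumptions for illustration, not the `IndexQueue` implementation in this PR.

```typescript
// Assumed job shape for this sketch.
type IndexJob = { type: "update" | "delete"; notePath: string };
type JobHandler = (job: IndexJob) => Promise<void>;

class SequentialQueue {
  private jobs: IndexJob[] = [];
  private inFlight: Promise<void> | null = null;

  enqueue(job: IndexJob): void {
    this.jobs.push(job);
  }

  // Drains the queue one job at a time, in FIFO order.
  // Callers that arrive while a drain is active receive the same promise.
  process(handler: JobHandler): Promise<void> {
    if (this.inFlight) return this.inFlight;
    const run = async (): Promise<void> => {
      // Yield once so `inFlight` is assigned before the loop can finish.
      await Promise.resolve();
      try {
        let job = this.jobs.shift();
        while (job !== undefined) {
          await handler(job); // the next job starts only after this one settles
          job = this.jobs.shift();
        }
      } finally {
        this.inFlight = null;
      }
    };
    this.inFlight = run();
    return this.inFlight;
  }
}
```

In this sketch a handler error rejects the shared promise and stops the drain; whether to stop or continue past a failed job is a design choice discussed in the review comments below.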

Adapter Interfaces

To keep the indexing system extensible, the following adapter contracts were introduced:

VaultAdapter – abstraction over vault storage
EmbeddingAdapter – abstraction over embedding models
IndexStore – abstraction over index persistence
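As a sketch of how these contracts might be shaped: the method names (`readNote`, `listNotes`, `embed`, `saveChunks`, `deleteNote`) come from the review walkthrough, while the parameter and return types, the `NoteChunk` shape, and the `InMemoryVault` class are assumptions for illustration.

```typescript
// Assumed chunk shape for this sketch.
interface NoteChunk {
  id: string;
  text: string;
}

// Abstraction over vault storage (signatures are assumed).
interface VaultAdapter {
  listNotes(): Promise<string[]>;
  readNote(notePath: string): Promise<string>;
}

// Abstraction over embedding models: one vector per input text.
interface EmbeddingAdapter {
  embed(texts: string[]): Promise<number[][]>;
}

// Abstraction over index persistence.
interface IndexStore {
  saveChunks(notePath: string, chunks: NoteChunk[], embeddings: number[][]): Promise<void>;
  deleteNote(notePath: string): Promise<void>;
}

// Hypothetical in-memory vault, similar in spirit to a demo adapter.
class InMemoryVault implements VaultAdapter {
  private notes: Map<string, string>;

  constructor(notes: Map<string, string>) {
    this.notes = notes;
  }

  async listNotes(): Promise<string[]> {
    return Array.from(this.notes.keys());
  }

  async readNote(notePath: string): Promise<string> {
    const content = this.notes.get(notePath);
    if (content === undefined) {
      throw new Error("Note not found: " + notePath);
    }
    return content;
  }
}
```

An engine written against these interfaces would not need to know whether notes come from the filesystem or from memory, which is the decoupling the adapter layer is meant to provide.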

This allows the engine to integrate with different implementations such as:

filesystem vaults
SQLite metadata stores
vector databases
local embedding models (e.g. MiniLM / Ollama)

Demo Harness

A minimal demo runner is included to illustrate the pipeline:

src/demo/DemoRunner.ts

This demonstrates the indexing flow using in-memory adapters.

Scope of This PR

This PR intentionally focuses on architecture and pipeline design, not full storage or embedding implementations.

Future PRs will extend this work with:

SQLite-backed index storage
incremental filesystem watchers
embedding model integration
semantic search retrieval

Why This Matters

This indexing pipeline forms the foundation for the AI layer of Smart Notes, enabling:

scalable semantic search
efficient incremental updates
offline-first AI knowledge retrieval

I would appreciate feedback on the architecture and adapter boundaries before expanding this into the full indexing subsystem.

Summary by CodeRabbit

New Features
- Added a new indexer app that chunks notes into semantic paragraphs, generates embeddings, and indexes content.
- Added an asynchronous sequential job queue to schedule and process index update and delete jobs reliably.
- Added demo runner with in-memory mock components to exercise the indexing flow.
Chores
- Added package manifest, TypeScript project config, and build/test scripts for the indexer.

coderabbitai · 2026-03-08T15:24:40Z

Walkthrough

Adds a new indexer app implementing an incremental markdown indexing pipeline: types, adapter interfaces (vault, embedding, store), core components (queue, chunker, engine), a demo runner, package/tsconfig, and gitignore updates for node artifacts.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Repository ignores & app manifest**: `/.gitignore`, `apps/indexer/.gitignore`, `apps/indexer/package.json`, `apps/indexer/tsconfig.json` | Added node/npm ignore rules and app-specific ignores (`node_modules`, `dist`); created package.json with TypeScript dev deps and scripts; added tsconfig for the indexer app. |
| **Public types**: `apps/indexer/src/types.ts` | Introduced `NoteChunk`, `IndexJob`, and `IndexResult` type definitions used across the indexer. |
| **Adapter interfaces**: `apps/indexer/src/adapters/VaultAdapter.ts`, `apps/indexer/src/adapters/EmbeddingAdapter.ts`, `apps/indexer/src/adapters/IndexStore.ts` | Added interfaces to abstract note I/O, embedding generation, and storage operations (readNote, listNotes, embed, saveChunks, deleteNote). |
| **Core logic**: `apps/indexer/src/IndexQueue.ts`, `apps/indexer/src/NoteChunker.ts`, `apps/indexer/src/IndexingEngine.ts` | Added IndexQueue for sequential job processing, NoteChunker for paragraph-based chunking with SHA-1 ids, and IndexingEngine to orchestrate read → chunk → embed → store flows and schedule update/delete jobs. |
| **Demo & exports**: `apps/indexer/src/demo/DemoRunner.ts`, `apps/indexer/src/index.ts` | Added an in-memory demo implementation (vault, embedder, store) and runner; created a barrel export re-exporting types, engine, queue, chunker, and adapter interfaces. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant IndexingEngine
    participant IndexQueue
    participant VaultAdapter
    participant NoteChunker
    participant EmbeddingAdapter
    participant IndexStore

    Client->>IndexingEngine: scheduleUpdate("demo.md")
    IndexingEngine->>IndexQueue: enqueue({type: "update", notePath})
    IndexingEngine->>IndexQueue: process(handler)
    IndexQueue->>IndexingEngine: handler(job)
    IndexingEngine->>VaultAdapter: readNote(notePath)
    VaultAdapter-->>IndexingEngine: markdown
    IndexingEngine->>NoteChunker: split(notePath, markdown)
    NoteChunker-->>IndexingEngine: NoteChunk[]
    IndexingEngine->>EmbeddingAdapter: embed(chunk texts)
    EmbeddingAdapter-->>IndexingEngine: embeddings
    IndexingEngine->>IndexStore: saveChunks(notePath, chunks, embeddings)
    IndexStore-->>IndexingEngine: void
    IndexingEngine-->>Client: indexing complete
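The update flow in the diagram above can be sketched as a single function. This is a simplified illustration: the flat `PipelineDeps` object and the `indexNote` name are assumptions, whereas the PR wires the steps through `IndexingEngine` and its adapters.

```typescript
// Assumed chunk shape for this sketch.
interface NoteChunk {
  id: string;
  text: string;
}

// Hypothetical bundle of the four collaborators shown in the diagram.
interface PipelineDeps {
  readNote(notePath: string): Promise<string>;
  split(notePath: string, markdown: string): NoteChunk[];
  embed(texts: string[]): Promise<number[][]>;
  saveChunks(notePath: string, chunks: NoteChunk[], embeddings: number[][]): Promise<void>;
}

// read -> chunk -> embed -> store for one note; resolves with the chunk count.
async function indexNote(deps: PipelineDeps, notePath: string): Promise<number> {
  const markdown = await deps.readNote(notePath);
  const chunks = deps.split(notePath, markdown);
  const embeddings = await deps.embed(chunks.map((chunk) => chunk.text));
  await deps.saveChunks(notePath, chunks, embeddings);
  return chunks.length;
}
```

Each step only consumes the output of the previous one, so any collaborator can be swapped for a mock or a different backend without touching the others.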

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested labels

Typescript Lang

Poem

🐇 I nibble lines and split them neat,
Hash each chunk with tiny feet.
Queues hum softly, embeddings sing,
A demo world where indexes spring. 🥕

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change (introducing the incremental indexing engine pipeline), which aligns with the PR's core objective of establishing the foundational indexing architecture. |

Chethan-Regala · 2026-03-08T15:25:54Z

Thanks for reviewing this PR.

This change introduces the foundational architecture for the incremental indexing pipeline that will later support semantic search and AI-powered retrieval in Smart Notes.

The goal of this PR is to establish a clean, modular indexing pipeline before integrating heavier components like:

filesystem watchers
SQLite metadata storage
embedding model integration
hybrid retrieval

The current implementation focuses on defining clear adapter boundaries (VaultAdapter, EmbeddingAdapter, IndexStore) so the indexing engine remains decoupled from specific implementations.

I would especially appreciate feedback on:

the adapter boundaries
the indexing pipeline structure
whether this aligns with the intended future architecture of Smart Notes

Happy to refine the design based on maintainer suggestions.

coderabbitai

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/indexer/package.json`:
- Line 5: The package.json "main" field currently points to "index.js" but
compiled TypeScript outputs to the dist/ directory; update the "main" value in
package.json to point to the compiled entry (e.g., "dist/index.js" or the actual
compiled filename) so imports resolve correctly when consuming this package, and
verify the tsconfig.json/outDir and build outputs match the new main value.
- Around line 6-7: Add a "build" script to the package.json "scripts" section
(next to the existing "test" script) that runs the TypeScript compiler using the
repository's tsconfig (e.g., invoke tsc with -p tsconfig.json) so the project
can be compiled before publishing or running; update any CI or npm lifecycle
hooks to call "npm run build" where needed. Reference symbols: "scripts" object
and existing "test" entry in package.json and the project's tsconfig.json to
ensure the correct compiler configuration is used.

In `@apps/indexer/src/adapters/IndexStore.ts`:
- Around line 8-15: The saveChunks contract on IndexStore currently reads like a
generic save; change it to require an atomic replace of all stored
chunks/embeddings for the given notePath (i.e., replace existing state for
notePath rather than upserting/appending). Update the IndexStore.saveChunks
JSDoc/signature to state “replace all chunks for notePath atomically,” update
implementations of IndexStore (and any concrete classes) to delete/replace the
notePath’s existing chunks in one atomic operation, and ensure IndexingEngine
calls (where it reindexes notes) rely on this replace semantics so removed
chunks are no longer searchable.

In `@apps/indexer/src/IndexingEngine.ts`:
- Around line 33-42: scheduleUpdate (and the similar scheduleDelete) currently
fire-and-forget; change their signatures to return a Promise that resolves when
the work completes by either returning this.queue.drain() for a minimal fix or,
better, having enqueue return a per-job Promise that resolves/rejects from
processJob and then return that Promise from scheduleUpdate/scheduleDelete;
update calls to this.queue.enqueue(job) and
this.queue.process(this.processJob.bind(this)) accordingly (or leave process
setup separate) so scheduleUpdate/scheduleDelete return the queue drain or the
enqueue-provided per-job Promise instead of void.
- Around line 81-89: The embeddings array returned by this.embedder.embed may
not match chunks length, so before constructing the IndexResult and calling
this.store.saveChunks you must validate that embeddings.length === chunks.length
(or throw/return an error); update the IndexingEngine code around
embedder.embed, IndexResult creation, and the call to store.saveChunks to check
the count and fail fast with a clear error if it mismatches, ensuring you do not
persist misaligned chunk/vector pairs.

In `@apps/indexer/src/IndexQueue.ts`:
- Around line 15-18: The process() method currently returns immediately when
this.running is true, giving callers a false completion signal; change it to
keep and return a shared in-flight promise (e.g., this.inFlightPromise) while a
drain is active instead of returning undefined. Specifically, in
IndexQueue.process(handler) create this.inFlightPromise when you set
this.running = true, resolve/reject that promise when the queue drain finishes
(where you currently clear this.running), and when process() is called while
this.running is true simply return the existing this.inFlightPromise so callers
await actual completion; apply the same pattern to the other occurrence
referenced by the review (the second early-return at line 32).
- Around line 25-29: The catch in IndexQueue.ts around await handler(job)
currently just console.error's and swallows failures; change it to propagate the
error instead of resolving successfully so upstream (IndexingEngine) can requeue
or dead-letter the job—specifically, replace the swallow in the try/catch around
handler(job) with logic that either rethrows the caught err (throw err or return
Promise.reject(err)) or invokes the queue/job-level
negative-ack/retry/dead-letter API if one exists; update any tests/consumers
that assume success to handle the propagated failure.

In `@apps/indexer/src/NoteChunker.ts`:
- Line 1: The import in NoteChunker.ts imports a type-only symbol NoteChunk
using a value import; change it to a type-only import (use import type {
NoteChunk } from "./types") so transpilers/linters know it's a type-only
dependency and avoid including it in runtime output; update any references to
NoteChunk in the file as needed but no runtime code changes are required.
- Around line 16-19: The SHA1 input currently concatenates notePath, index, and
text directly in the id calculation (crypto.createHash(...).update(notePath +
index + text)), which can cause ambiguous collisions; change the update call to
join these pieces with a clear delimiter (e.g., '|' or '\0') between notePath,
index, and text when computing id in NoteChunker so each component boundary is
unambiguous.

In `@apps/indexer/tsconfig.json`:
- Around line 1-11: The tsconfig.json sets rootDir: "src" but lacks an include
array, which can cause "file is outside rootDir" errors for files like
vitest.config.ts; update the file to add an "include" array that explicitly
includes "src/**/*" and any top-level TS config/test files (e.g.,
"vitest.config.ts", "*.d.ts") or, alternatively, remove/change rootDir to a
build-only config — modify the tsconfig.json keys rootDir and add include to
cover source and config/test files so tsc no longer errors.
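Two of the findings above (validating the embedding count and replacing all chunks for a note on save) could be addressed along the following lines. This is a hedged sketch: `zipChunks`, `StoredChunk`, and `InMemoryIndexStore` are hypothetical names, and a real store would need a transaction to obtain the same all-or-nothing behaviour.

```typescript
// Assumed shapes for this sketch.
interface NoteChunk {
  id: string;
  text: string;
}
interface StoredChunk {
  chunk: NoteChunk;
  embedding: number[];
}

// Fail fast if the embedder returned a different number of vectors than chunks.
function zipChunks(chunks: NoteChunk[], embeddings: number[][]): StoredChunk[] {
  if (embeddings.length !== chunks.length) {
    throw new Error(
      "Embedding count mismatch: expected " + chunks.length + ", got " + embeddings.length
    );
  }
  return chunks.map((chunk, i) => ({ chunk, embedding: embeddings[i] }));
}

// Hypothetical in-memory store with replace-on-save semantics.
class InMemoryIndexStore {
  private byNote = new Map<string, StoredChunk[]>();

  // Replaces all chunks for notePath; validation runs before any state changes.
  async saveChunks(notePath: string, chunks: NoteChunk[], embeddings: number[][]): Promise<void> {
    const rows = zipChunks(chunks, embeddings);
    this.byNote.set(notePath, rows); // a single assignment swaps the old set for the new one
  }

  async deleteNote(notePath: string): Promise<void> {
    this.byNote.delete(notePath);
  }

  chunkIds(notePath: string): string[] {
    return (this.byNote.get(notePath) ?? []).map((row) => row.chunk.id);
  }
}
```

With replace semantics, a paragraph deleted from a note disappears from the index on the next save instead of lingering as a stale, still-searchable chunk.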

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cee0a819-7e16-4592-8c37-7efbacc8a821

📥 Commits

Reviewing files that changed from the base of the PR and between a3ccb2b and fc75321.

⛔ Files ignored due to path filters (1)

apps/indexer/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (13)

.gitignore
apps/indexer/.gitignore
apps/indexer/package.json
apps/indexer/src/IndexQueue.ts
apps/indexer/src/IndexingEngine.ts
apps/indexer/src/NoteChunker.ts
apps/indexer/src/adapters/EmbeddingAdapter.ts
apps/indexer/src/adapters/IndexStore.ts
apps/indexer/src/adapters/VaultAdapter.ts
apps/indexer/src/demo/DemoRunner.ts
apps/indexer/src/index.ts
apps/indexer/src/types.ts
apps/indexer/tsconfig.json

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/indexer/src/adapters/IndexStore.ts`:
- Line 1: The import for NoteChunk is type-only and should use TypeScript's
type-only import syntax; update the import in IndexStore.ts to use "import type
{ NoteChunk } from '../types'" so that the NoteChunk reference in the IndexStore
interface (and any related type annotations) is imported as a type-only import.

In `@apps/indexer/src/IndexingEngine.ts`:
- Around line 1-6: Change the type-only imports to use TypeScript's `import
type` for the interfaces and types: replace the imports for VaultAdapter,
EmbeddingAdapter, IndexStore, IndexResult, and IndexJob in IndexingEngine.ts
with `import type` declarations so only runtime values remain as normal imports;
ensure that NoteChunker and IndexQueue remain regular imports if they are used
at runtime and that no runtime-only imports are accidentally converted.

In `@apps/indexer/src/IndexQueue.ts`:
- Around line 15-30: The process() method on IndexQueue currently lets
exceptions from handler(job) abort the drain; either document this in the method
JSDoc or add resilience: wrap the await handler(job) call in a try/catch, call
an optional failure handler (e.g., this.onJobFailed?.(job, err) or a provided
onFailure callback) with the IndexJob and error, and continue processing
remaining jobs; ensure this.processing is still cleared in finally and keep the
shared-promise behavior intact.
- Line 1: Change the value import to a type-only import: replace the runtime
import of IndexJob with an `import type { IndexJob } from "./types"` import
since `IndexJob` is only used as a type annotation in this module (check usages
in this file to confirm no runtime references), ensuring TypeScript emits no
runtime import for IndexJob.
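The resilience option described for `process()` above (catch per job, report, continue) might look roughly like this. The `drainResilient` function and the `OnJobFailed` callback type are hypothetical names used only for this sketch.

```typescript
// Assumed job shape for this sketch.
type IndexJob = { type: "update" | "delete"; notePath: string };
type JobHandler = (job: IndexJob) => Promise<void>;
type OnJobFailed = (job: IndexJob, error: unknown) => void;

// Runs jobs in order; a failing job is reported and the drain continues.
// Resolves with the number of jobs that failed.
async function drainResilient(
  jobs: IndexJob[],
  handler: JobHandler,
  onJobFailed?: OnJobFailed
): Promise<number> {
  let failures = 0;
  for (const job of jobs) {
    try {
      await handler(job);
    } catch (error) {
      failures += 1;
      onJobFailed?.(job, error); // optional hook for logging, retry, or dead-lettering
    }
  }
  return failures;
}
```

The trade-off is that callers no longer learn about failures through a rejected promise, so the failure hook (or the returned count) becomes the only signal that a note was not indexed.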

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5deacfc7-7f51-4fa2-a77b-bec162742043

📥 Commits

Reviewing files that changed from the base of the PR and between fc75321 and 487fbfa.

📒 Files selected for processing (6)

apps/indexer/package.json
apps/indexer/src/IndexQueue.ts
apps/indexer/src/IndexingEngine.ts
apps/indexer/src/NoteChunker.ts
apps/indexer/src/adapters/IndexStore.ts
apps/indexer/tsconfig.json

Chethan-Regala · 2026-03-09T05:28:12Z

I've pushed an update addressing the automated review suggestions.
Please let me know if anything should be refined further.

Chethan Regala added 6 commits March 8, 2026 18:17

feat(indexer): initialize incremental indexing engine module structure

7dbdb48

feat(indexer): add core types and adapter interfaces

147fc27

feat(indexer): implement note chunking and indexing queue

e4e80f4

feat(indexer): implement core indexing engine pipeline

eb7514d

feat(indexer): add storage adapter and demo indexing runner

4ad0274

chore(indexer): ignore build artifacts and untrack dist output

fc75321

github-actions bot added the no-issue-linked, configuration, dependencies, javascript, size/L, and first-time-contributor labels Mar 8, 2026

github-actions bot added size/L and removed size/L labels Mar 8, 2026

coderabbitai bot requested changes Mar 8, 2026

refactor(indexer): address automated review feedback

487fbfa

github-actions bot added size/L and removed size/L labels Mar 9, 2026

coderabbitai bot requested changes Mar 9, 2026

coderabbitai bot approved these changes Mar 9, 2026

Chethan-Regala wants to merge 7 commits into AOSSIE-Org:main from Chethan-Regala:feat/incremental-indexing-engine
