generated from AOSSIE-Org/Template-Repo
feat(indexer): introduce incremental indexing engine pipeline #21
Status: Open. Chethan-Regala wants to merge 7 commits into AOSSIE-Org:main from Chethan-Regala:feat/incremental-indexing-engine
Commits:

- 7dbdb48 feat(indexer): initialize incremental indexing engine module structure
- 147fc27 feat(indexer): add core types and adapter interfaces
- e4e80f4 feat(indexer): implement note chunking and indexing queue
- eb7514d feat(indexer): implement core indexing engine pipeline
- 4ad0274 feat(indexer): add storage adapter and demo indexing runner
- fc75321 chore(indexer): ignore build artifacts and untrack dist output
- 487fbfa refactor(indexer): address automated review feedback
@@ -1,3 +1,6 @@

```
# Node.js
node_modules/

## Core latex/pdflatex auxiliary files:
*.aux
*.lof
```
@@ -0,0 +1,2 @@

```
dist
node_modules
```
Some generated files are not rendered by default.
@@ -0,0 +1,18 @@

```json
{
  "name": "indexer",
  "version": "1.0.0",
  "description": "",
  "main": "dist/index.js",
  "scripts": {
    "build": "tsc",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "type": "commonjs",
  "devDependencies": {
    "@types/node": "^25.3.5",
    "typescript": "^5.9.3"
  }
}
```
@@ -0,0 +1,31 @@

```ts
import { IndexJob } from "./types"

/**
 * Simple sequential job queue for indexing tasks.
 * Ensures indexing operations run in order.
 */
export class IndexQueue {
  private queue: IndexJob[] = []
  private processing: Promise<void> | null = null

  enqueue(job: IndexJob) {
    this.queue.push(job)
  }

  process(handler: (job: IndexJob) => Promise<void>): Promise<void> {
    if (this.processing) return this.processing

    this.processing = (async () => {
      while (this.queue.length > 0) {
        const job = this.queue.shift()
        if (!job) continue

        await handler(job)
      }
    })().finally(() => {
      this.processing = null
    })

    return this.processing
  }
}
```
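`IndexQueue` gives every caller the same drain promise, and the code as written has three consequences.

- If a handler rejects, the loop stops. Later jobs stay in `queue` until the next `process()` call, and every caller awaiting the shared promise receives that one rejection.
- A caller's promise resolves only when the whole queue drains, not when its own job finishes.
- A job can be enqueued after the loop's final length check but before the `finally` callback clears `processing`. That job receives the promise of the drain that is just finishing, so it waits for the next `process()` call.

A common alternative is a promise-chain queue that gives each job its own promise. The following is a minimal sketch of that pattern; `SerialQueue` and `run` are hypothetical names, not part of this PR.

```typescript
// Hypothetical alternative to IndexQueue: a promise-chain queue.
// Each task starts only after every earlier task has settled, and each
// caller's promise settles with that caller's own task outcome.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve()

  run<R>(task: () => Promise<R>): Promise<R> {
    // Chain the new task behind everything queued so far.
    const result = this.tail.then(task)
    // Ignore the outcome on the internal chain only, so one failing
    // task does not block the tasks queued behind it.
    this.tail = result.then(
      () => undefined,
      () => undefined
    )
    return result
  }
}
```

Inside the engine, this would amount to something like `queue.run(() => this.processJob(job))` in `scheduleUpdate` and `scheduleDelete`. Ordering is still preserved, because each task waits for the previous task's promise to settle.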
@@ -0,0 +1,106 @@

```ts
import { VaultAdapter } from "./adapters/VaultAdapter"
import { EmbeddingAdapter } from "./adapters/EmbeddingAdapter"
import { IndexStore } from "./adapters/IndexStore"
import { NoteChunker } from "./NoteChunker"
import { IndexQueue } from "./IndexQueue"
import { IndexResult, IndexJob } from "./types"

/**
 * Coordinates the incremental indexing pipeline.
 */
export class IndexingEngine {
  private vault: VaultAdapter
  private embedder: EmbeddingAdapter
  private store: IndexStore
  private chunker: NoteChunker
  private queue: IndexQueue

  constructor(
    vault: VaultAdapter,
    embedder: EmbeddingAdapter,
    store: IndexStore
  ) {
    this.vault = vault
    this.embedder = embedder
    this.store = store
    this.chunker = new NoteChunker()
    this.queue = new IndexQueue()
  }

  /**
   * Schedule indexing for a note.
   */
  scheduleUpdate(notePath: string): Promise<void> {
    const job: IndexJob = {
      type: "update",
      notePath,
    }

    this.queue.enqueue(job)

    return this.queue.process(this.processJob.bind(this))
  }

  /**
   * Schedule deletion of a note from the index.
   */
  scheduleDelete(notePath: string): Promise<void> {
    const job: IndexJob = {
      type: "delete",
      notePath,
    }

    this.queue.enqueue(job)

    return this.queue.process(this.processJob.bind(this))
  }

  /**
   * Process jobs coming from the queue.
   */
  private async processJob(job: IndexJob) {
    if (job.type === "update") {
      await this.indexNote(job.notePath)
    }

    if (job.type === "delete") {
      await this.removeNote(job.notePath)
    }
  }

  /**
   * Full indexing pipeline for a note.
   */
  private async indexNote(notePath: string): Promise<IndexResult> {
    const markdown = await this.vault.readNote(notePath)

    const chunks = this.chunker.split(notePath, markdown)

    const chunkTexts = chunks.map((c) => c.text)

    const embeddings = await this.embedder.embed(chunkTexts)

    if (embeddings.length !== chunks.length) {
      throw new Error(
        `Embedding adapter returned ${embeddings.length} embeddings for ${chunks.length} chunks`
      )
    }

    const result: IndexResult = {
      notePath,
      chunks,
      embeddings,
    }

    await this.store.saveChunks(notePath, chunks, embeddings)

    return result
  }

  /**
   * Remove a note from the index.
   */
  private async removeNote(notePath: string) {
    await this.store.deleteNote(notePath)
  }
}
```
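As written, `indexNote` sends every chunk of an edited note to the embedder, including paragraphs that did not change. `NoteChunker` derives each chunk id from the note path, position and text. An unchanged paragraph at an unchanged position therefore keeps its id, and its previously computed vector could be reused.

The sketch below assumes a map of previously stored vectors keyed by chunk id. The PR's `IndexStore` has no read method, so obtaining that map would need a hypothetical lookup, such as a store query by `notePath`. `embedChangedChunks` is an illustrative name.

```typescript
type NoteChunk = { id: string; notePath: string; text: string; position: number }
type EmbedFn = (texts: string[]) => Promise<number[][]>

// Hypothetical helper: reuse stored vectors for chunk ids seen before and
// send only new or changed chunks to the embedding model.
async function embedChangedChunks(
  chunks: NoteChunk[],
  previous: Map<string, number[]>,
  embed: EmbedFn
): Promise<number[][]> {
  const missing = chunks.filter((c) => !previous.has(c.id))
  const fresh = missing.length > 0 ? await embed(missing.map((c) => c.text)) : []

  // Same sanity check IndexingEngine performs on the full batch.
  if (fresh.length !== missing.length) {
    throw new Error(
      `Embedding adapter returned ${fresh.length} embeddings for ${missing.length} chunks`
    )
  }

  const byId = new Map(previous)
  missing.forEach((c, i) => byId.set(c.id, fresh[i]))

  // Result stays aligned with `chunks`, as IndexStore.saveChunks expects.
  return chunks.map((c) => byId.get(c.id)!)
}
```

The output keeps the same one-vector-per-chunk alignment that `saveChunks` already relies on, so the rest of the pipeline would not change.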
@@ -0,0 +1,31 @@

```ts
import type { NoteChunk } from "./types"
import crypto from "crypto"

/**
 * Splits markdown notes into chunks.
 * Current implementation is simple paragraph-based splitting.
 */
export class NoteChunker {
  split(notePath: string, markdown: string): NoteChunk[] {
    const paragraphs = markdown
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0)

    const chunks: NoteChunk[] = paragraphs.map((text, index) => {
      const id = crypto
        .createHash("sha1")
        .update(`${notePath}\0${index}\0${text}`)
        .digest("hex")

      return {
        id,
        notePath,
        text,
        position: index,
      }
    })

    return chunks
  }
}
```
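The sketch below reproduces the splitting rule and id scheme of `NoteChunker.split` as a standalone function. `splitNote` is a hypothetical name. It shows two properties of the scheme.

- A markdown heading separated by a blank line becomes its own chunk.
- Because `position` is part of the hashed input, inserting a paragraph near the top of a note changes the ids of every later chunk, even when their text is unchanged.

The second property limits how much any id-based reuse of embeddings can save after such an edit.

```typescript
import { createHash } from "crypto"

type NoteChunk = { id: string; notePath: string; text: string; position: number }

// Mirrors NoteChunker.split: blank-line paragraph split, trim, drop empties,
// then id = sha1(notePath \0 position \0 text).
function splitNote(notePath: string, markdown: string): NoteChunk[] {
  return markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((text, position) => ({
      id: createHash("sha1").update(`${notePath}\0${position}\0${text}`).digest("hex"),
      notePath,
      text,
      position,
    }))
}

const before = splitNote("demo.md", "# Example Note\n\nFirst paragraph.\n  \nSecond paragraph.")
const after = splitNote(
  "demo.md",
  "# Example Note\n\nNew intro.\n\nFirst paragraph.\n  \nSecond paragraph."
)

// "First paragraph." has identical text in both versions but moved from
// position 1 to position 2, so its id changes.
console.log(before[1].text === after[2].text, before[1].id === after[2].id)
```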
@@ -0,0 +1,10 @@

```ts
/**
 * Adapter interface for embedding generation.
 * Allows plugging different embedding models.
 */
export interface EmbeddingAdapter {
  /**
   * Generate embeddings for chunks.
   */
  embed(chunks: string[]): Promise<number[][]>
}
```
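Hosted embedding APIs commonly cap how many inputs a single request may carry. A wrapper adapter can keep `IndexingEngine` unaware of such limits by splitting the input into fixed-size batches and concatenating the results in order. The following is a minimal sketch that assumes a caller-chosen `batchSize`. `BatchingEmbedder` is hypothetical and not part of this PR.

```typescript
interface EmbeddingAdapter {
  embed(chunks: string[]): Promise<number[][]>
}

// Hypothetical decorator: forwards chunks to `inner` in batches of at most
// `batchSize`, preserving input order in the returned vectors.
class BatchingEmbedder implements EmbeddingAdapter {
  constructor(
    private readonly inner: EmbeddingAdapter,
    private readonly batchSize: number
  ) {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
      throw new RangeError("batchSize must be a positive integer")
    }
  }

  async embed(chunks: string[]): Promise<number[][]> {
    const out: number[][] = []
    for (let i = 0; i < chunks.length; i += this.batchSize) {
      const batch = chunks.slice(i, i + this.batchSize)
      const vectors = await this.inner.embed(batch)
      // Fail fast per batch instead of letting a short reply misalign chunks.
      if (vectors.length !== batch.length) {
        throw new Error(`Inner adapter returned ${vectors.length} vectors for ${batch.length} inputs`)
      }
      out.push(...vectors)
    }
    return out
  }
}
```

Batches run one after another here. A concurrent variant could be sketched the same way, at the cost of more load on the provider.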
@@ -0,0 +1,24 @@

```ts
import { NoteChunk } from "../types"

/**
 * Storage abstraction for indexed notes.
 * Allows plugging SQLite / vector DB / other stores.
 */
export interface IndexStore {
  /**
   * Atomically replace all indexed chunks and embeddings for the given notePath.
   *
   * Implementations must remove any previously stored chunks that no longer
   * exist after a note edit.
   */
  saveChunks(
    notePath: string,
    chunks: NoteChunk[],
    embeddings: number[][]
  ): Promise<void>

  /**
   * Remove all chunks belonging to a note.
   */
  deleteNote(notePath: string): Promise<void>
}
```
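The `saveChunks` contract asks for an atomic replace that also drops chunks that no longer exist. The following is a minimal in-memory sketch of that contract; `InMemoryIndexStore` and its `chunkIds` helper are hypothetical. A SQL-backed store would typically get the same effect by deleting the note's old rows and inserting the new ones inside a single transaction.

```typescript
type NoteChunk = { id: string; notePath: string; text: string; position: number }

interface IndexStore {
  saveChunks(notePath: string, chunks: NoteChunk[], embeddings: number[][]): Promise<void>
  deleteNote(notePath: string): Promise<void>
}

type StoredChunk = { chunk: NoteChunk; embedding: number[] }

// Hypothetical in-memory store that satisfies the saveChunks contract.
class InMemoryIndexStore implements IndexStore {
  private notes = new Map<string, StoredChunk[]>()

  async saveChunks(notePath: string, chunks: NoteChunk[], embeddings: number[][]): Promise<void> {
    if (chunks.length !== embeddings.length) {
      throw new Error(`Got ${chunks.length} chunks but ${embeddings.length} embeddings`)
    }
    // Build the complete replacement first, then swap it in with one Map.set.
    // Stale chunks from the previous version disappear, and a validation
    // failure above leaves the old entry untouched.
    const rows = chunks.map((chunk, i) => ({ chunk, embedding: embeddings[i] }))
    this.notes.set(notePath, rows)
  }

  async deleteNote(notePath: string): Promise<void> {
    this.notes.delete(notePath)
  }

  // Inspection helper for tests; not part of IndexStore.
  chunkIds(notePath: string): string[] {
    return (this.notes.get(notePath) ?? []).map((row) => row.chunk.id)
  }
}
```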
@@ -0,0 +1,16 @@

```ts
/**
 * Adapter interface for reading notes from the vault.
 * This keeps the indexing engine independent of the
 * underlying filesystem implementation.
 */
export interface VaultAdapter {
  /**
   * Read the contents of a note.
   */
  readNote(notePath: string): Promise<string>

  /**
   * List all notes in the vault.
   */
  listNotes(): Promise<string[]>
}
```
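The following is a minimal filesystem-backed `VaultAdapter` sketch for Node.js. It rests on two assumptions that the PR does not define: notes are `*.md` files under a single root directory, and note paths are vault-relative with `/` separators. `FileSystemVault` and `createDemoVault` are hypothetical names. `readNote` rejects paths that resolve outside the root.

```typescript
import { mkdir, mkdtemp, readdir, readFile, writeFile } from "fs/promises"
import { tmpdir } from "os"
import { isAbsolute, join, relative, resolve, sep } from "path"

interface VaultAdapter {
  readNote(notePath: string): Promise<string>
  listNotes(): Promise<string[]>
}

// Hypothetical filesystem vault rooted at `root`.
class FileSystemVault implements VaultAdapter {
  constructor(private readonly root: string) {}

  async readNote(notePath: string): Promise<string> {
    const full = resolve(this.root, notePath)
    const rel = relative(this.root, full)
    // Refuse paths that escape the vault root, e.g. "../secrets.md".
    if (rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
      throw new Error(`Note path escapes the vault: ${notePath}`)
    }
    return readFile(full, "utf8")
  }

  async listNotes(): Promise<string[]> {
    const notes: string[] = []
    const walk = async (dir: string): Promise<void> => {
      for (const entry of await readdir(dir, { withFileTypes: true })) {
        const full = join(dir, entry.name)
        if (entry.isDirectory()) {
          await walk(full)
        } else if (entry.isFile() && entry.name.endsWith(".md")) {
          // Normalize to "/" separators so paths match across platforms.
          notes.push(relative(this.root, full).split(sep).join("/"))
        }
      }
    }
    await walk(this.root)
    return notes.sort()
  }
}

// Usage: build a throwaway vault in the OS temp directory.
async function createDemoVault(): Promise<FileSystemVault> {
  const root = await mkdtemp(join(tmpdir(), "vault-"))
  await mkdir(join(root, "daily"))
  await writeFile(join(root, "daily", "today.md"), "# Today")
  await writeFile(join(root, "index.md"), "# Index")
  await writeFile(join(root, "image.png"), "")
  return new FileSystemVault(root)
}
```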
```ts
import { IndexingEngine } from "../IndexingEngine"
import { VaultAdapter } from "../adapters/VaultAdapter"
import { EmbeddingAdapter } from "../adapters/EmbeddingAdapter"
import { IndexStore } from "../adapters/IndexStore"
import { NoteChunk } from "../types"

/**
 * Simple in-memory demo implementations
 */

class DemoVault implements VaultAdapter {
  async readNote(notePath: string): Promise<string> {
    return `
# Example Note

This is the first paragraph.

This is another paragraph about Smart Notes.
`
  }

  async listNotes(): Promise<string[]> {
    return ["demo.md"]
  }
}

class DemoEmbedder implements EmbeddingAdapter {
  async embed(chunks: string[]): Promise<number[][]> {
    return chunks.map(() => [Math.random(), Math.random(), Math.random()])
  }
}

class DemoStore implements IndexStore {
  async saveChunks(
    notePath: string,
    chunks: NoteChunk[],
    embeddings: number[][]
  ): Promise<void> {
    console.log("Indexed note:", notePath)
    console.log("Chunks:", chunks.length)
    console.log("Embeddings:", embeddings.length)
  }

  async deleteNote(notePath: string): Promise<void> {
    console.log("Deleted note:", notePath)
  }
}

async function runDemo() {
  const engine = new IndexingEngine(
    new DemoVault(),
    new DemoEmbedder(),
    new DemoStore()
  )

  // Await the job so a failure surfaces here instead of as an unhandled rejection.
  await engine.scheduleUpdate("demo.md")
}

runDemo().catch((err) => {
  console.error("Demo indexing failed:", err)
  process.exitCode = 1
})
```
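`DemoEmbedder` returns `Math.random()` vectors, so no two runs produce the same index, which makes the demo hard to reuse as a test fixture. The sketch below is a deterministic stand-in. It hashes lowercase words into a fixed number of buckets (a hashing-trick bag of words using the 32-bit FNV-1a hash) and L2-normalizes the counts. The vectors carry no real semantics. `HashingEmbedder` and its default of 8 dimensions are illustrative assumptions.

```typescript
interface EmbeddingAdapter {
  embed(chunks: string[]): Promise<number[][]>
}

// Hypothetical deterministic fake embedder for demos and tests.
class HashingEmbedder implements EmbeddingAdapter {
  constructor(private readonly dims = 8) {}

  async embed(chunks: string[]): Promise<number[][]> {
    return chunks.map((text) => this.vectorFor(text))
  }

  private vectorFor(text: string): number[] {
    const v = new Array<number>(this.dims).fill(0)
    for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
      let h = 2166136261 // FNV-1a 32-bit offset basis
      for (let i = 0; i < word.length; i++) {
        h ^= word.charCodeAt(i)
        h = Math.imul(h, 16777619) >>> 0 // FNV prime, kept as unsigned 32-bit
      }
      v[h % this.dims] += 1
    }
    // L2-normalize; an empty text stays the zero vector.
    const norm = Math.hypot(...v)
    return norm === 0 ? v : v.map((x) => x / norm)
  }
}
```

Swapping this in for `DemoEmbedder` in `runDemo` would make repeated runs produce identical vectors. Case and punctuation are ignored, so "Smart Notes" and "smart notes!" embed identically.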
@@ -0,0 +1,8 @@

```ts
export * from "./types"
export * from "./IndexingEngine"
export * from "./IndexQueue"
export * from "./NoteChunker"

export * from "./adapters/VaultAdapter"
export * from "./adapters/EmbeddingAdapter"
export * from "./adapters/IndexStore"
```
@@ -0,0 +1,26 @@

```ts
/**
 * Represents a semantic chunk extracted from a note.
 */
export type NoteChunk = {
  id: string
  notePath: string
  text: string
  position: number
}

/**
 * Job sent to the indexing queue.
 */
export type IndexJob = {
  type: "update" | "delete"
  notePath: string
}

/**
 * Result produced by the indexing pipeline.
 */
export type IndexResult = {
  notePath: string
  chunks: NoteChunk[]
  embeddings: number[][]
}
```
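`IndexJob.type` is a string-literal union. `processJob` in `IndexingEngine` checks it with two independent `if` statements, so a job type added later would be silently ignored. An exhaustive `switch` with a `never` check turns that case into a compile error. The following is a minimal sketch; `dispatchJob`, `JobHandlers` and `assertNever` are hypothetical names.

```typescript
type IndexJob = {
  type: "update" | "delete"
  notePath: string
}

type JobHandlers = {
  update(notePath: string): Promise<void>
  delete(notePath: string): Promise<void>
}

// Reached only if a job type slips past the switch at runtime.
function assertNever(value: never): never {
  throw new Error(`Unhandled index job type: ${String(value)}`)
}

async function dispatchJob(job: IndexJob, handlers: JobHandlers): Promise<void> {
  switch (job.type) {
    case "update":
      return handlers.update(job.notePath)
    case "delete":
      return handlers.delete(job.notePath)
    default:
      // If a new member is added to IndexJob["type"] without a case above,
      // job.type is no longer `never` here and this line fails to compile.
      return assertNever(job.type)
  }
}
```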