perf: TopKHeap vector search, evidence-based edge sorting, graph traversal hardening #33
+288
−107
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
TopKHeap(min-heap) for O(N log K) brute-force search instead of full array sort; addvectorCachefor in-memory vector storage; addqueryEmbeddingCacheLRU (max 50) in VectorManagersortEdgesByEvidence()withEDGE_KIND_PRIORITYfor deterministic, meaningful edge ordering; capmaxDepthat 20; addFIND_PATH_MAX_VISITED=10000limit; rewritefindPathto parent-pointer BFS; use separatevisitedAncestors/visitedDescendantsin type hierarchypicomatchfor safe glob matching infindByQualifiedName; batch-fetch nodes viagetNodesByIdsingetNodeContext/getFileDependencies/getFileDependents; optimize cycle detection withdepsCacheandpathSetgetNodesByIdsandgetNodesByKinds(now have callers in graph/queries.ts)console.log/console.warnwithlogDebug/logWarnin vector searchDetails
TopKHeap
The brute-force vector search previously sorted the entire results array to find top-K matches.
TopKHeapmaintains a min-heap of size K, yielding O(N log K) instead of O(N log N). For typical searches (K=10, N=10000+), this is a significant improvement.Evidence-based edge sorting
Edges are now sorted by an evidence priority that reflects how informative each edge kind is for code understanding:
Graph traversal hardening
maxDepthcapped at 20 (wasInfinity) to prevent unbounded traversalfindPathrewritten from recursive DFS to iterative BFS with parent-pointer backtracking and a 10,000-node visit limitBatch node fetching
getNodesByIdsfetches nodes in chunks of 999 (SQLite parameter limit) with a Map return for O(1) lookup. Used in 4 call sites in graph/queries.ts.getNodesByKindsuses an IN clause for filtered queries, used ingetProjectGraph.Test plan
npm run buildcompiles without errorsnpm testpasses with same baseline (28 pre-existing failures, 326 passing)src/db/queries.ts,src/graph/queries.ts,src/graph/traversal.ts,src/vectors/manager.ts,src/vectors/search.tssortEdgesByEvidence) — has 2 callers in traversal.tsgetNodesByIds— 4 callers in graph/queries.tsgetNodesByKinds— 1 caller in graph/queries.tscaptureExceptionin db/queries.ts, same as upstream)