MO2k4 commented Feb 11, 2026

Summary

  • Vector search: Add TopKHeap (min-heap) for O(N log K) brute-force search instead of full array sort; add vectorCache for in-memory vector storage; add queryEmbeddingCache LRU (max 50) in VectorManager
  • Graph traversal: Add sortEdgesByEvidence() with EDGE_KIND_PRIORITY for deterministic, meaningful edge ordering; cap maxDepth at 20; add FIND_PATH_MAX_VISITED=10000 limit; rewrite findPath to parent-pointer BFS; use separate visitedAncestors/visitedDescendants in type hierarchy
  • Graph queries: Use picomatch for safe glob matching in findByQualifiedName; batch-fetch nodes via getNodesByIds in getNodeContext/getFileDependencies/getFileDependents; optimize cycle detection with depsCache and pathSet
  • DB queries: Re-add getNodesByIds and getNodesByKinds (now have callers in graph/queries.ts)
  • Replace console.log/console.warn with logDebug/logWarn in vector search
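The queryEmbeddingCache in the first bullet is described only as an LRU with a maximum of 50 entries. As a rough illustration of that idea, and not the code in src/vectors/manager.ts, a Map-based LRU can rely on Map preserving insertion order. The class name LruCache and its methods are hypothetical.

```typescript
// Hypothetical sketch of a small LRU cache; names are illustrative only.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 50) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Keying such a cache on the query text would let a repeated query reuse its embedding, assuming the embedding depends only on that text.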

Details

TopKHeap

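The brute-force vector search previously sorted the entire results array to find the top-K matches. TopKHeap instead maintains a min-heap of size K whose root is the weakest retained candidate, so each of the N candidates costs at most O(log K), giving O(N log K) overall instead of O(N log N). For typical searches (K=10, N=10000+), log2 K is about 3.3 versus about 13.3 for log2 N, so the comparison bound drops roughly fourfold.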
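A minimal sketch of the size-K min-heap idea follows. It is not the implementation in src/vectors/search.ts: the Scored interface and the push/toSortedArray method names are assumptions made for illustration.

```typescript
// Hypothetical result shape; the real search result type may differ.
interface Scored {
  id: string;
  score: number;
}

class TopKHeap {
  private heap: Scored[] = [];
  private readonly k: number;

  constructor(k: number) {
    this.k = k;
  }

  // Keep the item only if it is among the K best seen so far.
  push(item: Scored): void {
    if (this.k <= 0) return;
    if (this.heap.length < this.k) {
      this.heap.push(item);
      this.siftUp(this.heap.length - 1);
    } else if (item.score > this.heap[0].score) {
      // The root is the weakest retained item; replace it.
      this.heap[0] = item;
      this.siftDown(0);
    }
  }

  // Return the retained items, best score first.
  toSortedArray(): Scored[] {
    return [...this.heap].sort((a, b) => b.score - a.score);
  }

  private siftUp(i: number): void {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.heap[parent].score <= this.heap[i].score) break;
      [this.heap[parent], this.heap[i]] = [this.heap[i], this.heap[parent]];
      i = parent;
    }
  }

  private siftDown(i: number): void {
    const n = this.heap.length;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < n && this.heap[left].score < this.heap[smallest].score) smallest = left;
      if (right < n && this.heap[right].score < this.heap[smallest].score) smallest = right;
      if (smallest === i) break;
      [this.heap[smallest], this.heap[i]] = [this.heap[i], this.heap[smallest]];
      i = smallest;
    }
  }
}
```

The final sort touches only K items, so it adds O(K log K) and does not change the overall bound.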

Evidence-based edge sorting

Edges are now sorted by an evidence priority that reflects how informative each edge kind is for code understanding:

calls: 10, extends: 9, implements: 8, imports: 7, exports: 6,
type_of: 5, returns: 4, instantiates: 3, overrides: 3,
decorates: 2, references: 1, contains: 0
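One way such a sort could look is sketched below, using the priorities listed above. The Edge shape, the fallback priority of -1 for unknown kinds, and the tie-break on target are assumptions; the PR does not say how ties (for example instantiates and overrides, both 3) are resolved.

```typescript
// Priorities copied from the table above.
const EDGE_KIND_PRIORITY: Record<string, number> = {
  calls: 10, extends: 9, implements: 8, imports: 7, exports: 6,
  type_of: 5, returns: 4, instantiates: 3, overrides: 3,
  decorates: 2, references: 1, contains: 0,
};

// Hypothetical edge shape for illustration.
interface Edge {
  kind: string;
  source: string;
  target: string;
}

function sortEdgesByEvidence<T extends Edge>(edges: readonly T[]): T[] {
  // Copy first so the caller's array is not mutated.
  return [...edges].sort((a, b) => {
    const pa = EDGE_KIND_PRIORITY[a.kind] ?? -1; // unknown kinds sort last (assumption)
    const pb = EDGE_KIND_PRIORITY[b.kind] ?? -1;
    if (pa !== pb) return pb - pa; // higher priority first
    // Assumed tie-break so equal-priority edges have a stable, repeatable order.
    if (a.target < b.target) return -1;
    if (a.target > b.target) return 1;
    return 0;
  });
}
```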

Graph traversal hardening

  • maxDepth capped at 20 (was Infinity) to prevent unbounded traversal
  • findPath rewritten from recursive DFS to iterative BFS with parent-pointer backtracking and a 10,000-node visit limit
  • Type hierarchy uses separate visited sets for ancestors vs descendants to avoid incorrectly skipping nodes
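The findPath change can be illustrated with a small iterative BFS that records each node's parent and walks the parents back from the goal. This is a sketch under assumptions, not the code in src/graph/traversal.ts: the neighbors callback and the signature are hypothetical, and only the FIND_PATH_MAX_VISITED value comes from the PR.

```typescript
const FIND_PATH_MAX_VISITED = 10000;

// neighbors is a hypothetical adjacency callback standing in for the graph lookup.
function findPath(
  neighbors: (id: string) => string[],
  start: string,
  goal: string,
  maxVisited: number = FIND_PATH_MAX_VISITED,
): string[] | null {
  if (start === goal) return [start];

  // parent doubles as the visited set; the start node has no parent.
  const parent = new Map<string, string | null>([[start, null]]);
  const queue: string[] = [start];

  for (let head = 0; head < queue.length; head++) {
    const current = queue[head];
    for (const next of neighbors(current)) {
      if (parent.has(next)) continue;
      if (parent.size >= maxVisited) return null; // give up once the visit budget is spent
      parent.set(next, current);
      if (next === goal) {
        // Walk parent pointers back to the start, then reverse.
        const path: string[] = [];
        let node: string | null = goal;
        while (node !== null) {
          path.push(node);
          node = parent.get(node) ?? null;
        }
        return path.reverse();
      }
      queue.push(next);
    }
  }
  return null; // goal not reachable
}
```

Because BFS explores level by level, the first path found uses the fewest edges, and storing one parent per node avoids copying a path array for every queued node.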

Batch node fetching

getNodesByIds fetches nodes in chunks of 999 bound parameters (SQLite's default parameter limit before version 3.32.0) and returns a Map for O(1) lookup by ID. It has 4 call sites in graph/queries.ts. getNodesByKinds uses an IN clause for kind-filtered queries and is used in getProjectGraph.
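The chunking pattern might look roughly like the sketch below. To stay self-contained it takes a hypothetical query callback instead of a real database handle; the nodes table, its columns, and the NodeRow shape are likewise assumptions and not taken from src/db/queries.ts.

```typescript
// Hypothetical row shape and query callback, for illustration only.
interface NodeRow {
  id: string;
  name: string;
}
type QueryFn = (sql: string, params: string[]) => NodeRow[];

const SQLITE_CHUNK_SIZE = 999;

function getNodesByIds(
  query: QueryFn,
  ids: readonly string[],
  chunkSize: number = SQLITE_CHUNK_SIZE,
): Map<string, NodeRow> {
  const result = new Map<string, NodeRow>();
  const unique = Array.from(new Set(ids)); // avoid spending parameters on duplicates

  for (let i = 0; i < unique.length; i += chunkSize) {
    const chunk = unique.slice(i, i + chunkSize);
    // One "?" placeholder per ID keeps the statement parameterized.
    const placeholders = chunk.map(() => "?").join(", ");
    const rows = query(`SELECT id, name FROM nodes WHERE id IN (${placeholders})`, chunk);
    for (const row of rows) result.set(row.id, row);
  }
  return result;
}
```

Callers can then resolve each edge endpoint with result.get(id) rather than issuing one query per node.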

Test plan

  • npm run build compiles without errors
  • npm test matches the pre-change baseline (28 pre-existing failures, 326 passing)
  • 5 files changed: src/db/queries.ts, src/graph/queries.ts, src/graph/traversal.ts, src/vectors/manager.ts, src/vectors/search.ts
  • 1 new export (sortEdgesByEvidence) — has 2 callers in traversal.ts
  • getNodesByIds — 4 callers in graph/queries.ts
  • getNodesByKinds — 1 caller in graph/queries.ts
  • Sentry preserved (2 captureException in db/queries.ts, same as upstream)
  • No upstream features removed

- Add TopKHeap for O(N log K) brute-force vector search
- Add Float32Array vectorCache to avoid repeated Buffer conversions
- Add query embedding LRU cache (max 50 entries)
- Add SQL-level nodeKinds filtering in brute-force search
- Replace console.log/warn with logDebug/logWarn
- Set maxDepth=20 default, add FIND_PATH_MAX_VISITED=10000
- Add evidence-based edge sorting (sortEdgesByEvidence)
- Rewrite findPath to parent-pointer BFS with visit limit
- Use separate visitedAncestors/visitedDescendants in type hierarchy
- Add picomatch for safe glob matching in findByQualifiedName
- Batch getNodesByIds in getNodeContext/getFileDependencies/getFileDependents
- Use getNodesByKinds in findByQualifiedName for single-query batch fetch
- Optimize cycle detection with depsCache and pathSet
- Re-add getNodesByIds and getNodesByKinds (now have callers)