Knowledge graph
Mushi maintains a per-project knowledge graph of reports, components, fixes, and the developers who touched them.
Storage
| Backend | Role | Status |
|---|---|---|
| pgvector | Embedding-based dedup & similarity | Primary, always-on |
| Apache AGE | True graph queries (Cypher, paths) | Parallel-write (Phase 1) |
| blast_radius_cache MV | Per-component blast radius | Refreshed by pg_cron |
| intelligence_benchmarks_mv | Cross-customer (k≥5) benchmark | Refreshed nightly |
Why both pgvector and AGE?
pgvector covers the 95% case (semantic dedup, “show me reports like this
one”) at no extra operational cost. AGE handles the 5% that requires true
graph traversal, such as “every report on <Checkout/> from the last release that
also touched the same Stripe webhook.” AGE is opt-in via the
graph_backend project setting. The parallel write keeps both backends in sync, and
an age_drift_audit table snapshots any divergence so it can be repaired.
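The parallel-write and drift-audit idea can be sketched in a few lines of Python. This is an illustrative in-memory model only, not Mushi's implementation: the ParallelWriter class, its method names, the fail_graph flag, and the choice to treat the primary store as the source of truth during repair are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelWriter:
    """Hypothetical model: every edge goes to the primary store and,
    when the graph backend is enabled, to the graph store as well."""

    graph_enabled: bool = True                  # stands in for the graph_backend setting
    primary: set = field(default_factory=set)   # stand-in for the primary (relational) side
    graph: set = field(default_factory=set)     # stand-in for the AGE side

    def write_edge(self, src: str, rel: str, dst: str, fail_graph: bool = False) -> None:
        edge = (src, rel, dst)
        self.primary.add(edge)
        # fail_graph simulates a graph write that was lost, which creates drift.
        if self.graph_enabled and not fail_graph:
            self.graph.add(edge)

    def audit_drift(self) -> dict:
        """Snapshot of divergence, analogous to a row in a drift audit table."""
        return {
            "missing_in_graph": sorted(self.primary - self.graph),
            "extra_in_graph": sorted(self.graph - self.primary),
        }

    def repair(self) -> int:
        """Assumption: the primary store wins. Returns the number of edges fixed."""
        fixed = len(self.primary ^ self.graph)
        self.graph = set(self.primary)
        return fixed


writer = ParallelWriter()
writer.write_edge("report:1", "AFFECTS", "component:Checkout")
writer.write_edge("report:2", "AFFECTS", "component:Checkout", fail_graph=True)
drift_before = writer.audit_drift()
repaired = writer.repair()
drift_after = writer.audit_drift()
```

Here drift_before lists the one edge that never reached the graph side, and drift_after is empty once repair has run. A real audit would compare the two backends by query rather than by set difference in memory.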
Indexing
Embeddings use an HNSW index, built once a project crosses 50k reports; below
that threshold, IVFFlat is the fallback. All columns referenced by row-level
security (RLS) policies, especially user_id, are explicitly indexed because RLS
subqueries that match on un-indexed columns can be 100× slower.
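As a rough sketch of the indexing rule above, the helper below picks an index method from the report count and emits the matching pgvector DDL. The table name reports, the column name embedding, the cosine operator class, the lists = 100 setting, and the reading of "crosses 50k" as "greater than or equal to 50,000" are assumptions for illustration; only the 50k threshold and the HNSW/IVFFlat split come from the text.

```python
HNSW_THRESHOLD = 50_000  # from the text: HNSW is built once a project crosses 50k reports


def embedding_index_ddl(report_count: int, table: str = "reports",
                        column: str = "embedding") -> str:
    """Return CREATE INDEX DDL for the embedding column (names are hypothetical)."""
    if report_count >= HNSW_THRESHOLD:
        method, options = "hnsw", ""
    else:
        # IVFFlat fallback; lists = 100 is an illustrative value, not a recommendation.
        method, options = "ivfflat", " WITH (lists = 100)"
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_{method}_idx "
        f"ON {table} USING {method} ({column} vector_cosine_ops){options};"
    )


def rls_column_index_ddl(table: str, column: str) -> str:
    """Plain B-tree index on a column that an RLS policy filters on."""
    return f"CREATE INDEX IF NOT EXISTS {table}_{column}_idx ON {table} ({column});"


small_project = embedding_index_ddl(1_200)
large_project = embedding_index_ddl(75_000)
rls_index = rls_column_index_ddl("reports", "user_id")
```

The generated statements would be run by a migration or maintenance job. How Mushi actually schedules the switch from IVFFlat to HNSW is not stated in this section.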
Bug Ontology
Tags are drawn from the cross-customer Bug Ontology (see Whitepaper §2.6 / Appendix A). Each tag is a versioned, reviewed enum that lets us surface “this looks like the same Stripe-3DS issue 11 other customers reported last month” — without exposing the other customers’ data.
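One way to surface cross-customer counts without exposing individual customers is to suppress any tag reported by fewer than k distinct customers. The sketch below assumes the k≥5 threshold listed for intelligence_benchmarks_mv also applies to ontology tags, which this section does not state; the function name, the input shape, and the tag strings are hypothetical.

```python
from collections import defaultdict

K_MIN = 5  # assumption: same k>=5 threshold as the cross-customer benchmark view


def cross_customer_tag_counts(reports, k_min: int = K_MIN) -> dict:
    """reports: iterable of (customer_id, tag) pairs.

    Returns {tag: number of distinct customers}, keeping only tags seen by
    at least k_min customers. Customer identifiers never leave this function.
    """
    customers_by_tag = defaultdict(set)
    for customer_id, tag in reports:
        customers_by_tag[tag].add(customer_id)
    return {
        tag: len(customers)
        for tag, customers in customers_by_tag.items()
        if len(customers) >= k_min
    }


sample = [(f"customer-{i}", "stripe-3ds") for i in range(6)]
sample += [("customer-1", "rare-bug"), ("customer-2", "rare-bug")]
visible = cross_customer_tag_counts(sample)
```

With this sample, only the tag reported by six distinct customers is returned, and the tag seen by two customers is suppressed.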