From Fragmented to Fluid: Simplifying ServiceRadar with Elixir, Rust, and CloudNativePG
In observability, complexity is the enemy. Our previous architecture asked a React app to hit multiple APIs through Nginx/Kong for JWT verification, while a standalone auth service issued JWKS/OAuth tokens. We'd been wrestling with this stack for a while—slow initial renders, state management sprawl, and the constant churn of keeping dependencies current across a deep node_modules tree.
React2Shell forced the conversation we'd been putting off. The vulnerability itself was bad enough, but the follow-up CVEs and the broader pattern they revealed made us take a harder look at what we were signing up for. React is a mature framework carrying years of accumulated complexity and technical debt. That's not a criticism—it's the natural arc of any widely-adopted JS project. But for a team shipping observability tooling, betting on a stack where the next critical CVE feels like a matter of "when" rather than "if" wasn't a trade-off we wanted to keep making.
The new platform takes a different shape:
- Phoenix + LiveView serves as the experience layer
- SRQL (Rust) powers query translation and execution
- CloudNativePG with TimescaleDB + Apache AGE provides a single unified data store
- core-elx orchestrates ingestion, identity reconciliation (DIRE), and control-plane workflows
What's Changing
Phoenix LiveView Replaces React
The UI is built on Phoenix LiveView with authenticated sessions throughout. Previously, the React app split traffic across multiple backend APIs and traversed gateways for JWT validation. Now SRQL queries execute server-side with Web-NG coordinating auth and session state.
Beyond the security posture, LiveView solves the performance issues we'd been chasing. Server-rendered HTML over WebSockets eliminates the hydration delays and client-side state bloat that plagued our dashboards. The BEAM's lightweight processes handle thousands of concurrent connections without the careful optimization React demanded.
SRQL Powers Queries
SRQL continues to provide a Rust-powered query engine that translates the ServiceRadar query language into SQL for CNPG/Timescale. Web-NG routes /api/query and /api/stream to the SRQL service while keeping auth and session context centralized.
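To make the translation step concrete, here is a minimal Rust sketch of turning a small key:value query into parameterized SQL. Everything in it is an assumption made for illustration: the toy grammar (in:, limit:, field:value), the entity allow-list, and the translate function are hypothetical and are not SRQL's actual syntax or API.

```rust
/// A parameterized SQL statement: text with $1, $2, ... placeholders
/// plus the values to bind to them.
#[derive(Debug, PartialEq)]
pub struct SqlQuery {
    pub text: String,
    pub params: Vec<String>,
}

/// Translate a toy query such as `in:devices hostname:edge-01 limit:10`.
/// The grammar is hypothetical and is NOT SRQL's real syntax.
pub fn translate(input: &str) -> Result<SqlQuery, String> {
    // Hypothetical allow-list, so the table name never comes from raw user text.
    const TABLES: [&str; 3] = ["devices", "logs", "metrics"];

    let mut table: Option<&'static str> = None;
    let mut filters: Vec<(String, String)> = Vec::new();
    let mut limit: Option<u32> = None;

    for token in input.split_whitespace() {
        let (key, value) = token
            .split_once(':')
            .ok_or_else(|| format!("expected key:value, got `{token}`"))?;
        match key {
            "in" => {
                let found = TABLES
                    .iter()
                    .copied()
                    .find(|t| *t == value)
                    .ok_or_else(|| format!("unknown entity `{value}`"))?;
                table = Some(found);
            }
            "limit" => {
                let n = value
                    .parse::<u32>()
                    .map_err(|_| format!("bad limit `{value}`"))?;
                limit = Some(n);
            }
            column => {
                // Column names are interpolated into the SQL text, so they are
                // restricted to lowercase letters and underscores.
                let valid = !column.is_empty()
                    && column.chars().all(|c| c.is_ascii_lowercase() || c == '_');
                if !valid {
                    return Err(format!("invalid field `{column}`"));
                }
                filters.push((column.to_string(), value.to_string()));
            }
        }
    }

    let table = table.ok_or_else(|| "missing `in:<entity>`".to_string())?;

    // Filter values become bind parameters instead of being spliced into the text.
    let mut text = format!("SELECT * FROM {table}");
    let mut params = Vec::new();
    for (i, (column, value)) in filters.into_iter().enumerate() {
        text.push_str(if i == 0 { " WHERE " } else { " AND " });
        text.push_str(&format!("{column} = ${}", i + 1));
        params.push(value);
    }
    if let Some(n) = limit {
        text.push_str(&format!(" LIMIT {n}"));
    }
    Ok(SqlQuery { text, params })
}
```

In this sketch, translate("in:devices hostname:edge-01 limit:10") yields the text SELECT * FROM devices WHERE hostname = $1 LIMIT 10 with edge-01 as the single bound parameter. Keeping user-supplied values out of the SQL text is the design point; a real translator would also handle time ranges, operators, and typed parameters.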
Identity and Access: Built-In, Not Bolted-On
Kong previously fronted JWT validation while a custom service issued JWKS and OAuth tokens. Phoenix now owns identity end-to-end: Guardian issues and validates JWTs, sessions carry scope, and RBAC is enforced in the web layer. Fewer moving parts, clearer boundaries, and no separate JWKS gateway to maintain.
Everything on CloudNativePG
The UI connects directly to CloudNativePG, retiring separate analytics stores. We considered a dedicated graph database and a separate engine for document search, but unifying on Postgres keeps operations simple.
Timescale hypertables back metrics. Apache AGE powers graph queries for topology and dependency mapping. With Timescale's upcoming pg_textsearch extension (BM25 + hybrid retrieval), we can add document search without introducing another database.
CNPG provides effortless HA and clustering in Kubernetes through operator-managed primaries, replicas, failover, and backups. The same image and migrations run in Docker Compose or as a standalone Postgres install, so development and production deployments share one artifact.
One cluster now serves relational inventory, RBAC, metrics, topology, and future search—keeping schema changes and migrations in one place.
core-elx: Control Plane, Not Edge Plane
core-elx coordinates agent ingestion, identity reconciliation, and control-plane APIs. Edge collectors (syslog, SNMP, NetFlow, etc.) publish to NATS JetStream, while agent-gateway streams status and collection payloads over gRPC. Phoenix reads from the authoritative CNPG store; it does not sit in the ingestion path.
Why This Matters
Fewer moving parts. No gateway hop for queries, no extra auth service to deploy, fewer TLS certificates to manage.
Lower latency. Queries translate and execute without extra network round-trips.
Operational clarity. One database with Timescale + AGE means consistent backups, HA, and observability of the observability stack.
Smaller attack surface. Erlang/OTP has decades of battle-testing in telecom environments. The dependency tree is shallow, and critical vulnerabilities are rare. We're no longer tracking npm advisories weekly.
Defense in depth. Identity lives in the application layer where it belongs. Sessions, scopes, and RBAC flow through the same runtime that serves the UI—no external gateway required to enforce access control.
Simpler IAM. Guardian-driven JWTs and Phoenix plugs replace Kong + custom JWKS/OAuth, keeping auth in one place rather than spread across edge proxies.
Streaming Without a Separate Engine
Elixir and Phoenix already provide the streaming primitives we need:
Log tailing. Postgres LISTEN/NOTIFY plus lightweight watchers broadcast to the UI. Payloads stay inside the runtime.
Metrics windows. Timescale continuous aggregates keep dashboards fresh without heavy queries.
Headroom. If payloads outgrow NOTIFY, logical replication slots let us stream WAL changes directly into application processes.
Real-Time Streaming with pg_notify
- Triggers emit events. Inserts on hot tables (logs, metrics) raise pg_notify payloads.
- Notification workers. GenServers using Postgrex.Notifications fan events into Phoenix PubSub topics; LiveViews subscribe and stream-insert rows.
- Backpressure-aware. Large payloads fall back to ID-only messages plus Ecto hydration. For extreme volume, logical replication drops in without changing the LiveView consumers.
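The ID-only fallback comes down to a size check: in its default configuration, PostgreSQL requires a NOTIFY payload to be shorter than 8000 bytes. The sketch below shows that decision in Rust, the language of the SRQL service, although in this architecture the check would more likely live in the trigger or in an Elixir worker. The Notice type, the build_notice function, and the JSON message shape are hypothetical names chosen for illustration.

```rust
/// Largest payload NOTIFY accepts by default: it must be shorter than 8000 bytes.
const MAX_NOTIFY_BYTES: usize = 7999;

/// What gets published on the notification channel (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Notice {
    /// The whole row fits, so subscribers can render it without a query.
    Full(String),
    /// Only the primary key is sent; subscribers load the row by ID.
    IdOnly(String),
}

/// Choose between a full-row payload and an ID-only message.
/// `row_json` is the row already serialized as JSON.
pub fn build_notice(row_id: u64, row_json: &str) -> Notice {
    // `len()` counts bytes, which is the unit the NOTIFY limit is defined in.
    if row_json.len() <= MAX_NOTIFY_BYTES {
        Notice::Full(row_json.to_string())
    } else {
        Notice::IdOnly(format!("{{\"id\":{row_id}}}"))
    }
}
```

Consumers that receive an IdOnly message fetch the row themselves, which is the hydration step described above. The channel only ever carries small messages, so the subscribing side does not change when rows grow.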
API Shift to Elixir
- Single surface. The SRQL and domain APIs that React once hit through Kong now live in Phoenix; no dual-API/gateway dance.
- MCP in-process. hermes_mcp exposes the MCP contract from Elixir so IDE/CLI agents talk directly to the web app.
- Contract-first docs. open_api_spex generates OpenAPI from Phoenix controllers, replacing hand-rolled swagger and keeping the contract locked to code.
Core Becomes the Engine, Not the Edge
core-elx remains the orchestration and ingestion engine writing into CNPG. Web/API concerns live in Phoenix, keeping the experience layer and control plane tightly aligned.
Why the BEAM Fits
- GenServers as building blocks. Notification listeners, SRQL executors, MCP endpoints, and LiveViews are supervised, lightweight, and restartable.
- Massive fan-in/out. BEAM schedulers handle thousands of WebSocket/API clients without head-of-line blocking.
- Fault tolerance. Supervisors contain failures; a bad notification restarts a worker instead of taking down the UI.
- Operational ergonomics. Telemetry hooks and OTP releases make tuning, blue/green deployments, and hot upgrades straightforward.
What's Included
- Phoenix LiveView UI (web-ng)
- SRQL service for /api/query and /api/stream
- Direct CNPG access with TimescaleDB + Apache AGE (HA/replicas via CNPG operator)
- pg_notify → LiveView pushes for real-time streaming
- MCP endpoints in Elixir (hermes_mcp)
- OpenAPI generation via open_api_spex
- core-elx ingestion into the same CNPG cluster
Takeaways
We consolidated UI, query planning, and data access into a single Phoenix application while keeping core-elx focused on orchestration and ingestion. TimescaleDB and Apache AGE live under one CloudNativePG roof, and SRQL remains the query engine for analytics workloads. The result is a smaller blast radius, faster queries, simpler IAM, and a streaming model that rides on Postgres + BEAM instead of an external gateway stack.