From Fragmented to Fluid: Simplifying ServiceRadar with Elixir, Rustler, and CloudNativePG
In observability, complexity is the enemy. Our previous architecture asked a React app to hit two APIs through Nginx/Kong for JWT verification: the core service, written in Go, and SRQL, our domain-specific language (DSL) and query engine, written in Rust. A separate Go-based auth service issued JWKS/OAuth tokens. We'd been wrestling with this stack for a while: slow initial renders, state management sprawl, and the constant churn of keeping dependencies current across a deep node_modules tree.
React2Shell forced the conversation we'd been putting off. The vulnerability itself was bad enough, but the follow-up CVEs and the broader pattern they revealed made us take a harder look at what we were signing up for. React is a mature framework carrying years of accumulated complexity and technical debt. That's not a criticism—it's the natural arc of any widely adopted JS project. But for a team shipping observability tooling, betting on a stack where the next critical CVE feels like a matter of "when" rather than "if" wasn't a trade-off we wanted to keep making.
An upcoming release takes a different shape:
- Phoenix + LiveView serves as the experience layer
- Rustler-embedded SRQL runs inside the Phoenix app as a NIF—no extra service
- CloudNativePG with TimescaleDB + Apache AGE provides a single unified data store
- Go core continues to orchestrate agents, pollers, and ingestion
What's Changing
Phoenix LiveView Replaces React
The new UI is built on Phoenix LiveView with authenticated sessions throughout. Previously, the React app split traffic across two backend APIs and traversed multiple gateways for JWT validation. Now SRQL queries execute server-side within Phoenix—no gateway hop, no separate query service, and no dual-API coordination from the browser.
Beyond the security posture, LiveView solves the performance issues we'd been chasing. Server-rendered HTML over WebSockets eliminates the hydration delays and client-side state bloat that plagued our dashboards. The BEAM's lightweight processes handle thousands of concurrent connections without the careful optimization React demanded.
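For readers who haven't used LiveView, here's a minimal sketch of the pattern; the module, topic, and assign names are illustrative, not ServiceRadar's actual code. The view mounts over a WebSocket, keeps its state on the server, and re-renders when a PubSub message arrives:

```elixir
# Illustrative only: a server-rendered view that updates over the existing
# WebSocket instead of shipping a client-side framework to the browser.
defmodule ServiceRadarWeb.DeviceStatusLive do
  use Phoenix.LiveView

  @impl true
  def mount(_params, _session, socket) do
    # Subscribe once the WebSocket is connected so broadcasts re-render this view.
    if connected?(socket) do
      Phoenix.PubSub.subscribe(ServiceRadar.PubSub, "devices:status")
    end

    {:ok, assign(socket, devices: [])}
  end

  @impl true
  def handle_info({:device_update, device}, socket) do
    # State lives on the server; only a small diff is pushed to the browser.
    {:noreply, update(socket, :devices, &[device | &1])}
  end

  @impl true
  def render(assigns) do
    ~H"""
    <ul>
      <li :for={device <- @devices}><%= device.name %>: <%= device.status %></li>
    </ul>
    """
  end
end
```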
SRQL Moves In-Process via Rustler
SRQL now runs as a Rust NIF embedded directly in the Phoenix application. Query translation happens in-process on dedicated CPU threads, keeping the runtime responsive while eliminating a standalone microservice and removing the API gateway from the query hot path.
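A rough sketch of what the Elixir side of that binding can look like; the module and crate names are assumptions for illustration, and we're treating "dedicated CPU threads" as Rustler's dirty CPU schedulers:

```elixir
# Sketch of a Rustler binding (names assumed). The Rust crate compiles into the
# release, and the stub below is replaced by its native implementation at load time.
defmodule ServiceRadar.SRQL.Native do
  use Rustler, otp_app: :serviceradar, crate: "srql_nif"

  # Translates an SRQL query string into SQL for the CNPG cluster. Declared as a
  # dirty CPU NIF on the Rust side so long translations run on dirty schedulers
  # and never stall the BEAM's normal schedulers.
  def translate(_srql_query), do: :erlang.nif_error(:nif_not_loaded)
end

# Callers stay plain Elixir:
#   {:ok, sql} = ServiceRadar.SRQL.Native.translate("show devices where status = 'down'")
```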
Identity and Access: Built-In, Not Bolted-On
Kong previously fronted JWT validation while a custom Go service issued JWKS and OAuth tokens. Phoenix now owns identity end-to-end: Guardian issues and validates JWTs, sessions carry scope, and multi-tenancy and RBAC are first-class concerns in the web layer. Fewer moving parts, clearer boundaries, and no separate JWKS gateway to maintain.
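As a sketch of the pattern (module names and the Accounts context are assumptions, not ServiceRadar's real modules), Guardian needs an implementation module plus a plug pipeline that the router can mount:

```elixir
# Illustrative Guardian setup: tokens are issued and verified inside Phoenix,
# so no external gateway sits in front of the app.
defmodule ServiceRadarWeb.Auth.Guardian do
  use Guardian, otp_app: :serviceradar

  # Encode the user's id into the token subject.
  def subject_for_token(%{id: id}, _claims), do: {:ok, to_string(id)}

  # Rehydrate the user when a request presents a token (Accounts is hypothetical).
  def resource_from_claims(%{"sub" => id}) do
    case ServiceRadar.Accounts.get_user(id) do
      nil -> {:error, :not_found}
      user -> {:ok, user}
    end
  end
end

defmodule ServiceRadarWeb.Auth.ErrorHandler do
  @behaviour Guardian.Plug.ErrorHandler

  @impl true
  def auth_error(conn, {type, _reason}, _opts) do
    Plug.Conn.send_resp(conn, 401, to_string(type))
  end
end

defmodule ServiceRadarWeb.Auth.Pipeline do
  use Guardian.Plug.Pipeline,
    otp_app: :serviceradar,
    module: ServiceRadarWeb.Auth.Guardian,
    error_handler: ServiceRadarWeb.Auth.ErrorHandler

  # Accept either a session token (LiveView) or a bearer header (API clients).
  plug Guardian.Plug.VerifySession
  plug Guardian.Plug.VerifyHeader
  plug Guardian.Plug.EnsureAuthenticated
  plug Guardian.Plug.LoadResource
end
```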
Everything on CloudNativePG
The UI connects directly to CloudNativePG, retiring the previous Timeplus Proton/ClickHouse datastore. We considered a dedicated graph database and a separate engine for document search, but unifying on Postgres keeps operations simple.
Timescale hypertables back metrics. Apache AGE powers graph queries for topology and dependency mapping. With Timescale's upcoming pg_textsearch extension (BM25 + hybrid retrieval), we can add document search without introducing another database.
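To make that concrete, here's an illustrative Ecto migration; it assumes the CNPG image ships the timescaledb and age extensions, and the table and graph names are made up for the example:

```elixir
defmodule ServiceRadar.Repo.Migrations.CreateMetricsHypertable do
  use Ecto.Migration

  def up do
    # Assumes the extensions are available in the CNPG image.
    execute "CREATE EXTENSION IF NOT EXISTS timescaledb"
    execute "CREATE EXTENSION IF NOT EXISTS age"
    execute "LOAD 'age'"

    create table(:metrics, primary_key: false) do
      add :time, :utc_datetime_usec, null: false
      add :device_id, :uuid, null: false
      add :name, :text, null: false
      add :value, :float
    end

    # Turn the plain table into a Timescale hypertable partitioned on time.
    execute "SELECT create_hypertable('metrics', 'time')"

    # Create an AGE graph for topology and dependency-mapping queries.
    execute "SELECT ag_catalog.create_graph('topology')"
  end

  def down do
    execute "SELECT ag_catalog.drop_graph('topology', true)"
    drop table(:metrics)
  end
end
```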
CNPG provides effortless HA and clustering in Kubernetes through operator-managed primaries, replicas, failover, and backups. The same image and migrations run in Docker Compose or as a standalone Postgres install, so development and production deployments share one artifact.
One cluster now serves relational inventory, RBAC, metrics, topology, and future search—keeping schema changes and migrations in one place.
Go Core: Data Plane, Not Edge Plane
serviceradar-core still coordinates pollers and agents. Edge collectors (syslog, SNMP, netflow, etc.) publish to NATS JetStream—either directly or through leaf nodes on the edge. Horizontally scaled DB-writer consumers (queue-group subscriptions) pull from JetStream and write to CNPG. The gRPC pipeline stays in place for control and updates that don't belong on NATS. Phoenix reads from the authoritative CNPG store; it no longer sits in the ingestion path.
Why This Matters
Fewer moving parts. No gateway hop for queries, no extra Rust service to deploy, fewer TLS certificates to manage.
Lower latency. Queries translate and execute in-process without network round-trips.
Operational clarity. One database with Timescale + AGE means consistent backups, HA, and observability of the observability stack.
Smaller attack surface. Erlang/OTP has decades of battle-testing in telecom environments. The dependency tree is shallow, and critical vulnerabilities are rare. We're no longer tracking npm advisories weekly.
Defense in depth. Identity lives in the application layer where it belongs. Sessions, scopes, and RBAC flow through the same runtime that serves the UI—no external gateway required to enforce access control.
Simpler IAM. Guardian-driven JWTs and Phoenix plugs replace Kong + custom JWKS/OAuth, making multi-tenancy a first-class part of the platform rather than an afterthought bolted onto the edge.
Streaming Without a Separate Engine
Elixir and Phoenix already provide the streaming primitives we need:
Log tailing. Postgres LISTEN/NOTIFY plus lightweight watchers broadcast to the UI. Payloads stay inside the runtime.
Metrics windows. Timescale continuous aggregates keep dashboards fresh without heavy queries.
Headroom. If payloads outgrow NOTIFY, logical replication slots let us stream WAL changes directly into application processes.
Real-Time Streaming with pg_notify
- Triggers emit events. Inserts on hot tables (logs, metrics) raise pg_notify payloads.
- Notification workers. GenServers using Postgrex.Notifications fan events into Phoenix PubSub topics; LiveViews subscribe and stream-insert rows (a sketch follows this list).
- Backpressure-aware. Large payloads fall back to ID-only messages plus Ecto hydration with per-tenant RBAC. For extreme volume, logical replication drops in without changing the LiveView consumers.
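Here's a minimal sketch of such a worker; the module, channel, and topic names are assumptions, and it presumes the trigger emits JSON payloads:

```elixir
# Sketch of a notification worker: a GenServer holding a dedicated
# Postgrex.Notifications connection that fans pg_notify payloads into
# Phoenix PubSub, where LiveViews are subscribed.
defmodule ServiceRadar.LogNotifier do
  use GenServer

  @channel "logs_inserted"

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # Separate notification connection, reusing the Repo's connection options.
    {:ok, conn} = Postgrex.Notifications.start_link(ServiceRadar.Repo.config())
    {:ok, _ref} = Postgrex.Notifications.listen(conn, @channel)
    {:ok, %{conn: conn}}
  end

  @impl true
  def handle_info({:notification, _pid, _ref, @channel, payload}, state) do
    # Fan out to every LiveView subscribed to the topic (payload assumed to be JSON).
    event = Jason.decode!(payload)
    Phoenix.PubSub.broadcast(ServiceRadar.PubSub, "logs:new", {:log_event, event})
    {:noreply, state}
  end
end
```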
API Shift to Elixir
- Single surface. The SRQL and domain APIs that React once hit through Kong now live in Phoenix; no dual-API/gateway dance.
- MCP in-process. hermes_mcp exposes the MCP contract from Elixir so IDE/CLI agents talk directly to the web app.
- Contract-first docs. open_api_spex generates OpenAPI from Phoenix controllers, replacing the hand-rolled Go swagger layer and keeping the contract locked to code (see the sketch after this list).
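As an illustration of that contract-first flow (the controller and schema modules here are hypothetical), an operation declared with open_api_spex sits right next to the action it documents:

```elixir
# Illustrative only: the OpenAPI contract lives beside the controller action,
# so the generated spec can't drift from the code.
defmodule ServiceRadarWeb.QueryController do
  use ServiceRadarWeb, :controller        # standard generated controller helper
  use OpenApiSpex.ControllerSpecs

  alias ServiceRadarWeb.Schemas.{QueryRequest, QueryResponse}  # hypothetical schemas

  tags ["srql"]

  operation :create,
    summary: "Execute an SRQL query",
    request_body: {"SRQL query", "application/json", QueryRequest},
    responses: [ok: {"Query results", "application/json", QueryResponse}]

  def create(conn, _params) do
    # Translate via the SRQL NIF and run against CNPG (elided here);
    # the response shape is whatever QueryResponse declares.
    json(conn, %{results: []})
  end
end
```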
Core Becomes the Engine, Not the Edge
serviceradar-core remains the orchestration and ingestion engine (pollers, agents, registry) writing into CNPG. Web/API concerns move to Phoenix, tightening separation of concerns: Go owns data-plane coordination; Elixir owns the experience and API surface.
Why the BEAM Fits
- GenServers as building blocks. Notification listeners, SRQL executors, MCP endpoints, and LiveViews are supervised, lightweight, and restartable.
- Massive fan-in/out. BEAM schedulers handle thousands of WebSocket/API clients without head-of-line blocking.
- Fault tolerance. Supervisors contain failures; a bad notification restarts a worker instead of taking down the UI.
- Operational ergonomics. Telemetry hooks and OTP releases make tuning, blue/green, and hot upgrades straightforward.
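Put together, the pieces sketched above might hang off a single supervision tree; the child modules reuse the assumed names from the earlier examples:

```elixir
defmodule ServiceRadar.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      ServiceRadar.Repo,                          # Ecto -> CNPG
      {Phoenix.PubSub, name: ServiceRadar.PubSub},
      ServiceRadar.LogNotifier,                   # pg_notify listener (sketched earlier)
      ServiceRadarWeb.Endpoint                    # LiveView UI and JSON/MCP API surface
    ]

    # one_for_one: a crashed notifier restarts alone; the Endpoint and the UI
    # it serves keep running.
    Supervisor.start_link(children, strategy: :one_for_one, name: ServiceRadar.Supervisor)
  end
end
```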
What's Included
- Phoenix LiveView UI (web-ng)
- Rustler SRQL NIF and /api/query controller
- Direct CNPG access with TimescaleDB + Apache AGE (HA/replicas via CNPG operator)
- pg_notify → LiveView pushes for real-time streaming
- MCP endpoints in Elixir (hermes_mcp)
- OpenAPI generation via open_api_spex
- Go core ingestion into the same CNPG cluster
Takeaways
We consolidated UI, query planning, and data access into a single Phoenix application while keeping Go focused on orchestration and ingestion. TimescaleDB and Apache AGE live under one CloudNativePG roof, and SRQL now runs as a library rather than a microservice. The result is a smaller blast radius, faster queries, simpler IAM, and a streaming model that rides on Postgres + BEAM instead of an external engine.