# Building Custom Checkers
ServiceRadar treats checkers as independent gRPC services that hang off an agent. The poller asks the agent for status, the agent proxies the request to each checker, and the poller forwards the aggregated responses to core. Core adds device metadata automatically so the UI can light up as soon as a checker starts returning data.
## Architecture At A Glance

- **Checker** – Collects metrics from the target system and exposes the `monitoring.AgentService` gRPC API (mainly `GetStatus`, optionally `GetResults`/`StreamResults`).
- **Agent** (`pkg/agent/server.go`) – Maintains checker connections via the registry in `pkg/agent/registry.go`, adds security, and returns a unified `StatusResponse`.
- **Poller** (`pkg/poller/poller.go`) – Reads `poller.json`, executes each check via `AgentService.GetStatus`, wraps the payload with poller/agent IDs, and ships everything to core.
- **Core** (`pkg/core/pollers.go`) – Calls `processServicePayload`, which now invokes `ensureServiceDevice` to register devices for any checker that reports a host IP. Core fans out to the device registry and metrics stores so the UI and SRQL can query the data.
## Request/Response Sequence

The `details` field from `poller.json` is passed verbatim to the agent and on to the checker. For gRPC checkers it must be `host:port`.
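The agent rejects malformed values with `net.SplitHostPort`. A quick way to sanity-check a `details` string before deploying — this is a standalone sketch, not ServiceRadar code:

```go
package main

import (
	"fmt"
	"net"
)

// validateDetails mirrors the agent-side check: the details value for a
// gRPC checker must parse cleanly as host:port.
func validateDetails(details string) error {
	host, port, err := net.SplitHostPort(details)
	if err != nil {
		return fmt.Errorf("details %q is not host:port: %w", details, err)
	}
	if host == "" || port == "" {
		return fmt.Errorf("details %q is missing host or port", details)
	}
	return nil
}

func main() {
	fmt.Println(validateDetails("192.168.1.219:50110")) // <nil>
	fmt.Println(validateDetails("192.168.1.219"))       // missing-port error
}
```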
## Checker Responsibilities

- **Implement the gRPC surface** – Use `proto/monitoring.proto` and register an `AgentService` in your `main`. The poller only calls `GetStatus`, but implement `GetResults`/`StreamResults` if you need paginated or chunked results.
- **Respond quickly & always populate host identity** – `GetStatus` must return within ~30 seconds. Include the following fields in the JSON payload stored under `status`:

  ```json
  {
    "status": {
      "timestamp": "2025-10-12T21:16:42Z",
      "host_ip": "192.168.1.219",
      "host_id": "sysmonvm.local",
      "hostname": "sysmonvm.local",
      "...": "..."
    }
  }
  ```

  Core's `ensureServiceDevice` (see `pkg/core/devices.go:31`) looks for `status.host_ip`, `host_ip`, or any field whose key contains `ip`. As soon as the checker reports a stable IP, core emits a `DiscoverySourceSelfReported` update and the device appears in the UI.
- **Expose a health endpoint** – External checkers must answer gRPC health checks. If you reuse `monitoring.AgentService` for health, no extra work is needed—the agent falls back to it automatically (`pkg/agent/registry.go:52`).
- **Return useful metadata** – Include fields like CPU, memory, ports, or custom measurements. Use simple JSON primitives—core stores the payload verbatim and higher layers render it.
## Agent & Registry Overview

- The agent loads checker configs from `CheckersDir` and/or KV (see `pkg/agent/server.go:1432`). For gRPC checks the poller-provided `details` value is sufficient—you do not need an extra config file.
- Service types are registered in `pkg/agent/registry.go`. Most new checkers can use the existing `"grpc"` entry; register a new type only if you need a bespoke transport.
- `ExternalChecker` (`pkg/agent/external_checker.go`) manages the TLS session, retries, and health checks. It creates a single gRPC channel and reuses it for subsequent calls.
## Poller Behaviour

- `poller.json` defines agents and checks. A minimal gRPC checker entry looks like:

  ```json
  {
    "service_type": "grpc",
    "service_name": "sysmon-vm",
    "details": "192.168.1.219:50110"
  }
  ```

- `AgentPoller.ExecuteChecks` (`pkg/poller/agent_poller.go:52`) fans out across checks, calling `AgentService.GetStatus` in parallel and attaching the agent name when the checker does not return an `agent_id`.
- Before reporting to core, `poller.enhanceServicePayload` (`pkg/poller/poller.go:680`) wraps the raw checker JSON inside an envelope that records the poller ID, agent ID, and partition. Core depends on that envelope.
## Core Ingestion Path

`ReportStatus` (`pkg/core/pollers.go:803`) receives the batched `ServiceStatus` messages. `processServicePayload` (`pkg/core/metrics.go:797`) peels off the poller envelope and, right after parsing, calls `ensureServiceDevice`, which:

- Extracts the host IP/hostname with `extractCheckerHostIdentity`.
- Emits a `DiscoverySourceSelfReported` `DeviceUpdate` tagged with `checker_service`, `collector_agent_id`, and `collector_poller_id`.
- Relies on the poller-provided agent ID (the poller now fills it in automatically, `pkg/poller/agent_poller.go:233`).

Type-specific handlers (`processSysmonMetrics`, SNMP, ICMP, etc.) can still add richer metric objects; the device registration happens independently.
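The identity lookup can be approximated as follows. This is a sketch of the documented search order (`status.host_ip`, then a top-level `host_ip`, then any key containing `ip`), not the actual `extractCheckerHostIdentity` code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// findHostIP approximates the lookup ensureServiceDevice performs on a
// checker payload. Sketch of the documented behaviour only.
func findHostIP(payload []byte) (string, bool) {
	var doc map[string]any
	if err := json.Unmarshal(payload, &doc); err != nil {
		return "", false
	}
	// 1. Preferred location: status.host_ip.
	if status, ok := doc["status"].(map[string]any); ok {
		if ip, ok := status["host_ip"].(string); ok && ip != "" {
			return ip, true
		}
	}
	// 2. Top-level host_ip.
	if ip, ok := doc["host_ip"].(string); ok && ip != "" {
		return ip, true
	}
	// 3. Last resort: any key whose name contains "ip".
	for key, val := range doc {
		if strings.Contains(strings.ToLower(key), "ip") {
			if ip, ok := val.(string); ok && ip != "" {
				return ip, true
			}
		}
	}
	return "", false
}

func main() {
	ip, ok := findHostIP([]byte(`{"status":{"host_ip":"192.168.1.219"}}`))
	fmt.Println(ip, ok) // 192.168.1.219 true
}
```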
## Building a New gRPC Checker

- **Bootstrap the project**
  - Depend on `proto/monitoring.proto`.
  - Register `proto.RegisterAgentServiceServer`.
- **Implement `GetStatus`**
  - Collect metrics and marshal them into a JSON structure nested beneath `status`.
  - Populate `status.host_ip` and `status.hostname`.
  - Fill in the top-level `StatusResponse` fields: `available`, `service_name`, `service_type`, `response_time`.
- **Optional: implement `GetResults`/`StreamResults`** if the checker needs to return large datasets. The poller uses `results_interval` in `poller.json` to schedule those calls.
- **Provide a health probe**
  - Either implement the gRPC health service (`grpc.health.v1.Health`), or reuse the same `GetStatus` handler—the agent calls it with the checker's name when `grpcServiceCheckName` is `monitoring.AgentService`.
- **Wire up TLS (optional but recommended)**
  - The agent clones its `SecurityConfig` for each checker, overriding `server_name` with the host portion of `details`. Ship certificates under `/etc/serviceradar/certs`.
- **Run the checker as a service**
  - Package it with systemd or launchd (see `tools/sysmonvm` for examples).
  - Ensure the port in `details` is reachable from the agent host.
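For the systemd route, a minimal unit might look like the following. The binary path, flags, and user are assumptions for illustration, not shipped defaults:

```ini
[Unit]
Description=ServiceRadar example checker
After=network-online.target

[Service]
# Listen address must match the "details" value in poller.json.
ExecStart=/usr/local/bin/sysmon-vm-checker --listen 0.0.0.0:50110
Restart=on-failure
User=serviceradar

[Install]
WantedBy=multi-user.target
```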
## Configuring the Pipeline

- **Update the poller** – Add the checker entry to each agent section in `poller.json`, then restart or hot-reload the poller (`systemctl reload serviceradar-poller`).
- **Confirm agent connectivity** – `docker compose logs agent` or `journalctl -u serviceradar-agent` should show "Connecting to checker service" without errors.
- **Verify poller reports** – `docker compose logs poller | grep service_name` should show the checker in each polling cycle.
- **Check core ingestion** – `docker compose logs core | grep checker` should include "checker device through device registry" warnings only if the checker omits host identity.
## Testing The End-to-End Flow

- Unit tests:
  - `go test ./pkg/agent/...` exercises the registry and checker wiring.
  - `go test ./pkg/poller/...` ensures the poller packets embed metadata.
  - `go test ./pkg/core/...` covers `ensureServiceDevice` and payload parsing.
- Manual smoke test:
  - Launch the checker locally.
  - Run the agent with the checker configured and use `grpcurl` against `GetStatus`.
  - Start the poller and core (Docker Compose or binaries).
  - Load `/api/devices/<partition:ip>` in the UI or call `/api/devices/<id>/sysmon/cpu`.
## Troubleshooting

- **Checker never appears** – Confirm the JSON payload includes a valid `host_ip`. Without it, core cannot derive the device ID.
- **Agent logs an invalid address** – Ensure `details` is `host:port`; the agent validates it with `net.SplitHostPort`.
- **Health check failures** – If the checker does not implement gRPC health, add a case to `pkg/agent/registry.go` that sets `grpcServiceCheckName` to `monitoring.AgentService`, or implement the health service.
- **Stale data warnings** – The poller caches checker health for a short window. If `GetStatus` is expensive, increase the checker's own sampling interval and return cached metrics quickly.
## Extending Beyond gRPC

- To introduce a brand-new `service_type`, register it in `pkg/agent/registry.go` and provide a `checker.Checker` implementation.
- For long-running collectors (e.g., sweep, SNMP), leverage the existing service scaffolding under `pkg/agent`.
- Any checker that emits device identity automatically benefits from the core-side device registration path—no extra integration needed.
With this flow, new checker authors can focus solely on their collector logic and JSON payloads. The agent, poller, and core layers handle transport, device registration, and UI surfacing without additional plumbing.