Secure Edge Onboarding
This runbook explains how to issue ServiceRadar onboarding packages for hosts that
run outside the Kubernetes demo cluster. It captures the current poller flow and
lays out the forthcoming agent/checker enhancements so operators know what to
gather before a rollout. See docs/docs/edge-agent-onboarding.md for the
component-by-component breakdown and KV automation model.
Status overview
- Poller onboarding is live today via the edge onboarding service in Core.
- Agent and checker onboarding leverage the same package machinery but still require manual KV updates. The backend/API work (GH-1904 / serviceradar-54) will automate those steps in an upcoming release.
- Next milestone (GH-1909) tracks the multi-component UI/API so operators can issue packages for pollers, agents, and checkers from one flow.
Component scope and relationships
| Component | Parent association | Package artifacts | KV path updated on create | Notes |
|---|---|---|---|---|
| Poller | None | edge-poller.env, SPIRE join token + bundle | config/pollers/<poller-id> | Establishes the edge site and acts as the control plane for downstream agents. |
| Agent | Poller | edge-agent.env, SPIRE join token + bundle | config/pollers/<poller-id>/agents/<agent-id> | Agents inherit connectivity details from their parent poller and surface checker slots. |
| Checker | Agent | edge-checker.env (planned), SPIRE assets when needed | config/agents/<agent-id>/checkers/<checker-id> | Checkers depend on an agent for dispatch and credential management. |
When the new onboarding UI lands, the operator must declare the component type
up front. Selecting Agent requires choosing a poller parent; selecting
Checker requires choosing an agent parent (which implicitly ties it to that
agent’s poller). During package creation Core will update the parent KV
document, marking the new child as pending so it activates automatically once
the installer reports back.
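The parent/child layout in the table above can be sketched as a small shell helper that builds the KV path for each component type. The function name `kv_path` and its argument order are illustrative assumptions and not part of serviceradar-cli; only the path shapes come from the table.

```shell
# Hypothetical helper: print the KV path Core updates for a component.
#   kv_path poller  <poller-id>
#   kv_path agent   <poller-id> <agent-id>
#   kv_path checker <agent-id>  <checker-id>
kv_path() {
  kind=$1
  case "$kind" in
    poller)  printf 'config/pollers/%s\n' "$2" ;;
    agent)   printf 'config/pollers/%s/agents/%s\n' "$2" "$3" ;;
    checker) printf 'config/agents/%s/checkers/%s\n' "$2" "$3" ;;
    *)       echo "unknown component type: $kind" >&2; return 1 ;;
  esac
}
```

With made-up IDs, `kv_path agent reno-edge snmp-gw` prints `config/pollers/reno-edge/agents/snmp-gw`. Note that the manual workaround in section 4 appends `.json` to the key, so adjust the suffix to match whatever your KV store actually holds.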
1. Prerequisites
| Requirement | Notes |
|---|---|
| Demo cluster access | You need kubectl + admin access to the demo namespace. |
| Admin credentials | API key (serviceradar-secrets), and an admin JWT (login to the web app or call /auth/login). |
| External endpoints | From the edge host, Core/SPIRE/KV must be reachable:<br>- Core gRPC: 23.138.124.18:50052<br>- Core SPIFFE ID: spiffe://carverauto.dev/ns/demo/sa/serviceradar-core<br>- KV gRPC: 23.138.124.23:50057<br>- KV SPIFFE ID: spiffe://carverauto.dev/ns/demo/sa/serviceradar-datasvc<br>- SPIRE server gRPC: 23.138.124.18:18081 |
| CLI tools | serviceradar-cli, docker, docker compose, tar, jq, kubectl. |
| Repo | Clone github.com/carverauto/serviceradar on the edge host. |
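Before starting, it can help to confirm that the CLI tools from the table resolve on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, and it only checks that each binary can be found, not that its version is suitable.

```shell
# Hypothetical helper: print every named tool that is NOT on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example invocation with the tools from the table:
#   missing_tools serviceradar-cli docker tar jq kubectl
# "docker compose" is a docker subcommand, so probe it separately:
#   docker compose version
```

An empty result means every listed tool was found.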
2. Metadata Template
Every onboarding package embeds an edge-poller.env file generated from
metadata you supply. Use this baseline JSON and adjust values only when your
load balancers change:
{
  "core_address": "23.138.124.18:50052",
  "core_spiffe_id": "spiffe://carverauto.dev/ns/demo/sa/serviceradar-core",
  "kv_address": "23.138.124.23:50057",
  "kv_spiffe_id": "spiffe://carverauto.dev/ns/demo/sa/serviceradar-datasvc",
  "spire_upstream_address": "23.138.124.18",
  "spire_upstream_port": "18081",
  "spire_parent_id": "spiffe://carverauto.dev/ns/edge/poller-nested-spire",
  "agent_address": "agent:50051",
  "agent_spiffe_id": "spiffe://carverauto.dev/services/agent",
  "log_level": "debug",
  "logs_dir": "./logs",
  "nested_spire_assets": "./spire",
  "serviceradar_templates": "./packaging/core/config",
  "nested_spire_wait_attempts": "120",
  "spire_insecure_bootstrap": "false"
}
Keep the metadata focused on connectivity. Arbitrary keys are persisted but ignored by the bootstrap scripts.
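Since the bootstrap scripts silently ignore unknown keys, a typo in a key name is easy to miss. The sketch below lists connectivity keys from the template that are absent from a metadata file. Treating these seven keys as mandatory is an assumption of this example, and `missing_metadata_keys` is a hypothetical helper; it does a plain text match rather than real JSON parsing.

```shell
# Hypothetical helper: print template keys that do not appear in the file.
# Uses grep on the quoted key name, so it assumes one flat JSON object.
missing_metadata_keys() {
  file=$1
  for key in core_address core_spiffe_id kv_address kv_spiffe_id \
             spire_upstream_address spire_upstream_port spire_parent_id; do
    grep -q "\"$key\"[[:space:]]*:" "$file" || printf '%s\n' "$key"
  done
}

# Example: missing_metadata_keys metadata.json
```

If jq is available, `jq -e 'has("core_address")' metadata.json` is a stricter per-key check.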
3. Poller Onboarding (Current)
3.1 Issue a package
- Log into the web UI as an admin and open Admin → Edge Onboarding.
- Click Issue new installer and provide:
- Component type: Set to Poller. (The multi-component selector ships with GH-1909; until then the form defaults to pollers.)
- Component ID: Leave blank to auto-generate a slug from the label, or provide a custom lowercase identifier (hyphen-separated).
- Label (required): Friendly name; also seeds the poller ID.
- Poller ID (optional): Override the generated slug.
- Site (optional): Free-form location tag.
- Downstream SPIFFE ID (optional): Leave blank to auto-generate spiffe://carverauto.dev/ns/edge/<poller-id>.
- SPIRE selectors: Leave empty unless you need extra selectors. We always include the defaults (unix:uid:0, unix:gid:0, unix:user:root, unix:group:root).
- Metadata JSON: Paste the template from §2 (adjust endpoints if needed).
- Join/Download TTLs: Defaults (30m / 15m) are fine for most cases.
- Notes: Optional operator guidance.
- Submit. The UI returns a join token, download token, and bundle PEM. Copy them somewhere secure—they are displayed once.
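To preview what a label might turn into, the sketch below derives a lowercase, hyphen-separated identifier and the default downstream SPIFFE ID. The slug rule here (lowercase, collapse non-alphanumerics to a single hyphen, trim leading and trailing hyphens) is an assumption for illustration; Core's actual slug logic may differ. Both function names are hypothetical.

```shell
# Hypothetical helper: approximate a lowercase, hyphen-separated slug.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Default downstream SPIFFE ID pattern described in the form notes above.
default_spiffe_id() {
  printf 'spiffe://carverauto.dev/ns/edge/%s\n' "$1"
}
```

For example, `default_spiffe_id "$(slugify 'Reno Edge Poller')"` prints `spiffe://carverauto.dev/ns/edge/reno-edge-poller`.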
CLI equivalent:
serviceradar-cli edge package create \
--core-url https://demo.serviceradar.cloud \
--api-key "$SERVICERADAR_API_KEY" \
--bearer "$ADMIN_JWT" \
--label "Reno Edge Poller" \
--metadata-json "$(cat metadata.json)"
Inspect existing packages or reissue tokens without touching the UI:
# Summaries (table or --output=json)
serviceradar-cli edge package list \
--core-url https://demo.serviceradar.cloud \
--api-key "$SERVICERADAR_API_KEY" \
--bearer "$ADMIN_JWT"
# Detailed view + fresh onboarding token
serviceradar-cli edge package show \
--core-url https://demo.serviceradar.cloud \
--api-key "$SERVICERADAR_API_KEY" \
--bearer "$ADMIN_JWT" \
--id <PACKAGE_ID> \
--reissue-token \
--download-token <NEW_DOWNLOAD_TOKEN>
3.2 Download the artifacts
- UI: Click Download while the package status is Issued.
- CLI:
serviceradar-cli edge package download \
--core-url https://demo.serviceradar.cloud \
--api-key "$SERVICERADAR_API_KEY" \
--bearer "$ADMIN_JWT" \
--id <PACKAGE_ID> \
--download-token <DOWNLOAD_TOKEN> \
--output edge-package.tar.gz
# Optional JSON payload for records or automation
serviceradar-cli edge package download \
--core-url https://demo.serviceradar.cloud \
--api-key "$SERVICERADAR_API_KEY" \
--bearer "$ADMIN_JWT" \
--id <PACKAGE_ID> \
--download-token <DOWNLOAD_TOKEN> \
--format json \
--output edge-package.json
The archive contains:
README.txt
metadata.json
edge-poller.env
spire/upstream-join-token
spire/upstream-bundle.pem
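After extracting the archive, a quick check that all five files are present can catch a truncated download early. This is a minimal sketch; `missing_package_files` is a hypothetical helper and the file list is copied from the listing above.

```shell
# Hypothetical helper: print expected package files missing from a directory.
missing_package_files() {
  dir=$1
  for f in README.txt metadata.json edge-poller.env \
           spire/upstream-join-token spire/upstream-bundle.pem; do
    [ -f "$dir/$f" ] || printf '%s\n' "$f"
  done
}

# Example, run from the directory where the archive was extracted:
#   missing_package_files .
```

No output means the extracted tree matches the expected layout.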
3.3 Bootstrap the Docker stack
- Copy the archive onto the edge host (repo root).
- Extract it:
tar -xzvf edge-package.tar.gz
- IMPORTANT: Update the env file for Docker environments:
# The agent address must be localhost:50051 for shared PID namespace
sed -i 's/POLLERS_AGENT_ADDRESS=agent:50051/POLLERS_AGENT_ADDRESS=localhost:50051/' edge-poller.env
# SPIRE upstream must use LoadBalancer IP, not k8s DNS
sed -i 's/POLLERS_SPIRE_UPSTREAM_ADDRESS=spire-server.demo.svc.cluster.local/POLLERS_SPIRE_UPSTREAM_ADDRESS=23.138.124.18/' edge-poller.env
sed -i 's/POLLERS_SPIRE_UPSTREAM_PORT=8081/POLLERS_SPIRE_UPSTREAM_PORT=18081/' edge-poller.env
- Run the automated restart:
docker/compose/edge-poller-restart.sh \
--env-file edge-poller.env \
--skip-refresh # uses the packaged join token/bundle
This script wipes stale volumes, regenerates configs, injects the metadata, and brings serviceradar-poller + serviceradar-agent online in SPIFFE mode.
- Verify:
docker compose --env-file edge-poller.env -f docker/compose/poller-stack.compose.yml ps
docker logs serviceradar-poller | grep -i "Node attestation was successful"
docker logs serviceradar-agent | grep -E "(Starting|KV)"
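The sed substitutions in the env-file step only take effect when the old value is exactly the one shown. One alternative is to set each key regardless of its current value. The sketch below assumes plain `KEY=value` lines; `set_env_var` is a hypothetical helper, not part of the repo's scripts.

```shell
# Hypothetical helper: set KEY=value in an env file, replacing an existing
# line for KEY or appending one if the key is absent.
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's mode
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Example, using the values from the steps above:
#   set_env_var edge-poller.env POLLERS_AGENT_ADDRESS localhost:50051
#   set_env_var edge-poller.env POLLERS_SPIRE_UPSTREAM_ADDRESS 23.138.124.18
#   set_env_var edge-poller.env POLLERS_SPIRE_UPSTREAM_PORT 18081
```

Values containing backslashes would be interpreted by awk's `-v` handling, so this is only suited to simple values such as host names and ports.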
3.4 Activation check
- In Core, the package transitions from Issued → Delivered once the download succeeds, then Activated after the new poller reports in.
- kubectl logs deployment/serviceradar-core -n demo | grep <poller-id> should show the poller entering the allowed list.
3.5 Revoking a poller (optional)
serviceradar-cli edge package revoke \
--core-url https://demo.serviceradar.cloud \
--api-key "$SERVICERADAR_API_KEY" \
--bearer "$ADMIN_JWT" \
--id <PACKAGE_ID> \
--reason "Retired edge host"
Core deletes the downstream SPIRE entry, clears the tokens, and marks the package Revoked.
3.6 Offline bootstrap (ONBOARDING_PACKAGE)
Air-gapped hosts (or CI jobs that do not have direct Core access) can skip the HTTP download step entirely:
- Download the tarball once from a connected machine (edge package download --output edge-package.tar.gz).
- Copy the archive to the target host.
- Export both the KV endpoint and the archive path before launching the poller (or agent/checker):
export KV_ENDPOINT=23.138.124.23:50057
export ONBOARDING_PACKAGE=/opt/serviceradar/offline/edge-package.tar.gz
# Optional: still export ONBOARDING_TOKEN if you want the bootstrapper to
# notify Core about activation when connectivity returns.
- Start the service as normal; the bootstrapper verifies metadata.json and spire/* from disk, writes configs under /var/lib/serviceradar, and proceeds with SPIRE + KV enrollment.
ONBOARDING_PACKAGE works for every Go service that already calls edgeonboarding.TryOnboard. Rust/sysmon parity is in progress; until then the CLI download remains the canonical bootstrap for Go binaries.
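A small preflight run before starting the service can catch a missing export or an unreadable archive. This sketch only inspects the two variables named in the steps above; `offline_preflight` is a hypothetical helper, and the host:port shape check is an assumption about what the bootstrapper accepts.

```shell
# Hypothetical helper: sanity-check the offline bootstrap environment.
offline_preflight() {
  pkg=${ONBOARDING_PACKAGE:-}
  kv=${KV_ENDPOINT:-}
  [ -n "$pkg" ] || { echo "ONBOARDING_PACKAGE is not set" >&2; return 1; }
  [ -r "$pkg" ] || { echo "cannot read $pkg" >&2; return 1; }
  case "$kv" in
    ?*:?*) ;;   # at least one character on each side of a colon
    *) echo "KV_ENDPOINT must be host:port, got '$kv'" >&2; return 1 ;;
  esac
  case "${kv##*:}" in
    *[!0-9]*) echo "KV_ENDPOINT port is not numeric" >&2; return 1 ;;
  esac
  return 0
}

# Example: offline_preflight && echo "environment looks usable"
```

The function does not open the archive; it only confirms the path is readable.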
4. Agent Onboarding (Planned Enhancement)
The service currently issues poller installers only. The next milestone adds agent support so Core can publish a package that targets an existing poller and pre-wires KV.
Target flow
- Operator selects Component type → Agent in the onboarding form.
- UI prompts for Associated poller and surfaces pollers that are active or have pending packages.
- Optional presets let the operator choose common agent roles (SNMP gateway, sysmon collector, etc.) to scaffold metadata.
- Core issues a package containing:
- SPIRE join token/bundle for the agent workload entry.
- edge-agent.env with Core/KV endpoints, the parent poller ID, and any metadata captured in the form.
- Core immediately updates the poller’s KV document at config/pollers/<poller-id>/agents/<agent-id> with status: "pending" so the poller starts streaming tasks to the agent on activation.
- Activation events flip the KV entry to active and add an audit record that references the parent poller.
Interim workaround (manual)
Until the automation lands:
- Duplicate the poller metadata JSON, set "agent_address" to the new agent’s reachable endpoint, and issue a poller package (for SPIRE assets).
- Extract edge-poller.env, rename to edge-agent.env, and tailor the env for the agent container.
- Update KV manually:
- serviceradar-cli kv get --key config/pollers/<poller-id>/agents/<agent-id>.json
- Merge the new agent definition, set status: "pending".
- serviceradar-cli kv put --key config/pollers/<poller-id>/agents/<agent-id>.json --file updated.json
5. Checker Onboarding (Planned Enhancement)
Checkers (SNMP, sysmon-osx, custom scanners) will reuse the same framework. The UX will prompt for:
- Checker type.
- Parent agent.
- Any device-specific credentials.
Core will then:
- Require Component type → Checker and selection of the parent agent (with an inline reminder of the agent’s parent poller).
- Issue SPIRE credentials for the checker workload (if needed).
- Update the agent’s KV entry at config/agents/<agent-id>/checkers/<checker-id> with status: "pending", checker kind, and metadata (targets, credentials). The agent promotes the checker to active once it reports back.
Manual procedure today
- Add the checker JSON under the agent’s KV config.
- Restart the agent container so it re-reads configuration.
- For SPIFFE-enabled checkers, craft join tokens via serviceradar-cli spire-join-token and distribute them separately.
6. Troubleshooting
| Symptom | Checks |
|---|---|
| PermissionDenied: no identity issued in poller logs | The join token may have expired or been consumed. Generate a fresh package, extract, and rerun edge-poller-restart.sh --skip-refresh. |
| /api/admin/edge-packages download returns 409 | The download token was already used; issue a new package. |
| Agent never appears under the poller | Verify KV config for the poller lists the agent; if not, reapply the edits (manual step until agent onboarding is automated). |
| Nested SPIRE server loops | Clear compose_poller-spire-runtime (docker volume rm) before running the restart script to avoid stale sockets. |
| produced zero addresses in SPIRE upstream agent | The upstream address is using k8s DNS, which doesn't resolve from Docker. Update POLLERS_SPIRE_UPSTREAM_ADDRESS to use the LoadBalancer IP (e.g., 23.138.124.18). |
| join token was not provided in downstream agent | Known limitation: The downstream nested agent requires manual join token generation. This will be automated in a future release. |
| Agent hangs on "Initializing SPIFFE security provider" | The agent must share PID namespace with poller for SPIRE workload API attestation. The poller-stack.compose.yml already configures this with pid: "service:poller". |
| Poller health checks fail for agent | If using shared network namespace, ensure POLLERS_AGENT_ADDRESS=localhost:50051 (not agent:50051) in the env file. |
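The log-based rows of the table can be turned into a rough first-pass filter. The sketch below reads log lines on stdin and prints a hint for each known error string it sees. `diagnose` is a hypothetical helper, and it assumes the messages appear in the logs verbatim as quoted in the table.

```shell
# Hypothetical helper: map known error strings from the table to hints.
# Reads log text on stdin; prints each matching hint once.
diagnose() {
  while IFS= read -r line; do
    case "$line" in
      *"PermissionDenied: no identity issued"*)
        echo "join token expired or consumed: issue a fresh package, extract it, rerun edge-poller-restart.sh --skip-refresh" ;;
      *"produced zero addresses"*)
        echo "SPIRE upstream uses k8s DNS: set POLLERS_SPIRE_UPSTREAM_ADDRESS to the LoadBalancer IP" ;;
      *"join token was not provided"*)
        echo "known limitation: the downstream nested agent needs a manually generated join token" ;;
    esac
  done | sort -u
}

# Example:
#   docker logs serviceradar-poller 2>&1 | diagnose
```

No output means none of the three strings were found; the remaining table rows still need a manual check.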
7. Next Steps
- Finish the API/UI work for agent and checker packages so metadata + KV updates are produced automatically (serviceradar-54 / GH-1904, successor GH-1909).
- Add UI presets so operators can choose “Edge poller” / “Edge agent” / “Edge checker” without pasting JSON.
- Extend the restart script to install agents/checkers once the new packages are available.
Tracking
- GitHub: GH-1909 “Edge onboarding: support agents and checkers”.
- Beads: New follow-up issue to succeed serviceradar-54 once the docs are updated (see repo .beads index).