Ansible Integration
ServiceRadar drives Ansible playbook execution against devices in its inventory by talking to a customer-side AWX/AAP controller, and surfaces every run as a first-class resource with live status, per-host outcomes, full audit, and the same OCSF-based universal-log-viewer search as the rest of the platform. It is the supported alternative to running ARA next to a stand-alone AWX deployment.
This guide covers:
- Architecture
- Deployment
- Operator guide — registering controllers, repositories, and schedules
- User guide — launching playbooks and watching runs
- Configuration reference — env vars + per-resource overrides
- RBAC reference: the eight ansible.* permission keys
- Troubleshooting
- v1 limitations
Architecture
ServiceRadar core (SaaS / control plane)
│
│ AgentCommandBus.dispatch/4
▼
ServiceRadar agent-gateway
│
│ ControlStream (gRPC)
▼
ServiceRadar agent (runs inside the customer's network)
│
│ invokes WASM plugin
▼
awx WASM plugin (cmd/wasm-plugins/awx/)
│
│ HTTPS REST + per-call credential broker grant
▼
AWX / AAP controller (in the customer's network)
AWX usually lives in the customer's private network. ServiceRadar core cannot reach it directly; the agent can. So every AWX REST call traverses the chain above — core → agent-gateway → agent → awx WASM plugin → AWX. The plugin is the network bridge; orchestration and persistence stay in Elixir.
Two execution patterns in the plugin:
| Entrypoint | Mode | Purpose |
|---|---|---|
| run_check | On-demand via CommandRequest | Every AWX REST verb: awx.ping, awx.list_*, awx.fetch_template, awx.launch_job, awx.fetch_job, awx.cancel_job, awx.fetch_events_for_jobs |
| inventory_sync | Scheduled assignment | Walks AWX inventories and emits a DeviceDiscovery aggregate via the same pipeline proxmox-inventory uses; DIRE merges the records and flips Device.ansible_managed = true |
Pulse-based event ingestion. ServiceRadar does not maintain a long-lived stream to AWX. Each registered controller has a RunPulseWorker Oban job that ticks every run_pulse_interval_ms (default 2000) and dispatches a single awx.fetch_events_for_jobs command covering every non-terminal PlaybookRun's (awx_job_id, last_event_id) watermark. The plugin makes one HTTP call per active job and returns aggregated events; EventIngestor persists them, drives the run state machine, and projects each result into an OCSF Application Activity event for the universal log viewer.
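The watermark bookkeeping described above can be sketched as follows. This is a minimal illustration in Go (the plugin's language), not the actual RunPulseWorker or EventIngestor code. The Watermark and Event types and the newEvents and advance functions are hypothetical names, and the sketch assumes AWX job event ids increase monotonically within a job.

```go
package main

import "fmt"

// Watermark is a hypothetical type mirroring the (awx_job_id, last_event_id)
// pair that each pulse carries for one non-terminal run.
type Watermark struct {
	AWXJobID    int
	LastEventID int
}

// Event is a simplified stand-in for one AWX job event.
type Event struct {
	ID       int
	AWXJobID int
}

// newEvents keeps only events for this job that are newer than the watermark,
// so a repeated pulse never re-ingests an event it has already seen.
func newEvents(w Watermark, fetched []Event) []Event {
	var out []Event
	for _, e := range fetched {
		if e.AWXJobID == w.AWXJobID && e.ID > w.LastEventID {
			out = append(out, e)
		}
	}
	return out
}

// advance returns the watermark to send on the next pulse.
func advance(w Watermark, ingested []Event) Watermark {
	for _, e := range ingested {
		if e.ID > w.LastEventID {
			w.LastEventID = e.ID
		}
	}
	return w
}

func main() {
	w := Watermark{AWXJobID: 42, LastEventID: 10}
	fetched := []Event{{ID: 9, AWXJobID: 42}, {ID: 11, AWXJobID: 42}, {ID: 12, AWXJobID: 42}}
	fresh := newEvents(w, fetched)
	w = advance(w, fresh)
	fmt.Println("new events:", len(fresh), "next watermark:", w.LastEventID)
}
```

Keeping the watermark per job is what lets a single pulse cover many active runs without replaying old events.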
Two playbook sources, both surfaced as Playbook rows with a source_type discriminator:
- :git for registered git repositories, cloned and parsed by GitCatalogSyncWorker. Operators bind each git-sourced playbook to an AWX job template before it becomes launchable.
- :awx for AWX Job Templates auto-mirrored by AwxCatalogSyncWorker. Launchable by definition.
A single playbook can appear via both sources; both source types coexist in the catalog UI.
Deployment
Prerequisites
- An AWX or AAP instance reachable from at least one ServiceRadar agent.
- An OAuth2 personal access token from AWX with read access to inventories / projects / job_templates, plus permission to launch the templates ServiceRadar will run.
- A ServiceRadar agent registered to the gateway and able to reach the AWX network (typical: same Kubernetes cluster, same VPC, same VLAN).
Apply the schema migration
The Ansible tables are added via a single named Ash migration. From a freshly checked out copy with the runtime running against your target database:
cd elixir/serviceradar_core
mix ash.codegen add_ansible_integration # generates the migration
mix ash.migrate # applies it
Verify with:
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'platform' AND table_name LIKE 'ansible_%'
ORDER BY table_name;
You should see 14 tables (resources + their AshPaperTrail _versions mirrors for the four audited resources): controllers, playbook_repositories, playbooks, playbook_runs (+ run_targets / plays / tasks / task_results / contents), and playbook_schedules.
Deploy the awx WASM plugin
The plugin lives at go/cmd/wasm-plugins/awx/ and exposes two manifests:
- plugin.yaml: the on-demand run_check entrypoint for REST verbs
- plugin.inventory_sync.yaml: the scheduled inventory_sync entrypoint that emits DeviceDiscovery records
Build:
cd go/cmd/wasm-plugins/awx
tinygo build -o awx.wasm \
-target=wasi -gc=conservative -scheduler=none -no-debug ./
Import the signed package through ServiceRadar's plugin staging flow (the platform requires Rekor / cosign verification for plugin imports — see the WASM Plugins guide for the publishing flow). Then assign both plugin manifests to the agent(s) that reach your AWX network. The inventory_sync assignment is what drives the Device.ansible_managed flag — without it, no devices flip to ansible-managed.
Configure environment variables
ServiceRadar exposes the operator-tunable knobs as env vars surfaced in both docker-compose.yml and helm/serviceradar/values.yaml. The defaults are conservative; tune only when needed.
| Env var | Default | What it does |
|---|---|---|
| ANSIBLE_RETENTION_RUN_DETAIL_DAYS | 90 | Prune PlaybookPlay / PlaybookTask / PlaybookTaskResult past this age. 0 disables detail pruning. |
| ANSIBLE_RETENTION_RUN_SUMMARY_DAYS | (empty) | When set, delete the entire PlaybookRun (cascades to targets / plays / tasks / results) past this age. Empty = keep forever. |
| ANSIBLE_RETENTION_INTERVAL_SECONDS | 86400 | How often RetentionWorker scans. |
| AWX_CONTROLLER_HEALTH_INTERVAL_SECONDS | 30 | ControllerHealthWorker cadence (one awx.ping per registered controller). |
| AWX_RUN_WATCHDOG_INTERVAL_SECONDS | 60 | RunWatchdog interval; flags stuck non-terminal runs (past 2× job_template timeout, or 1 h fallback). |
| AWX_SCHEDULE_EVALUATOR_INTERVAL_SECONDS | 60 | ScheduleEvaluatorWorker cron evaluation cadence. |
| ANSIBLE_CATALOG_BASE_DIR | /var/lib/serviceradar/ansible_catalog (helm) / <tmp> (compose) | Base directory for GitCatalogSyncWorker repo clones. Mount a PVC at this path in Kubernetes to keep the cache warm across pod restarts. |
In Helm, these live under core.ansible.*:
core:
  ansible:
    runDetailDays: 90
    runSummaryDays: "" # keep forever
    retentionIntervalSeconds: 86400
    controllerHealthIntervalSeconds: 30
    runWatchdogIntervalSeconds: 60
    scheduleEvaluatorIntervalSeconds: 60
    catalogBaseDir: "/var/lib/serviceradar/ansible_catalog"
The current effective values are surfaced at runtime in Settings → Ansible → Retention.
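To make the "empty means keep forever" and "0 disables" semantics concrete, here is a minimal sketch of how such knobs could be read with defaults. It is written in Go for illustration only; the real parsing happens in the Elixir core. The intEnv and summaryRetention helpers are hypothetical, and falling back to the default on an unparseable value (and treating a non-positive summary window as disabled) is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// intEnv reads an integer env var, falling back to def when the variable is
// unset, empty, or not a valid integer (assumed fallback behavior). lookup is
// injected so the function can be exercised without the real environment.
func intEnv(lookup func(string) (string, bool), key string, def int) int {
	raw, ok := lookup(key)
	if !ok || raw == "" {
		return def
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return def
	}
	return n
}

// summaryRetention reports the run-summary window in days and whether summary
// pruning is enabled. Empty or unset means "keep forever"; treating a
// non-positive value the same way is an assumption of this sketch.
func summaryRetention(lookup func(string) (string, bool)) (days int, enabled bool) {
	raw, ok := lookup("ANSIBLE_RETENTION_RUN_SUMMARY_DAYS")
	if !ok || raw == "" {
		return 0, false
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

func main() {
	detail := intEnv(os.LookupEnv, "ANSIBLE_RETENTION_RUN_DETAIL_DAYS", 90)
	days, enabled := summaryRetention(os.LookupEnv)
	// A detail window of 0 disables detail pruning, per the table above.
	fmt.Println("detail days:", detail, "detail pruning:", detail > 0)
	fmt.Println("summary days:", days, "summary pruning:", enabled)
}
```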
Operator guide
Permissions: this guide assumes ansible.controllers.manage + ansible.repositories.manage + ansible.schedules.manage. Admins have these by default; see RBAC reference for the full set.
1. Store the AWX API token in the credential broker
The Ansible integration never sees a plaintext token — it always passes a credential broker grant referencing a stored secret. Create the secret first.
From iex -S mix against core-elx:
alias ServiceRadar.Actors.SystemActor
alias ServiceRadar.Credentials.NetworkCredentialSecret

# Store the AWX OAuth2 token as a secret; replace the placeholder first.
{:ok, secret} =
  NetworkCredentialSecret.create_secret(
    %{
      name: "awx-prod",
      provider: "awx",
      credential_kind: :api_token,
      secret_payload: "PASTE-AWX-OAUTH2-TOKEN-HERE"
    },
    actor: SystemActor.system(:setup)
  )

# Print the UUID to paste into the controller form in step 2.
IO.inspect(secret.id, label: "credential_secret_id")
The credential broker, not ServiceRadar core, handles plaintext. SSH keys, become passwords, and vault passwords are never stored here — those live in AWX's credential vault. The only secret ServiceRadar holds is the AWX OAuth2 token, encrypted at rest via AshCloak.
2. Register an AWX controller
Navigate to Settings → Ansible → Controllers and click + Add controller. Fill in:
| Field | Notes |
|---|---|
| Name | Operator-facing label, unique. Used in run logs, OCSF events, audit trails. |
| Agent ID | The ServiceRadar agent that reaches this AWX. Must have both plugin assignments. |
| Description | Optional. |
| Base URL | https://awx.internal.example.com — must include scheme. |
| Credential secret ID | The UUID from step 1. (v1 limitation: paste manually; picker UX comes later.) |
| Inventory sync (s) | Plugin-side cadence for inventory_sync. Default 300. |
| Catalog sync (s) | AwxCatalogSyncWorker cadence (mirrors AWX templates as :awx-sourced playbooks). Default 600. |
| Run pulse (ms) | RunPulseWorker cadence — lower for snappier UI, higher for lower AWX API load. Default 2000. |
Save. Within AWX_CONTROLLER_HEALTH_INTERVAL_SECONDS (default 30), ControllerHealthWorker dispatches awx.ping through the plugin to AWX; EventIngestor then writes last_health_at and flips the status to :ok. Refresh the row to see the change.
If the status stays :unknown past two health intervals, see Troubleshooting.
3. Register a git playbook repository (optional)
Navigate to Settings → Ansible → Repositories and click + Add repository. Fill in:
| Field | Notes |
|---|---|
| Name | Unique. |
| Ref | Branch or tag. Default main. |
| Description | Optional. |
| Git URL | HTTPS only. SSH is a v2 feature. |
| Deploy token secret ID | Optional. Public repos: leave blank. Private repos: create a NetworkCredentialSecret with the HTTPS deploy token (same shape as the AWX API secret) and paste the UUID here. |
| Sync interval (s) | GitCatalogSyncWorker cadence. Min 60s; default 600s. |
Save. GitCatalogSyncWorker clones the repo to $ANSIBLE_CATALOG_BASE_DIR/<repository_id>/, walks .yml / .yaml files, parses each as an Ansible playbook (the first play's metadata becomes the row), and upserts one Playbook row per file with source_type: :git.
Per-file YAML parse failures are surfaced inline rather than dropped — the row shows up with parse_status: :error and a diagnostic on the catalog page, so operators can spot broken playbooks instead of wondering why they're missing.
Bind git-sourced playbooks to an AWX template before launching. Git-sourced rows show up in /ansible/catalog with an unbound warning badge until an awx_job_template_id is set. Operators are expected to keep the AWX project + job_template configured to match; ServiceRadar does not auto-create AWX templates from git playbooks in v1.
4. Watch inventory flow in automatically
If the inventory_sync plugin assignment is wired up on the controller's agent, within inventory_sync_interval_seconds you should see devices in your inventory flipping to ansible_managed: true with ansible_inventory_ref populated. This is driven by:
- Plugin runs on schedule.
- Plugin calls AWX inventory API.
- Plugin emits a DeviceDiscovery aggregate (source: "awx") via result.WithDeviceDiscovery(...).
- Agent → gateway → DIRE merges the records with existing devices (matching on ansible_host IP, hostname, or AWX hostname, in priority order).
- Matched devices get ansible_managed = true. AWX hosts that DIRE cannot match are emitted but not yet rendered; a "needs review" list in Settings → Ansible → Controllers is a v2 feature.
No manual "mark Ansible-managed" toggle exists — the state is fully derived.
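The priority-ordered matching in the steps above can be sketched like this. It is an illustrative Go model, not DIRE's implementation: the AWXHost and Device types, their field names, and the match function are all hypothetical, and the sketch assumes a simple exact-string comparison for each rule.

```go
package main

import "fmt"

// AWXHost is a simplified AWX inventory host (hypothetical field names).
type AWXHost struct {
	Name        string // AWX host name
	AnsibleHost string // value of the ansible_host variable, if set
	Hostname    string // resolved hostname, if known
}

// Device is a simplified ServiceRadar device record (hypothetical).
type Device struct {
	UID      string
	IP       string
	Hostname string
}

// match tries each rule in priority order and returns the first device hit,
// together with the name of the rule that matched.
func match(h AWXHost, devices []Device) (uid, rule string, ok bool) {
	rules := []struct {
		name string
		hit  func(Device) bool
	}{
		{"ansible_host", func(d Device) bool { return h.AnsibleHost != "" && d.IP == h.AnsibleHost }},
		{"hostname", func(d Device) bool { return h.Hostname != "" && d.Hostname == h.Hostname }},
		{"awx_hostname", func(d Device) bool { return h.Name != "" && d.Hostname == h.Name }},
	}
	for _, r := range rules {
		for _, d := range devices {
			if r.hit(d) {
				return d.UID, r.name, true
			}
		}
	}
	return "", "", false // unmatched hosts are emitted but not merged
}

func main() {
	devices := []Device{
		{UID: "sr:a", IP: "10.0.0.1", Hostname: "web1"},
		{UID: "sr:b", IP: "10.0.0.2", Hostname: "db1"},
	}
	uid, rule, ok := match(AWXHost{Name: "db1", AnsibleHost: "10.0.0.1"}, devices)
	fmt.Println(uid, rule, ok)
}
```

Running every device through the highest-priority rule before trying the next one is what makes a hostname collision lose to an ansible_host IP match.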
5. Create a scheduled run (optional)
Navigate to Settings → Ansible → Schedules and click + Add schedule. Fill in:
| Field | Notes |
|---|---|
| Name | Unique. |
| Enabled | Default on. |
| Playbook | Dropdown filtered to launchable playbooks (anything with awx_job_template_id). |
| Target device UIDs | Comma-separated OCSF UIDs (e.g. sr:abc,sr:def). All must share one controller. (v2: proper multi-select picker.) |
| Cron | Standard 5-field expression — e.g. 0 3 * * * for daily at 03:00. |
| Timezone | UTC / Etc/UTC only in v1. (:tzdata is not currently a dependency.) |
| Allow concurrent runs | Default off — when the previous run is still non-terminal, the next fire is recorded as :skipped_overlap. |
| extra_vars (JSON) | Passed to AWX on each fire. |
ScheduleEvaluatorWorker fires every minute by default (configurable via AWX_SCHEDULE_EVALUATOR_INTERVAL_SECONDS); each due schedule produces a PlaybookRun exactly as if a human had launched it from the UI.
6. Retention
Navigate to Settings → Ansible → Retention for a read-only view of the effective retention windows + worker cadences. Changes are env-var driven; redeploy after editing values.yaml / docker-compose.yml.
User guide
Permissions: this section requires ansible.runs.launch. The Run Task button is hidden for users without it. To view runs only, ansible.runs.view is enough.
Browse the catalog
/ansible/catalog shows every Playbook ServiceRadar has discovered, both :git- and :awx-sourced. Filter by:
- Source (all / git / awx)
- Binding (all / launchable / unbound)
- Free-text on name + description
Bound rows show a green badge with the AWX job template id; unbound rows show a warning badge — those aren't launchable until an awx_job_template_id is set on the row.
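The three catalog filters combine as a simple conjunction. The Go sketch below is illustrative only: the Playbook struct, launchable, and filter are hypothetical names, and case-insensitive substring matching for the free-text filter is an assumption about how the search behaves.

```go
package main

import (
	"fmt"
	"strings"
)

// Playbook is a hypothetical, trimmed-down catalog row.
type Playbook struct {
	Name, Description string
	SourceType        string // "git" or "awx"
	AWXJobTemplateID  *int   // nil while the row is unbound
}

// launchable mirrors the rule in the text: a row is launchable once it has
// an awx_job_template_id.
func launchable(p Playbook) bool { return p.AWXJobTemplateID != nil }

// filter applies the source, binding, and free-text filters together.
func filter(ps []Playbook, source, binding, text string) []Playbook {
	var out []Playbook
	needle := strings.ToLower(text)
	for _, p := range ps {
		if source != "all" && p.SourceType != source {
			continue
		}
		if binding == "launchable" && !launchable(p) {
			continue
		}
		if binding == "unbound" && launchable(p) {
			continue
		}
		hay := strings.ToLower(p.Name + " " + p.Description)
		if needle != "" && !strings.Contains(hay, needle) {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	id := 7
	catalog := []Playbook{
		{Name: "patch", SourceType: "awx", AWXJobTemplateID: &id},
		{Name: "backup", Description: "nightly", SourceType: "git"},
	}
	for _, p := range filter(catalog, "all", "unbound", "") {
		fmt.Println("unbound:", p.Name)
	}
}
```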
Launch against multiple devices
- Visit /devices.
- Tick the checkbox on each ansible-managed device you want to target. Non-managed devices still show a checkbox, but the selection is validated on submit.
- Click + Run Task in the bulk-action toolbar (top-right of the inventory table, next to Bulk Edit / Bulk Delete).
- ServiceRadar navigates to /ansible/launch?devices=... pre-filled with your selection.
The Launch page validates the targets:
- All devices must be ansible_managed.
- All devices must point at the same AWX controller. Mixed-controller selections are rejected with a clear error.
Launch against a single device
On a device detail page, the Run Task action button appears in the page header (next to Edit / Delete / Console) if all of: you have ansible.runs.launch, the device is not soft-deleted, AND the device is ansible_managed.
Clicking it goes to /ansible/launch?devices=<uid> with the single device pre-filled.
The launch form
- Targets: read-only summary of selected devices with an ansible-managed badge per row.
- Playbook: dropdown of launchable playbooks; for each pick, the variable form below re-renders with typed inputs derived from the playbook's variable schema:
  - AWX-sourced playbooks use the survey_spec (text / textarea / password / integer / float / multiplechoice / multiselect).
  - git-sourced playbooks use vars_prompt (text, plus password when private: true). Defaults are pre-filled; required fields are marked.
- Override extra_vars as raw JSON (checkbox). When toggled on, a textarea appears whose contents merge over the typed inputs at submit. Use this for variables not declared in the playbook's schema.
- Launch: submits; on success, you're redirected to /ansible/runs/:id.
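"Merge over the typed inputs" means the raw JSON wins wherever both define the same key. A minimal Go sketch of that overlay follows; mergeExtraVars is a hypothetical helper, and the shallow (top-level only) merge is an assumption, since the text does not say whether nested objects are merged recursively.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeExtraVars overlays raw JSON overrides on top of the typed form inputs.
// Keys present in both take the override's value (shallow merge, assumed).
func mergeExtraVars(typed map[string]any, rawJSON string) (map[string]any, error) {
	merged := make(map[string]any, len(typed))
	for k, v := range typed {
		merged[k] = v
	}
	if rawJSON == "" {
		return merged, nil // override checkbox left off
	}
	var overrides map[string]any
	if err := json.Unmarshal([]byte(rawJSON), &overrides); err != nil {
		return nil, fmt.Errorf("extra_vars override is not a JSON object: %w", err)
	}
	for k, v := range overrides {
		merged[k] = v
	}
	return merged, nil
}

func main() {
	typed := map[string]any{"pkg": "nginx", "state": "present"}
	merged, err := mergeExtraVars(typed, `{"state": "absent", "force": true}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(merged["pkg"], merged["state"], merged["force"])
}
```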
Watch a run
/ansible/runs/:id subscribes to a per-run PubSub topic and live-updates as RunPulseWorker drains events from AWX. You'll see:
- Header card: state pill (pending / launching / running / succeeded / partial / failed / unreachable / canceled), AWX job id, scheduled-vs-ad-hoc badge, timestamps, duration.
- Targets table: per-host status with ok / changed / failed / skipped / unreachable counts.
- Plays accordion: per-play status + task count; click to expand and see individual tasks.
The state machine is enforced: a terminal run (succeeded, partial, failed, unreachable, canceled) never transitions further. Late events arriving from AWX after a terminal transition are still persisted to the task table for completeness but don't move the state.
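The terminal-state rule can be sketched in a few lines. This Go example is illustrative, not the Elixir state machine itself; applyEvent is a hypothetical name, and the sketch only models the terminal guard, not any ordering constraints between non-terminal states.

```go
package main

import "fmt"

// terminal lists the states named in the text as final.
var terminal = map[string]bool{
	"succeeded":   true,
	"partial":     true,
	"failed":      true,
	"unreachable": true,
	"canceled":    true,
}

// applyEvent returns the run state after an event proposes a new state.
// A run that is already terminal keeps its state; the late event would still
// be persisted elsewhere, it just does not move the run.
func applyEvent(current, proposed string) (state string, moved bool) {
	if terminal[current] {
		return current, false
	}
	return proposed, proposed != current
}

func main() {
	state, moved := applyEvent("running", "succeeded")
	fmt.Println(state, moved)
	state, moved = applyEvent(state, "failed") // late event after terminal
	fmt.Println(state, moved)
}
```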
Listing runs
/ansible/runs is the cross-controller index, filterable by state (all / pending / launching / running / succeeded / partial / failed / unreachable / canceled). The "Refresh" button reloads with the current filter; PubSub also live-inserts rows that match the active filter as their state changes.
Cron-driven runs
Schedules registered in Settings → Ansible → Schedules fire automatically. Each fire produces a regular PlaybookRun with schedule_id set — visible in both /ansible/runs (with a "scheduled" badge) and on the schedule's row in the settings tab (last fire + outcome badge).
Universal log viewer
Every state transition + every task result also produces an OCSF Application Activity (class 6003) event in the universal log viewer. Each event carries an unmapped.ansible block with run_id, playbook_id, controller_id, awx_job_id, task_name, awx_host_name, device_uid, etc., so you can search:
- by run id to see one run's full event stream
- by device uid to see every ansible activity that touched a host
- by task name across all runs ever
- by status to filter for failures globally
This is the supported replacement for ARA's UI for cross-run search — ServiceRadar's structured per-run pages are richer than ARA for one run, and the universal log viewer is richer than ARA for cross-run aggregation.
Configuration reference
Per-controller overrides
Stored on each AnsibleController row; override the deployment-wide cadence per controller:
| Column | Default | Override |
|---|---|---|
| inventory_sync_interval_seconds | 300 | Plugin's inventory_sync assignment cadence for this controller. |
| catalog_sync_interval_seconds | 600 | AwxCatalogSyncWorker cadence for this controller. |
| run_pulse_interval_ms | 2000 | RunPulseWorker cadence: lower = snappier UI, higher = lower AWX API load. |
Editable in the Controllers tab.
Per-repository overrides
| Column | Default | Override |
|---|---|---|
| sync_interval_seconds | 600 | GitCatalogSyncWorker cadence for this repo. Min 60s. |
Per-schedule overrides
| Column | Default | Override |
|---|---|---|
| cron | required | Standard 5-field cron expression. |
| timezone | UTC | UTC / Etc/UTC only in v1. |
| allow_concurrent | false | When true, fire even if the previous run is still non-terminal. |
RBAC reference
The eight ansible permission keys, with the default role assignments:
| Key | Default roles | What it grants |
|---|---|---|
| ansible.controllers.manage | admin | Register / edit / delete AnsibleController. Required to reach the Controllers tab. |
| ansible.repositories.manage | admin | Register / edit / delete PlaybookRepository. Required to reach the Repositories tab. |
| ansible.catalog.view | all (viewer / helpdesk / operator / admin) | Browse /ansible/catalog. |
| ansible.runs.view | all | View /ansible/runs and /ansible/runs/:id. |
| ansible.runs.launch | operator / admin | Launch playbooks; the Run Task button is hidden without this. |
| ansible.runs.cancel | operator / admin | Cancel an in-progress run. |
| ansible.schedules.view | all | View existing schedules. |
| ansible.schedules.manage | operator / admin | Register / edit / enable / disable / delete schedules. |
These map to Ash resources via ServiceRadarWebNGWeb.Authorization.Permissions. Per-event Permit gates in each LiveView enforce the right verb on the right resource.
Troubleshooting
Controller status stays :unknown past two health intervals
Probable causes, in order of likelihood:
- Plugin not assigned. Confirm both awx plugin manifests are assigned to the controller's agent_id via /settings/plugins (or iex → ServiceRadar.Plugins). Without the run_check entrypoint assigned, the agent can't even respond to awx.ping.
- Agent offline. Check the agent's connection state. The launcher (RunLauncher.launch/2) explicitly fails launches with a typed error when the agent isn't connected; controller health calls fail silently. Look for [error] AWX ControllerHealthWorker: dispatch failed in the core-elx logs.
- Agent can't reach AWX. From inside the agent's network namespace, run curl -k -H "Authorization: Bearer <token>" <base_url>/api/v2/ping/ (the base URL already includes the scheme). If this fails, the plugin will too.
- TLS verification. The default is to verify; if AWX uses a self-signed cert the plugin will error 401/x509. The controller resource's metadata.insecure_skip_verify = true flag bypasses verification but is not yet exposed in the form; set it manually via iex for now.
Status flips to :unauthorized
The token is wrong, expired, or missing scope. Check Controller.last_health_summary — it'll surface the operator-safe 401 message from the plugin. Update the NetworkCredentialSecret's secret_payload and re-trigger health by editing any field on the controller (re-save trips the health check).
No playbooks appear in /ansible/catalog
- AWX-sourced: AwxCatalogSyncWorker ticks every 600s by default. The first sync after registering a controller can take that long. Lower catalog_sync_interval_seconds on the controller if you want faster turnaround for setup.
- Git-sourced: GitCatalogSyncWorker ticks every 600s by default. Confirm the agent / pod has filesystem write access to ANSIBLE_CATALOG_BASE_DIR. Look for [warning] AWX GitCatalogSyncWorker: git sync failed in logs; the PlaybookRepository.last_sync_summary field surfaces the sanitized error.
Devices don't flip to ansible_managed: true
- Confirm the inventory_sync plugin manifest is assigned (separate from the on-demand manifest).
- Confirm the agent can reach AWX (same constraint as health).
- Check DiscoveryRecord ingestion in DIRE: the AWX hosts may be matching but onto different devices (hostname collision). Look for awx in the device's discovery_sources set.
- Hosts AWX has that DIRE can't match are emitted but not yet surfaced; check the agent logs for inventory_sync discovery records.
Run stuck in :pending
The launch command never returned a result from AWX. Likely causes:
- Agent disconnected after dispatch. RunPulseWorker doesn't auto-retry launches; the run will eventually be picked up by RunWatchdog (~hourly fallback) and transitioned to :unreachable.
- AWX rejected the launch payload. Check [info] Ansible launch failed in logs and the user's flash message at launch time.
- The launch's awx.launch_job command result was lost. Check platform.agent_commands for the command_id of the run's last dispatch; its status and failure_reason columns show what happened.
Run stuck in :running past terminal
RunWatchdog transitions any non-terminal run past 2 × job_template.timeout (or 1 h fallback) to :unreachable with a diagnostic recording the watchdog reason. If you want a tighter watchdog, set a job_template_timeout_seconds on the run's metadata at launch time (currently iex-only).
"AWX rejected the request" 401 / 403 on launch
The plugin surfaces these as operator-safe typed errors with "check controller token" in the message. Either:
- The token expired — rotate in AWX and update the NetworkCredentialSecret.
- The token doesn't have launch permission on this job template — adjust the AWX user / team for the token, OR use a different token with broader scope.
Schedule not firing
- Check the row in the Schedules tab: the last_evaluation_outcome badge tells you what happened on the last tick (fired, skipped_overlap, skipped_disabled, skipped_ineligible_targets, error).
- Confirm the schedule is enabled (toggle in the action column).
- Confirm next_run_at is populated and in the past. If it's nil, the cron expression failed to parse. The form validates client-side via Oban.Cron.Expression.parse/1, but legacy rows might predate validation; edit and re-save.
- Non-UTC timezones return :timezone_database_unavailable and the schedule never fires. Stick to UTC / Etc/UTC until :tzdata is added.
Per-run RBAC questions
If a user has ansible.runs.launch but launches fail with a 403:
- They may not have ansible.runs.view. The LaunchLive page itself only checks runs.launch, but the post-launch redirect to /ansible/runs/:id requires runs.view.
- The Permit per-event gates enforce verbs on specific resources; check ServiceRadarWebNGWeb.Authorization.Permissions for the mapping if a permission seems to not be honored.
v1 limitations
These are documented constraints, not bugs. Each is tracked for a future v2:
- AWX-sourced execution only. Direct ansible-playbook execution by a ServiceRadar agent is reserved for a follow-up; v1 requires an AWX/AAP controller. Workflows that orbit AWX (its credential vault, its inventory plugins, its executor pool) are the supported path.
- UTC schedules only. Non-UTC timezones need the :tzdata Elixir dependency, which isn't currently bundled.
- HTTPS git repos only. GitCatalogSyncWorker supports HTTPS deploy tokens via the credential broker but not SSH keys yet.
- Schedules require AWX-sourced playbooks. Git-sourced playbooks can be launched ad-hoc once bound to an AWX template, but the schedule worker rejects them in v1 with :git_sourced_not_supported_v1.
- Multi-device UI launches require a single controller. AWX uses limit: to scope to specific hosts; mixed-controller selections are rejected at submit time. Multi-controller fan-out is a v2 design question.
- Webhook ingestion is deferred. The proposal's design notes sketch an agent-side receiver that would augment pulse polling for lower-latency state-transition updates from very large AWX deployments. Pulse polling is the only ingestion path in v1.
- OCSF class selection. Events project as Application Activity (6003). If operator search habits favor Process Activity (1007) instead, the mapping module can be swapped without touching the data model.
- Run retention exclusion window. The proposal called for "skip runs accessed within the last hour" in retention sweeps, but the worker doesn't yet check accessed_at (the column hasn't been added). Set a generous ANSIBLE_RETENTION_RUN_DETAIL_DAYS if you frequently revisit old runs.
- Manual UUID paste for credential secret references. Both controller and repository forms expect operators to paste a UUID from Settings → Credentials. A picker UX is a planned v2 improvement.