Identity Reconciliation Metrics and Alerts
Use these OpenTelemetry gauges to monitor promotion health and cardinality drift. All metrics emit on the serviceradar.identity meter and reflect the most recent reconciliation run:
- identity_promotions_attempted_last_batch – total sightings evaluated.
- identity_promotions_last_batch – sightings promoted in the run.
- identity_promotions_eligible_auto_last_batch – policy-passing and auto-enabled.
- identity_promotions_shadow_ready_last_batch – policy-passing while in shadow mode.
- identity_promotions_blocked_policy_last_batch – failed policy gates (missing hostname/fingerprint/persistence/etc.).
- identity_promotion_shadow_only_last_batch – evaluated in shadow-only mode.
- identity_promotion_run_timestamp_ms – epoch ms of the last run.
- identity_promotion_run_age_ms – age of the last run in ms.
- identity_cardinality_current – current unified device count used for drift detection.
- identity_cardinality_baseline – configured baseline for drift detection (0 disables drift checks).
- identity_cardinality_drift_percent – percentage drift vs baseline (positive is over baseline).
- identity_cardinality_blocked – 1 when promotion is paused due to drift, 0 otherwise.
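As a rough illustration of how the cardinality gauges relate, the sketch below derives a drift percentage from a current count and a baseline. The formula is an assumption inferred from the gauge descriptions above (signed percentage relative to baseline, with a baseline of 0 disabling the check); the function names and the tolerance parameter are hypothetical and are not taken from the ServiceRadar code.

```python
from typing import Optional


def drift_percent(current: int, baseline: int) -> Optional[float]:
    """Signed drift of `current` relative to `baseline`, in percent.

    Assumption: a baseline of 0 (or less) means drift checks are disabled,
    which is represented here by returning None.
    """
    if baseline <= 0:
        return None
    # Positive result means the current count is over the baseline.
    return (current - baseline) * 100.0 / baseline


def exceeds_tolerance(current: int, baseline: int, tolerance_percent: float) -> bool:
    """Hypothetical check: True when drift is over baseline by more than the tolerance.

    Assumption: only positive drift (over baseline) counts; the real
    reconciler may apply a different rule.
    """
    drift = drift_percent(current, baseline)
    return drift is not None and drift > tolerance_percent


# Example with the 50k demo baseline: 51,500 devices is 3% over baseline.
demo_drift = drift_percent(51_500, 50_000)
demo_over = exceeds_tolerance(51_500, 50_000, tolerance_percent=2.5)
```

Returning None for a disabled baseline keeps "no drift" (0.0) distinct from "drift not evaluated", which may matter when deciding whether a flat line on a dashboard is meaningful.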
Alert recommendations
- Policy blocks: identity_promotions_blocked_policy_last_batch > 0 for 3 consecutive runs (PagerDuty) and WARN at 1 run. Indicates promotions are stuck on missing identifiers or thresholds.
- Shadow backlog: identity_promotions_shadow_ready_last_batch > 0 while promotion shadow mode is enabled. Page if sustained for >30m; action is to enable auto-promotion or adjust policy.
- Run staleness: identity_promotion_run_age_ms > 900000 (15 minutes) signals reconciliation is not running; WARN at 10 minutes.
- Throughput dip: identity_promotions_last_batch == 0 AND identity_promotions_blocked_policy_last_batch > 0 for 2 runs suggests policy misconfiguration.
- Volume surge: sudden jump in identity_promotions_attempted_last_batch (>2x baseline) may indicate flood of sightings or replay; add a high-watermark alert tailored to environment.
- Cardinality drift: alert when identity_cardinality_drift_percent exceeds your tolerance (demo: 50k baseline with 2–3% tolerance) or when identity_cardinality_blocked stays at 1 for more than a few minutes.
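The consecutive-run and staleness rules above can be sketched as plain functions over recent gauge samples. This is a minimal illustration, not an alerting backend: the RunSnapshot type, the function names, and the "ok"/"warn"/"page" labels are hypothetical, and treating the 15-minute staleness threshold as page-level is an assumption. In practice these rules would more likely be expressed in your alerting system's own rule language.

```python
from dataclasses import dataclass
from typing import Callable, List

WARN_AGE_MS = 600_000  # 10 minutes
PAGE_AGE_MS = 900_000  # 15 minutes


@dataclass
class RunSnapshot:
    """Gauge values sampled after one reconciliation run (hypothetical type)."""
    attempted: int
    promoted: int
    blocked_policy: int


def trailing_count(runs: List[RunSnapshot],
                   predicate: Callable[[RunSnapshot], bool]) -> int:
    """Count how many of the most recent runs satisfy `predicate` in a row."""
    count = 0
    for run in reversed(runs):
        if not predicate(run):
            break
        count += 1
    return count


def policy_block_severity(runs: List[RunSnapshot]) -> str:
    """WARN on 1 blocked run, page on 3 consecutive blocked runs."""
    streak = trailing_count(runs, lambda r: r.blocked_policy > 0)
    if streak >= 3:
        return "page"
    if streak >= 1:
        return "warn"
    return "ok"


def staleness_severity(run_age_ms: int) -> str:
    """WARN past 10 minutes; past 15 minutes is treated as page-level (assumption)."""
    if run_age_ms > PAGE_AGE_MS:
        return "page"
    if run_age_ms > WARN_AGE_MS:
        return "warn"
    return "ok"


def throughput_dip(runs: List[RunSnapshot]) -> bool:
    """True when the last 2 runs promoted nothing while policy blocks were present."""
    return trailing_count(
        runs, lambda r: r.promoted == 0 and r.blocked_policy > 0
    ) >= 2


# Example history, oldest first: one healthy run, then two fully blocked runs.
history = [
    RunSnapshot(attempted=10, promoted=5, blocked_policy=0),
    RunSnapshot(attempted=10, promoted=0, blocked_policy=4),
    RunSnapshot(attempted=10, promoted=0, blocked_policy=2),
]
block_level = policy_block_severity(history)
dip_detected = throughput_dip(history)
```

Counting only the trailing streak means a single clean run resets the alert, which matches the "consecutive runs" wording in the recommendations.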
Dashboard panels
- Single-stat for the latest values of each gauge (attempted, promoted, eligible, shadow-ready, blocked).
- Time-series of attempted, eligible, blocked to spot trends after policy changes.
- Run age + last run timestamp side-by-side with a threshold line at 15 minutes.
- Drift view: line chart of identity_cardinality_drift_percent with a horizontal tolerance line and a single-stat for identity_cardinality_blocked.
- Breakdown table for blocked reasons (if/when exposed in metrics); until then, correlate with sighting UI “blockers” column.