Sysmon Profiles
Sysmon Profiles provide centralized management for system monitoring configuration across your ServiceRadar agents. Instead of manually configuring each agent, you can create profiles and apply them to devices or groups of devices with SRQL target queries, for example by tag.
Overview
The System Monitoring (sysmon) feature collects host metrics from your agents:
- CPU: Usage percentage, load averages, per-core statistics
- Memory: Total, used, available, swap usage
- Disk: Usage per mount point, read/write I/O
- Network: Interface statistics, bytes in/out
- Processes: Top processes by CPU/memory usage
Sysmon Profiles let you control:
- Which metrics are collected
- How frequently samples are taken
- Which disk paths to monitor
- Alert thresholds for each metric type
Accessing Sysmon Profiles
Navigate to Settings > Sysmon Profiles in the web UI.
Profile Management
Creating a Profile
- Click Create Profile
- Fill in the profile settings:
  - Name: A descriptive name (e.g., "Production Servers", "Database Hosts")
  - Sample Interval: How often to collect metrics (e.g., "10s", "30s", "1m")
  - Enabled Metrics: Select which metrics to collect:
    - CPU metrics
    - Memory metrics
    - Disk metrics
    - Network metrics
    - Process list
  - Disk Paths: Specify which mount points to monitor (e.g., /, /var, /data)
  - Thresholds: Set warning and critical thresholds for alerts
- Review the JSON preview to see the compiled configuration
- Click Save
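The Sample Interval field takes short duration strings such as "10s" or "1m". The following is a minimal sketch of how such a string could be turned into seconds, for example when checking a value before saving a profile. The function name `parse_sample_interval` is hypothetical, and support for an "h" suffix is an assumption; the examples above only show seconds and minutes.

```python
import re

# Seconds per unit suffix. Only "s" and "m" appear in the documented examples;
# "h" is an assumption added for illustration.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_sample_interval(text: str) -> int:
    """Convert an interval string such as "10s" or "1m" to whole seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized sample interval: {text!r}")
    value, unit = match.groups()
    seconds = int(value) * _UNIT_SECONDS[unit]
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    return seconds


if __name__ == "__main__":
    for example in ("10s", "30s", "1m"):
        print(example, "->", parse_sample_interval(example), "seconds")
```

Rejecting unknown formats up front, rather than falling back to a default, makes a mistyped interval visible before it reaches an agent.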
Editing a Profile
- Click on a profile name or the edit icon
- Modify the settings
- Review changes in the JSON preview
- Click Save
Deleting a Profile
- Click the delete icon on the profile row
- Confirm deletion
If a deleted profile was the only match for a device, that device becomes unassigned and sysmon collection is disabled until another profile matches.
Profile Targeting (SRQL)
Each profile has an SRQL target query and applies to every device that query matches. When multiple profiles match the same device, the profile with the highest priority value wins.
Example targeting queries:
- in:devices tags.role:database - Match devices with the role=database tag
- in:devices hostname:prod-* - Match devices whose hostname starts with "prod-"
- in:devices type:Server - Match devices of type Server
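The priority rule can be illustrated with a short sketch. This is not ServiceRadar's implementation: the `Profile` class, the `select_profile` function, the device dictionaries, and the priority numbers are all hypothetical, and each SRQL target query is replaced by a plain Python predicate that mimics one of the example queries.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional, Sequence

Device = dict  # hypothetical record: {"hostname": ..., "type": ..., "tags": {...}}


@dataclass
class Profile:
    name: str
    priority: int
    matches: Callable[[Device], bool]  # stand-in for evaluating the SRQL target query


def select_profile(device: Device, profiles: Sequence[Profile]) -> Optional[Profile]:
    """Return the matching profile with the highest priority, or None if none match."""
    candidates = [p for p in profiles if p.matches(device)]
    if not candidates:
        return None
    # Tie-breaking between equal priorities is not described in the source;
    # max() simply keeps the first candidate it sees with the top priority.
    return max(candidates, key=lambda p: p.priority)


# Predicates approximating the three example queries; priorities are made up.
PROFILES = [
    Profile("Database Hosts", 20, lambda d: d.get("tags", {}).get("role") == "database"),
    Profile("Production Servers", 10, lambda d: fnmatchcase(d.get("hostname", ""), "prod-*")),
    Profile("All Servers", 0, lambda d: d.get("type") == "Server"),
]

if __name__ == "__main__":
    device = {"hostname": "prod-db-01", "type": "Server", "tags": {"role": "database"}}
    chosen = select_profile(device, PROFILES)
    print(chosen.name if chosen else "unassigned")
```

In this sketch the device matches all three profiles, and the one with priority 20 is chosen. A device that matches none of them gets `None`, which corresponds to the unassigned state.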
Device Integration
Viewing Effective Profile
On the device detail page, the System Monitoring section shows:
- Effective Profile: The profile currently in use
- Assignment Source: How the profile was applied (SRQL or unassigned)
- Config Source: Whether the agent is using remote config or local override
Local Override Badge
If an agent is using a local configuration file instead of the centrally managed profile, a "Local Override" badge appears. This indicates:
- The agent has a sysmon.json file in its config directory
- Local configuration takes precedence over remote profiles
- The device is opted out of centralized management
SRQL Filtering
You can filter devices by sysmon profile and config source using SRQL:
# Find devices using a specific profile
sysmon_profile_id:abc123
# Find devices with local config override
config_source:local
# Find devices using remote config
config_source:remote
# Combine with other filters
type:Server AND config_source:local
Configuration Resolution
When an agent requests its sysmon configuration, ServiceRadar resolves it in this order:
- Local config file (highest priority)
  - Linux: /etc/serviceradar/sysmon.json
  - macOS: /usr/local/etc/serviceradar/sysmon.json
- SRQL targeting
  - Profiles with target_query evaluated by priority (highest first)
- No match
  - Sysmon config is disabled until a profile matches
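The resolution order can be summarized as a small decision function. This is a sketch under stated assumptions rather than the actual agent or server code: `resolve_config` and `local_config_present` are hypothetical names, the "disabled" label is invented for the no-match case, and the profile matched by SRQL targeting is passed in as an already-computed value. The "local" and "remote" labels mirror the config_source values used in SRQL filtering.

```python
import platform
from pathlib import Path
from typing import Optional, Tuple

# Local override paths from the resolution order above, keyed by platform.system().
LOCAL_CONFIG_PATHS = {
    "Linux": Path("/etc/serviceradar/sysmon.json"),
    "Darwin": Path("/usr/local/etc/serviceradar/sysmon.json"),  # macOS
}


def local_config_present(system: Optional[str] = None) -> bool:
    """Check whether a local sysmon.json exists for the given (or current) platform."""
    path = LOCAL_CONFIG_PATHS.get(system or platform.system())
    return path is not None and path.is_file()


def resolve_config(matched_profile: Optional[str],
                   local_config_exists: bool) -> Tuple[str, Optional[str]]:
    """Return (config_source, profile_name) following the documented order."""
    if local_config_exists:
        return ("local", None)  # 1. a local file overrides any remote profile
    if matched_profile is not None:
        return ("remote", matched_profile)  # 2. highest-priority SRQL match
    return ("disabled", None)  # 3. no match: sysmon stays off (label is hypothetical)


if __name__ == "__main__":
    print(resolve_config("Database Hosts", local_config_present()))
```

Keeping the file check separate from the decision makes the ordering easy to test without touching the filesystem.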
Profile Settings Reference
| Setting | Description | Example |
|---|---|---|
| enabled | Whether sysmon collection is active | true |
| sample_interval | How often to collect metrics | "10s", "1m" |
| collect_cpu | Collect CPU metrics | true |
| collect_memory | Collect memory metrics | true |
| collect_disk | Collect disk metrics | true |
| collect_network | Collect network interface metrics | false |
| collect_processes | Collect process list | false |
| disk_paths | Mount points to monitor | ["/", "/var", "/data"] |
| thresholds.cpu_warning | CPU warning threshold (%) | "75" |
| thresholds.cpu_critical | CPU critical threshold (%) | "90" |
| thresholds.memory_warning | Memory warning threshold (%) | "80" |
| thresholds.memory_critical | Memory critical threshold (%) | "95" |
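To show how a warning/critical threshold pair relates to a measured value, here is a minimal sketch that classifies a usage percentage. The threshold keys and string values come from the table above; the `classify_usage` function, the "ok"/"warning"/"critical" labels, and the choice of an inclusive (>=) comparison are assumptions, since the exact alerting semantics are not documented here.

```python
# Threshold values copied from the reference table (stored as strings there).
THRESHOLDS = {
    "cpu_warning": "75",
    "cpu_critical": "90",
    "memory_warning": "80",
    "memory_critical": "95",
}


def classify_usage(usage_percent: float, warning: str, critical: str) -> str:
    """Classify a usage percentage against a warning/critical threshold pair."""
    warn, crit = float(warning), float(critical)
    if warn > crit:
        raise ValueError("warning threshold must not exceed critical threshold")
    if usage_percent >= crit:  # inclusive comparison is an assumption
        return "critical"
    if usage_percent >= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for cpu in (50.0, 80.0, 92.5):
        level = classify_usage(cpu, THRESHOLDS["cpu_warning"], THRESHOLDS["cpu_critical"])
        print(f"cpu={cpu}% -> {level}")
```

Checking that the warning value does not exceed the critical value catches a swapped pair before it silently suppresses warnings.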
Best Practices
- Create a baseline profile - Use a catch-all SRQL query (e.g., in:devices) if you want default monitoring
- Use tags for scalability - Instead of assigning profiles to individual devices, use tags:
  - environment:production → High-frequency monitoring
  - role:database → Include disk I/O metrics
  - tier:frontend → Skip process collection
- Set appropriate intervals:
  - Production critical systems: 5-10 seconds
  - Standard servers: 30 seconds
  - Development/staging: 60 seconds
- Monitor only what you need - Disable unnecessary collectors:
  - Disable process collection if you don't need it (reduces payload size)
  - Disable network metrics if you're using dedicated network monitoring
- Use local override sparingly - Local config files:
  - Are harder to audit and manage at scale
  - Should be reserved for air-gapped networks or compliance requirements
  - Take precedence over any centralized configuration
Troubleshooting
Profile Not Applied
- Check the device's effective profile in the detail page
- Verify the profile SRQL query matches the device
- Check if a local config file exists (shows "Local Override" badge)
- Verify the agent has the sysmon capability
Metrics Not Appearing
- Confirm the profile has enabled: true
- Check that the specific collector is enabled (e.g., collect_cpu: true)
- Verify the agent is connected and reporting status
- Check agent logs for sysmon-related errors
Process Metrics Missing
- Confirm the profile has collect_processes: true and the agent has reloaded its config
- Run an SRQL query to confirm data is present:
  in:process_metrics device_id:"<device_uid>" time:last_24h sort:timestamp:desc limit:10
- Verify rows exist directly in CNPG:
  SELECT timestamp, pid, name, cpu_usage, memory_usage
  FROM process_metrics
  WHERE device_id = '<device_uid>'
  ORDER BY timestamp DESC
  LIMIT 20;
Config Changes Not Propagating
Agents check for configuration updates every 5 minutes (with jitter). To apply a change:
- Restart the agent to force a refresh
- Or wait for the next refresh cycle (up to ~5.5 minutes)
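The "up to ~5.5 minutes" figure follows from adding jitter to the 5-minute base interval. The sketch below shows one way such a delay could be computed. The uniform distribution, the 30-second jitter cap, and the name `next_refresh_delay` are assumptions chosen to match the stated worst case, not a description of the agent's actual scheduler.

```python
import random
from typing import Optional

BASE_REFRESH_SECONDS = 5 * 60   # documented refresh interval
MAX_JITTER_SECONDS = 30         # assumed cap, giving a worst case of about 5.5 minutes


def next_refresh_delay(rng: Optional[random.Random] = None) -> float:
    """Return the seconds to wait before the next config check, with added jitter."""
    source = rng if rng is not None else random.Random()
    return BASE_REFRESH_SECONDS + source.uniform(0, MAX_JITTER_SECONDS)


if __name__ == "__main__":
    delay = next_refresh_delay()
    print(f"next config check in {delay:.1f}s ({delay / 60:.2f} min)")
```

Jitter of this kind spreads agent check-ins over time so that a large fleet does not request configuration at the same instant.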