
SolarWinds

Scope

This file covers the SolarWinds monitoring platform, including the Orion platform and its core modules (NPM for network performance, SAM for server and application monitoring, VMAN for virtualization management, NCM for network configuration management), polling engine sizing and distributed architecture, monitoring protocols (SNMP, WMI, agent-based), alerting and reporting configuration, the SolarWinds Observability SaaS platform (cloud-native successor), the licensing model (per-node, per-element, subscription), and migration from Orion to SolarWinds Observability. For general observability patterns, see general/observability.md.

Checklist

  • [Critical] Is the Orion platform sized appropriately -- primary polling engine capacity (approximately 12,000 elements per engine at standard polling intervals), Additional Polling Engines (APE) for distributed monitoring or scale-out, Additional Web Servers for dashboard load distribution, and SQL Server database sizing (Standard for small deployments, Enterprise for large) with appropriate disk I/O?
  • [Critical] Are monitoring protocols selected appropriately per device type -- SNMPv3 (encrypted, authenticated) for network devices, WMI for Windows servers (with dedicated service account and minimal permissions), SolarWinds Agent for deeper application monitoring and servers behind firewalls, and ICMP as a baseline availability check?
  • [Critical] Is the node and element licensing understood -- NPM licenses by element tier (monitored nodes, interfaces, and volumes each count against the tier limit), SAM licenses per component monitor under the legacy model (newer releases license per node), VMAN licenses per CPU socket on monitored hypervisor hosts -- and is license utilization tracked to avoid unmonitored assets or overspend?
  • [Critical] Are SNMP community strings and WMI credentials managed securely -- stored in the SolarWinds credential library with access restricted to admin users, SNMPv3 preferred over v2c, and WMI service accounts following least-privilege principles (no domain admin)?
  • [Recommended] Are polling intervals tuned per metric criticality -- 1-2 minute intervals for critical infrastructure metrics (interface utilization, CPU, memory), 5-10 minute intervals for non-critical devices, and statistics collection intervals set appropriately to balance database growth with reporting granularity?
  • [Recommended] Are Additional Polling Engines (APE) deployed for remote sites, network segments, or scale-out, with each APE sized for its node assignment, firewall rules configured for main-server-to-APE communication (TCP 17777) and APE-to-database connectivity (SQL Server, TCP 1433 by default), and high availability (e.g., Orion HA) configured for engine redundancy?
  • [Recommended] Is alerting configured with appropriate conditions, suppression, and escalation -- alert conditions using sustained thresholds (not single-poll spikes), time-of-day awareness (reduced alerting during maintenance windows), dependency-based suppression (suppress child alerts when parent node is down), and escalation to ticketing systems after acknowledgment timeout?
  • [Recommended] Is NCM (Network Configuration Management) configured for automated config backups, compliance baselines, change detection alerts, and config comparison -- providing an audit trail for network device configuration changes and reducing mean-time-to-recovery for misconfigurations?
  • [Recommended] Has SolarWinds Observability SaaS been evaluated as a migration target -- providing cloud-native monitoring without Orion infrastructure management, with entity-based pricing, native cloud integration (AWS, Azure, GCP), and distributed tracing, but with different feature coverage than Orion modules?
  • [Optional] Are custom reports configured for capacity planning (trending CPU, memory, disk utilization over 90 days), SLA compliance (uptime percentages per service tier), and inventory reporting (hardware models, firmware versions, end-of-life tracking)?
  • [Optional] Is the SolarWinds API (SWIS - SolarWinds Information Service) used for automation -- bulk node onboarding from CMDB exports, custom property population, and report data extraction for external dashboards or ITSM integration?
  • [Optional] Is database maintenance automated -- including SQL Server index maintenance, statistics updates, Orion database grooming settings (retention periods per data type), and transaction log management to prevent database growth from impacting monitoring performance?
  • [Optional] Are SolarWinds AI-powered troubleshooting features evaluated -- anomaly detection and root cause analysis that correlate across network, server, and application metrics to suggest probable causes?
  • [Recommended] Is the SolarWinds patching strategy defined given the platform's security history (2020 supply chain incident) -- including timely hotfix application, network segmentation of Orion servers, restricted administrative access, and monitoring of SolarWinds security advisories?
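
The alerting item above (sustained thresholds, dependency-based suppression) describes logic that Orion configures through its alert definitions rather than through code. As an illustration only, the Python sketch below models that logic so the behavior is explicit. The `Node` model, the CPU metric, and the 90% / 3-poll defaults are hypothetical choices, not SolarWinds APIs or product defaults.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model: `parent` names the upstream device this node depends on."""
    name: str
    parent: str | None = None
    status_up: bool = True
    cpu_samples: list[float] = field(default_factory=list)  # oldest first, latest poll last


def sustained_breach(samples: list[float], threshold: float, polls: int) -> bool:
    """True only if each of the last `polls` samples exceeds `threshold`.

    A single-poll spike therefore never fires an alert on its own.
    """
    if len(samples) < polls:
        return False
    return all(value > threshold for value in samples[-polls:])


def parent_down(node: Node, nodes: dict[str, Node]) -> bool:
    """Walk the dependency chain upward; True if any ancestor is down."""
    seen: set[str] = set()
    current = node.parent
    while current is not None and current not in seen:  # `seen` guards against cycles
        seen.add(current)
        ancestor = nodes[current]
        if not ancestor.status_up:
            return True
        current = ancestor.parent
    return False


def evaluate_alerts(nodes: dict[str, Node], cpu_threshold: float = 90.0,
                    sustain_polls: int = 3) -> tuple[list[str], list[str]]:
    """Return (fired, suppressed) alert messages for one evaluation cycle."""
    fired: list[str] = []
    suppressed: list[str] = []
    for node in nodes.values():
        if not node.status_up:
            message = f"{node.name}: node down"
        elif sustained_breach(node.cpu_samples, cpu_threshold, sustain_polls):
            message = f"{node.name}: CPU > {cpu_threshold}% for {sustain_polls} polls"
        else:
            continue
        # Dependency-based suppression: a down ancestor already explains this alert.
        if parent_down(node, nodes):
            suppressed.append(message)
        else:
            fired.append(message)
    return fired, suppressed


nodes = {
    "core-sw": Node("core-sw", status_up=False),
    "branch-rtr": Node("branch-rtr", parent="core-sw", status_up=False),
    "app-01": Node("app-01", cpu_samples=[95, 97, 40]),  # spike that recovered
    "db-01": Node("db-01", cpu_samples=[92, 94, 96]),    # sustained breach
}
fired, suppressed = evaluate_alerts(nodes)
print(fired)       # ['core-sw: node down', 'db-01: CPU > 90.0% for 3 polls']
print(suppressed)  # ['branch-rtr: node down']
```

Only the root cause (`core-sw`) and the genuinely sustained breach (`db-01`) reach the on-call queue. The branch router is suppressed because its parent is down, and the transient spike on `app-01` never fires.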

Why This Matters

SolarWinds Orion is one of the most widely deployed network and infrastructure monitoring platforms in mid-market and enterprise environments, particularly strong in network operations where its NPM module provides comprehensive SNMP-based monitoring with automated network topology mapping. The platform's module-based architecture (NPM, SAM, VMAN, NCM) allows organizations to incrementally add monitoring capabilities, but each module adds licensing cost and polling load that must be planned for. Polling engine sizing is a frequent architectural mistake: overloaded polling engines miss data collection cycles, creating gaps in monitoring data and delayed alerting during exactly the moments when monitoring is most critical.
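
Sizing starts with simple arithmetic: total the elements (nodes, interfaces, and volumes) and divide by per-engine capacity. The sketch below uses the approximately 12,000-elements-per-engine figure cited in this file and adds an assumed 20% headroom so no engine runs at its ceiling. The headroom value is an illustrative choice, and real capacity also depends on polling intervals and installed modules, so treat the result as a first estimate to check against SolarWinds' own scalability guidance.

```python
import math

# Approximate per-engine ceiling cited in this file; real capacity depends on
# polling intervals and the modules installed.
ELEMENTS_PER_ENGINE = 12_000


def engines_needed(nodes: int, interfaces: int, volumes: int, headroom: float = 0.2) -> int:
    """Estimate total polling engines (primary + APEs) for a given element count.

    `headroom` (assumed 20%) keeps each engine below its ceiling, since an
    overloaded engine is the one that starts missing poll cycles.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    elements = nodes + interfaces + volumes
    usable_per_engine = ELEMENTS_PER_ENGINE * (1 - headroom)
    return max(1, math.ceil(elements / usable_per_engine))


# 1,500 nodes + 18,000 interfaces + 3,000 volumes = 22,500 elements
print(engines_needed(1_500, 18_000, 3_000))  # 3 (primary engine + 2 APEs)
```

Note that interfaces usually dominate the count on network-heavy deployments, so an estimate based on node count alone can understate the number of engines required.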

The SolarWinds platform is at an inflection point. The Orion on-premises platform remains widely deployed but is being superseded by SolarWinds Observability, a SaaS platform with modern architecture and entity-based pricing. Organizations must decide whether to continue investing in Orion infrastructure or migrate to the SaaS platform, accepting feature differences and connectivity requirements. The 2020 supply chain security incident has also made SolarWinds deployment security a mandatory consideration -- Orion servers have privileged network access for monitoring and must be treated as high-value targets requiring network segmentation, hardened configurations, and prompt patching.

Common Decisions (ADR Triggers)

  • SolarWinds Orion vs SolarWinds Observability SaaS -- Orion provides mature, feature-rich on-premises monitoring with deep SNMP/WMI capabilities, extensive alerting, and decades of template development. Observability SaaS eliminates infrastructure management, provides native cloud integration, and offers modern distributed tracing, but lacks some Orion module depth (NCM equivalent is limited) and requires outbound internet connectivity from every monitored environment to the SaaS service. Choose Orion for air-gapped environments, complex network monitoring, and organizations with existing Orion investment; Observability SaaS for greenfield deployments, cloud-centric environments, and organizations wanting to eliminate on-premises monitoring infrastructure.
  • SolarWinds vs open-source (Zabbix, Prometheus + Grafana) -- SolarWinds provides a polished UI, automated network discovery, extensive out-of-box templates, and commercial support, but carries significant licensing costs ($10K-$100K+ depending on modules and node count). Zabbix provides comparable infrastructure monitoring at zero licensing cost but requires more configuration effort and offers less automated network topology visualization than SolarWinds. Prometheus excels in cloud-native environments but is a weaker fit for traditional SNMP network monitoring, which it handles only through an exporter such as snmp_exporter. Choose SolarWinds when network monitoring depth and vendor support justify the cost; open-source when budget is the primary constraint and in-house expertise is available.
  • Per-node vs subscription vs entity-based licensing -- Traditional Orion licensing is perpetual per-node (NPM: $1,500-$50,000+ depending on node tier, plus ~20% annual maintenance). Subscription licensing provides annual rights without perpetual ownership. SolarWinds Observability uses entity-based pricing (per monitored entity/month). Evaluate total cost of ownership over 3-5 years: perpetual licensing favors stable environments; subscription favors growing environments; entity-based favors cloud environments with variable scale.
  • Single polling engine vs distributed -- A single Orion server handles up to ~12,000 elements (nodes, interfaces, and volumes) before performance degrades. Organizations exceeding this or monitoring across WAN links need Additional Polling Engines. Each APE requires its own Windows Server infrastructure and SQL Server connectivity. Distributed architecture adds operational complexity but is mandatory for large or geographically distributed environments.
  • Orion SAM vs dedicated APM -- SAM provides basic application monitoring (process, service, log, URL monitoring, AppInsight for SQL/IIS/Exchange) but lacks distributed tracing, code-level profiling, and service dependency mapping found in dedicated APM tools (Datadog, Dynatrace, New Relic). Use SAM for infrastructure-centric application checks (is the service running, is the port responding); dedicated APM for application performance analysis and microservice environments.
  • AI-powered troubleshooting adoption -- Weigh the SolarWinds Observability (SaaS) AI features for anomaly detection and root cause analysis against the Orion platform's more limited baseline deviation alerting; organizations with advanced AI requirements may need a more AI-capable platform.
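
The licensing decision above comes down to comparing cumulative cost over a 3-5 year horizon. The sketch below compares a perpetual license with the ~20% annual maintenance cited in this file against a flat annual subscription. Both prices in the example are hypothetical placeholders, not SolarWinds list prices, and the model assumes maintenance is billed every year including the first.

```python
from __future__ import annotations


def perpetual_tco(license_cost: float, years: int, maintenance_rate: float = 0.20) -> float:
    """Perpetual license paid up front plus annual maintenance for `years` years.

    Assumes maintenance (~20% per this file) is billed every year, including the first.
    """
    return license_cost + license_cost * maintenance_rate * years


def subscription_tco(annual_price: float, years: int, annual_uplift: float = 0.0) -> float:
    """Cumulative subscription fees, with an optional yearly price increase."""
    return sum(annual_price * (1 + annual_uplift) ** year for year in range(years))


def breakeven_year(license_cost: float, annual_price: float,
                   maintenance_rate: float = 0.20, horizon: int = 10) -> int | None:
    """First year in which cumulative perpetual cost falls below subscription cost."""
    for years in range(1, horizon + 1):
        perpetual = perpetual_tco(license_cost, years, maintenance_rate)
        if perpetual < subscription_tco(annual_price, years):
            return years
    return None  # subscription stays cheaper across the whole horizon


# Hypothetical prices, not SolarWinds list prices.
print(perpetual_tco(50_000, 5))        # 100000.0
print(subscription_tco(18_000, 5))     # 90000.0
print(breakeven_year(50_000, 18_000))  # 7
```

At these example prices, perpetual licensing only becomes cheaper in year 7, which is consistent with the guidance that perpetual licensing favors stable, long-lived environments. A fuller model would also add the Windows Server, SQL Server, and hardware costs that an Orion deployment carries and a SaaS subscription does not.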

AI and GenAI Capabilities

AI-Powered Troubleshooting -- SolarWinds Observability (SaaS) includes ML-based anomaly detection across infrastructure metrics and AI-assisted root cause analysis. The on-premises Orion platform offers only more limited features focused on baseline deviation alerting; the AI capabilities reside primarily in the SaaS offering.
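
Baseline deviation alerting compares a metric with its own historical norm rather than with a fixed threshold. A simple statistical stand-in for that idea is a mean plus-or-minus k standard deviations test. The sketch below assumes a 3-sigma band and a 10-sample minimum. Both are illustrative choices, and this is not how SolarWinds computes Orion baselines or the ML anomaly scores in SolarWinds Observability.

```python
from __future__ import annotations

import statistics


def deviates_from_baseline(history: list[float], current: float,
                           k: float = 3.0, min_points: int = 10) -> bool:
    """Flag `current` if it is more than `k` standard deviations from the baseline mean.

    `history` is the baseline window for one metric (for example, recent
    samples collected at the same polling interval).
    """
    if len(history) < min_points:
        return False  # too little data to establish "normal"
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current != mean  # perfectly flat baseline: any change is a deviation
    return abs(current - mean) / stdev > k


baseline = [40, 42, 41, 39, 43, 40, 41, 42, 38, 44]  # mean 41.0, stdev ~1.73
print(deviates_from_baseline(baseline, 44))  # False (about 1.7 sigma)
print(deviates_from_baseline(baseline, 48))  # True (about 4.0 sigma)
```

A value of 48% CPU would never trip a typical static 90% threshold, yet it is clearly abnormal for this metric. Catching that kind of deviation is the gap baseline and anomaly-based alerting is meant to close.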

Note: SolarWinds' AI capabilities are less mature than Datadog or Splunk. Organizations evaluating SolarWinds should assess whether the AI features meet their automation requirements or whether a more AI-capable platform is needed.

See Also

  • general/observability.md -- general observability architecture patterns and monitoring strategy