Nutanix Prism Central at Scale

Scope

Prism Central (PC) deployment sizing, scale-out architecture, multi-site management patterns, high availability, disaster recovery, and feature scaling considerations for environments managing 10,000+ VMs across multiple clusters and sites.

Checklist

Why This Matters

Prism Central is the single management plane for all Nutanix operations beyond a single cluster. At scale (10,000+ VMs, 5+ sites), PC sizing and architecture decisions have cascading effects: an undersized PC degrades API response times, delays entity synchronization across clusters, slows report generation, and can cause Flow policy propagation failures. The choice between a single centralized PC and multiple regional PC instances determines whether administrators get unified policy enforcement and reporting (single PC) or regional autonomy and WAN independence (multiple PCs). Scale-out from a single VM to 3 VMs cannot be reversed, which makes it a critical early architectural decision. PC backup and DR are frequently overlooked: if Prism Central is lost without a current backup, all Flow policies, Leap protection policies and recovery plans, category assignments, RBAC configurations, and historical metrics must be manually reconstructed, and Calm blueprints are lost unless they are stored outside PC. Moving between sizing tiers is also disruptive (new deployment + backup restore + cluster re-registration), so right-sizing at initial deployment avoids costly operational interruptions later.

Sizing Reference

Tier	vCPU per VM	RAM per VM	Disk per VM	Max Clusters	Max Hosts	Max VMs	Deployment
X-Small	4	18 GB	100 GB	5	50	500	Single VM only
Small	6	26 GB	500 GB	25	250	2,500	1-VM or 3-VM
Large	10	44 GB	2,500 GB	75	750	7,500	1-VM or 3-VM
X-Large	14	60 GB	2,500 GB	150	1,500	25,000	1-VM or 3-VM

Scale-out (3-VM) deployment doubles the VM management capacity of a given tier (e.g., Small 3-VM supports ~5,000 VMs) and provides N+1 fault tolerance. The 25,000-VM maximum listed for X-Large is already the 3-VM figure, so it is not doubled again.

Feature resource overhead: Enabling optional features (Calm/Self-Service, Flow Network Security, Disaster Recovery/Leap, Intelligent Operations) automatically allocates additional vCPU and memory to the PC VM(s) beyond the base tier specifications. Plan for 20-30% overhead when multiple features are active.
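
The sizing table and the overhead guidance can be combined into a quick planning helper. The sketch below is illustrative only: the limits are copied from the table above, while the function names and the choice to apply the upper 30% figure by default are assumptions, not part of any Nutanix tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    vcpu: int
    ram_gb: int
    max_clusters: int
    max_hosts: int
    max_vms: int


# Values copied from the sizing table above, smallest tier first.
TIERS = [
    Tier("X-Small", 4, 18, 5, 50, 500),
    Tier("Small", 6, 26, 25, 250, 2_500),
    Tier("Large", 10, 44, 75, 750, 7_500),
    Tier("X-Large", 14, 60, 150, 1_500, 25_000),
]


def smallest_tier(clusters: int, hosts: int, vms: int) -> Optional[Tier]:
    """Return the smallest tier whose table limits cover all three counts."""
    for tier in TIERS:
        if (clusters <= tier.max_clusters
                and hosts <= tier.max_hosts
                and vms <= tier.max_vms):
            return tier
    return None  # exceeds every tier: consider multiple PC instances


def with_feature_overhead(tier: Tier, overhead_pct: int = 30) -> Tuple[int, int]:
    """Return (vCPU, RAM GB) per VM with planning headroom, rounded up.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    scale = 100 + overhead_pct
    vcpu = (tier.vcpu * scale + 99) // 100
    ram_gb = (tier.ram_gb * scale + 99) // 100
    return vcpu, ram_gb
```

For example, `smallest_tier(clusters=40, hosts=400, vms=12_000)` selects X-Large, and `with_feature_overhead` on that tier suggests reserving capacity on the hosting cluster for 19 vCPU and 78 GB RAM per PC VM.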

Scale-Out Architecture

Scale-out deploys 3 PC VMs with shared-nothing microservices architecture; services are distributed across all 3 VMs with leader election for singleton services.
Anti-affinity rules should be configured to ensure PC VMs run on separate physical hosts within the hosting cluster for true HA.
Scale-out tolerates the failure of 1 PC VM (N+1). If 2 VMs fail, PC becomes unavailable. A failed VM is automatically restarted on another host if AHV Node HA is enabled.
Scale-out is a one-way operation: once expanded from 1 VM to 3 VMs, you cannot revert to a single-VM deployment.
All 3 PC VMs must reside on the same PE cluster; they cannot be split across clusters or sites.
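
The anti-affinity and N+1 points above can be checked together: if two PC VMs share a host, one host failure takes out two VMs and PC goes down. The following is a minimal sketch of that reasoning; the VM and host names are hypothetical, and the placement mapping would have to come from your own inventory source.

```python
from collections import Counter
from typing import Dict


def placement_is_anti_affine(vm_to_host: Dict[str, str]) -> bool:
    """True when every PC VM runs on a different physical host."""
    hosts = list(vm_to_host.values())
    return len(set(hosts)) == len(hosts)


def pc_available(total_vms: int, failed_vms: int) -> bool:
    """A 3-VM scale-out tolerates one failed VM; a single-VM PC tolerates none."""
    tolerated = 1 if total_vms == 3 else 0
    return failed_vms <= tolerated


def survives_any_single_host_failure(vm_to_host: Dict[str, str]) -> bool:
    """Worst case: the host carrying the most PC VMs fails."""
    worst = max(Counter(vm_to_host.values()).values())
    return pc_available(len(vm_to_host), worst)


# Hypothetical placement with two PC VMs on the same host.
placement = {"pc-1": "host-a", "pc-2": "host-a", "pc-3": "host-c"}
print(placement_is_anti_affine(placement))          # False
print(survives_any_single_host_failure(placement))  # False
```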

Multi-Site Deployment Patterns

Single Centralized PC (Scale-Out)

Best for: up to 25,000 VMs, consistent policy enforcement, unified reporting.

All clusters across all sites register to one scale-out PC instance.
Provides single pane of glass for Flow policies, Calm blueprints, Leap DR, categories, and RBAC.
Requires reliable WAN connectivity (port 9440) from all remote sites to the PC hosting site.
WAN latency above 200 ms degrades UI responsiveness and API performance.
Single point of failure at the site level (mitigated by PC backup to remote clusters and S3).

Multiple Regional PCs

Best for: 25,000+ VMs, high WAN latency between sites, regulatory data sovereignty, regional autonomy requirements.

Each region or major site gets its own PC instance managing local clusters.
Nutanix Central (Service Central) provides aggregated dashboard across all PC instances.
Flow policies, Calm blueprints, and categories must be maintained independently per PC (no native cross-PC policy sync).
Leap DR between sites requires both source and target PCs to be configured as Availability Zones with each other.
Increases operational overhead: multiple upgrade cycles, multiple backup configurations, potential policy drift.

Decision Framework: 12,000+ VMs Across 5+ Sites

Factor	Single Centralized PC	Multiple Regional PCs
VM count under 25,000	Preferred	Unnecessary complexity
VM count over 25,000	Not supported	Required
WAN reliability > 99.9%	Viable	Not required
WAN latency < 100 ms	Viable	Not required
Latency 100-200 ms	Acceptable with caveats	Preferred
Latency > 200 ms	Not recommended	Required
Data sovereignty per region	Cannot satisfy	Satisfies
Unified Flow policies	Native	Requires manual sync
Leap DR across sites	Single config	Each PC pair configured separately
Operational team per region	Overkill	Good fit
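
The hard constraints in the table can be expressed as a small rule set. This is a sketch rather than a complete decision tool: it encodes only the VM-count, latency, and data-sovereignty rows, and the function name and return labels are made up for illustration. Softer factors such as team structure still need human judgment.

```python
from typing import List, Tuple


def recommend_topology(vm_count: int,
                       wan_latency_ms: float,
                       data_sovereignty: bool) -> Tuple[str, List[str]]:
    """Return a recommendation and the table rows that drove it."""
    required = []   # rows where a single PC is unsupported or cannot satisfy
    preferred = []  # rows where regional PCs are only preferred

    if vm_count > 25_000:
        required.append("VM count over 25,000")
    if wan_latency_ms > 200:
        required.append("WAN latency above 200 ms")
    if data_sovereignty:
        required.append("data sovereignty per region")
    if 100 <= wan_latency_ms <= 200:
        preferred.append("WAN latency of 100-200 ms")

    if required:
        return "regional", required
    if preferred:
        return "regional (preferred)", preferred
    return "centralized", []


print(recommend_topology(12_000, 40, False))   # ('centralized', [])
print(recommend_topology(12_000, 150, False))  # ('regional (preferred)', [...])
```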

Backup and DR

Continuous Backup (IDF Replication)

Replicates Prism Central configuration database (Insights Data Fabric) to up to 3 registered PE clusters every 30 minutes.
RPO: 30 minutes. RTO: 2 hours.
Backup targets should be geographically distributed; all must run AOS 6.0+ (at least one at AOS 6.5.3.1+).
Replication traffic uses port 9440 (TCP).
NTP synchronization required between PC and all backup target clusters.
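
The target-count and AOS version rules above are easy to get wrong when picking backup clusters by hand. A minimal pre-check might look like the following; it assumes AOS versions are plain dotted numbers (for example "6.5.3.1") and that the cluster names and versions are gathered separately, since no Nutanix API call is shown here.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '6.5.3.1' into (6, 5, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def validate_backup_targets(targets: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the rules are met.

    `targets` maps a PE cluster name to its AOS version string.
    """
    problems = []
    if not 1 <= len(targets) <= 3:
        problems.append(f"expected 1 to 3 backup targets, got {len(targets)}")

    versions = {name: parse_version(v) for name, v in targets.items()}
    for name, version in versions.items():
        if version < (6, 0):
            problems.append(f"{name}: AOS {targets[name]} is below 6.0")

    if not any(v >= (6, 5, 3, 1) for v in versions.values()):
        problems.append("no target runs AOS 6.5.3.1 or later")
    return problems


# Hypothetical cluster names and versions.
print(validate_backup_targets({"pe-dc1": "6.5.3.1", "pe-dc2": "6.0"}))     # []
print(validate_backup_targets({"pe-dc1": "6.5.2", "pe-dc2": "5.20.4"}))
```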

Point-in-Time Backup (S3)

Backs up to AWS S3 or Nutanix Objects (S3-compatible) for offsite/air-gapped protection.
RPO: 2 hours. RTO: 2 hours.
Retention: restorable for up to 1 month.
Recommended for ransomware protection and site-loss scenarios.
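
The RPO figures quoted for the two backup types can double as alert thresholds: if the last successful backup is older than the RPO, the backup is no longer meeting its objective. The sketch below assumes you can obtain the last-success timestamp from your own monitoring; the function and dictionary names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# RPO values as stated in this section.
RPO = {
    "continuous": timedelta(minutes=30),
    "point_in_time": timedelta(hours=2),
}


def rpo_breached(kind: str,
                 last_success: datetime,
                 now: Optional[datetime] = None) -> bool:
    """True when the newest backup of this kind is older than its RPO."""
    now = now or datetime.now(timezone.utc)
    return now - last_success > RPO[kind]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last = now - timedelta(minutes=45)
print(rpo_breached("continuous", last, now))     # True
print(rpo_breached("point_in_time", last, now))  # False
```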

What Gets Backed Up

Intelligent Operations configurations, Flow Virtual Networking, Flow Network Security policies, Nutanix Disaster Recovery (Leap) configurations, access policies, categories, virtual networks, IAMv2 policies, and 90 days of metrics.

What Does NOT Get Backed Up

NCM Self-Service (Calm) blueprints and marketplace items, Catalog, Images, VM Templates, and metrics older than 90 days. These must be version-controlled or backed up independently.

Restore Process

From any surviving backup PE cluster, deploy a new PC instance (downloads latest installer from Nutanix portal or uses dark-site bundle).
Restore from IDF backup; re-seeds configuration and metrics.
If original PC comes back online, it must be shut down or deleted before resuming operations on the restored instance to avoid split-brain.

Feature Scaling Considerations

Flow Network Security

Requires X-Large PC for Flow Virtual Networking features.
At 10,000+ VMs, carefully plan category-based policy groups to avoid overly broad rules that generate excessive rule tables on each AHV host.
Flow policies are pushed to AHV hosts via Open vSwitch (OVS) rules; thousands of fine-grained rules per host can impact network performance.
Test policy propagation time in staging before deploying at scale.

NCM Self-Service (Calm)

Calm blueprints and marketplace items are NOT included in PC backup; store blueprints in Git.
Large numbers of concurrent blueprint launches can strain PC API; stagger bulk deployments.
Calm audit logs grow significantly at scale; monitor PC disk utilization.

Leap (Disaster Recovery)

Protection policies and recovery plans are managed per PC; multi-PC environments require separate Leap configurations per PC pair.
At scale, large numbers of protection policies (100+) increase entity sync time and PC resource consumption.
Validate RPO adherence with NCC (Nutanix Cluster Check) health checks.

Reporting and Analytics

Report generation on 10,000+ entities is resource-intensive; schedule reports during off-peak hours.
Entity sync interval between PE clusters and PC is typically 1-5 minutes; during network disruptions, sync backlog can temporarily show stale data.
Custom dashboards with many widgets querying large entity sets can cause UI slowness.

API and Automation

PC API (v3/v4) handles concurrent requests but can throttle under heavy load from multiple automation tools.
Implement retry logic with exponential backoff in all API clients.
Use batch/list APIs with pagination (length/offset in v3, $page/$limit in v4) rather than fetching all entities in a single call.
Monitor PC API response times via Prism Central metrics; sustained p99 latency above 5 seconds indicates capacity pressure.
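
The retry and pagination advice above can be combined in one client-side pattern. The sketch below is deliberately transport-agnostic: `fetch_page` is a hypothetical callable you would implement against the PC list endpoint of your choice, and `TransientError` is a made-up marker for retryable failures such as throttling responses. Neither is part of a Nutanix SDK.

```python
import random
import time
from typing import Callable, Iterator, List


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. throttling)."""


def with_backoff(call: Callable[[], List[dict]],
                 max_attempts: int = 5,
                 base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Run `call`, retrying on TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")


def iter_entities(fetch_page: Callable[[int, int], List[dict]],
                  page_size: int = 500,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield every entity, one page at a time, instead of one huge request."""
    offset = 0
    while True:
        page = with_backoff(lambda: fetch_page(offset, page_size), sleep=sleep)
        yield from page
        if len(page) < page_size:
            return  # a short page means the listing is exhausted
        offset += len(page)
```

Passing `sleep` as a parameter keeps the helpers easy to test without real delays. The page size of 500 is an arbitrary default; pick a value the target API version accepts.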

Upgrade Path

From	To	Method
X-Small	Small/Large/X-Large	Contact Nutanix Support for assisted resize
Small (1-VM)	Small (3-VM)	Scale-out via PC UI (adds 2 VMs, one-way)
Small (1-VM)	Large/X-Large	Deploy new PC at target size, restore from backup, re-register clusters
Large (1-VM)	Large (3-VM)	Scale-out via PC UI (adds 2 VMs, one-way)
Large (1-VM)	X-Large	Deploy new PC at target size, restore from backup, re-register clusters
Any (3-VM)	Larger tier (3-VM)	Deploy new PC at target size, restore from backup, re-register clusters
Any	X-Small	Not supported (downgrade)

Key constraint: In-place tier changes (e.g., Small to Large) are not supported natively. The process requires deploying a new PC instance at the desired size, restoring configuration from backup, and re-registering all PE clusters. Plan for a maintenance window of 2-4 hours depending on environment size.
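
The upgrade table can be read as a lookup keyed on the source and target tier and VM count. The sketch below encodes the rows as written; transitions the table does not list (for example a downgrade to a tier other than X-Small) are reported as not covered rather than guessed. The function name and return strings are illustrative.

```python
TIER_ORDER = ["X-Small", "Small", "Large", "X-Large"]
REDEPLOY = "Deploy new PC at target size, restore from backup, re-register clusters"


def upgrade_method(from_tier: str, from_vms: int,
                   to_tier: str, to_vms: int) -> str:
    """Map a (tier, VM count) transition to the method from the table."""
    if from_vms not in (1, 3) or to_vms not in (1, 3):
        raise ValueError("PC deployments are 1-VM or 3-VM")
    src, dst = TIER_ORDER.index(from_tier), TIER_ORDER.index(to_tier)

    if to_vms < from_vms:
        return "Not supported (scale-out cannot be reverted)"
    if to_tier == "X-Small" and (src != dst or to_vms == 3):
        # Covers downgrades to X-Small and X-Small scale-out.
        if src != dst:
            return "Not supported (downgrade)"
        return "Not supported (X-Small is single VM only)"
    if dst < src:
        return "Not covered by the upgrade table"
    if src == dst:
        if from_vms == to_vms:
            return "No change"
        return "Scale-out via PC UI (adds 2 VMs, one-way)"
    if from_tier == "X-Small":
        return "Contact Nutanix Support for assisted resize"
    return REDEPLOY


print(upgrade_method("Small", 1, "Small", 3))
print(upgrade_method("Large", 1, "X-Large", 1))
```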

Networking Requirements

Management network: All PC VMs need IP addresses on the management network with Layer 3 reachability to all managed PE clusters (port 9440 TCP/HTTPS).
iSCSI data services network: Required if PC is hosted on a cluster using iSCSI-based storage (Volume Groups). Not required for standard AHV deployments.
DNS: Forward and reverse DNS records for all PC VMs. PC uses FQDN for cluster registration.
NTP: Synchronized time across PC and all PE clusters (critical for backup replication and Leap DR).
Firewall rules: Port 9440 (HTTPS/API), port 80/443 (LCM updates), port 8443 (MSP/microservices platform). See Nutanix Port Reference for complete list.
Bandwidth: For multi-site with centralized PC, minimum 10 Mbps sustained between PC and remote PE clusters; 100 Mbps recommended for environments with 50+ remote clusters.
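
Before registering a remote cluster, it can help to confirm that the listed ports are reachable from the PC side. The following is a minimal TCP reachability sketch using only the standard library. It checks that a connection opens and nothing more (no TLS or API validation), and the hostname in the usage note is a placeholder. The bandwidth helper simply restates the guidance above.

```python
import socket
from typing import Callable, Dict, Iterable


def check_ports(host: str,
                ports: Iterable[int] = (9440, 8443),
                timeout: float = 3.0,
                connect: Callable = socket.create_connection) -> Dict[int, bool]:
    """Return {port: reachable} based on whether a TCP connection opens."""
    results = {}
    for port in ports:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:  # refused, timed out, or name resolution failed
            results[port] = False
        else:
            conn.close()
            results[port] = True
    return results


def recommended_bandwidth_mbps(remote_clusters: int) -> int:
    """10 Mbps sustained minimum; 100 Mbps once 50+ remote clusters are managed."""
    return 100 if remote_clusters >= 50 else 10
```

A call such as `check_ports("pe-remote.example.internal")` returns a dictionary like `{9440: True, 8443: False}`, which can be logged per site. The `connect` parameter exists so the function can be exercised in tests without touching the network.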

Common Decisions (ADR Triggers)

PC sizing tier -- Small (2,500 VMs) vs Large (7,500 VMs) vs X-Large (25,000 VMs) based on current count, growth, and feature requirements (Flow Virtual Networking forces X-Large)
Single-VM vs scale-out -- Single VM (simpler, lower resource cost) vs 3-VM scale-out (HA, doubled capacity, one-way commitment)
Centralized vs distributed PC -- Single scale-out PC (unified management, WAN-dependent) vs multiple regional PCs (autonomous, operational overhead, requires Nutanix Central for aggregation)
PC hosting location -- Dedicated management cluster (isolation, predictable resources) vs co-hosted on production cluster (fewer clusters to manage, resource contention risk)
Backup strategy -- Continuous-only to PE clusters (simpler, 30-min RPO) vs continuous + point-in-time to S3 (defense-in-depth, ransomware protection, offsite)
Calm blueprint management -- Git-based version control (recommended, survives PC loss) vs PC-only storage (lost if PC is destroyed without separate backup)
Category taxonomy -- Pre-defined enforced taxonomy (consistent, scalable) vs organic growth (flexible, policy drift at scale)

Reference Links

Prism Central sizing guide -- VM sizing, scale limits, and deployment recommendations for Prism Central
Prism Central administration guide -- deployment, configuration, and multi-cluster management