Ceph Storage

Scope

Ceph distributed storage: cluster sizing, CRUSH map design, BlueStore, replication and erasure coding, OSD WAL/DB placement, placement groups, network separation, MDS for CephFS, RGW for S3/Swift, monitoring (Dashboard, Prometheus module), and upgrade planning.

Ceph is a distributed, software-defined storage platform providing block (RBD), object (RGW/S3), and file (CephFS) storage from a single cluster. It is used as the storage backend for OpenStack (Cinder, Glance, Manila), OpenShift Data Foundation (ODF/Rook), and Proxmox, as well as in standalone deployments.

Checklist

Why This Matters

Ceph is the de facto standard for open-source distributed storage. It underpins OpenStack clouds, OpenShift container storage (ODF), and many enterprise storage platforms. Design decisions at deployment time — CRUSH map, pool replication strategy, network topology — are extremely difficult to change later. A poorly designed CRUSH map leads to uneven data distribution and hotspots. Undersized PG counts cause data imbalance that worsens as the cluster grows. Missing network separation causes replication traffic to starve client I/O during recovery events.
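
To make the PG sizing concern concrete, here is a small Python sketch of the commonly quoted rule of thumb: roughly 100 PGs per OSD, divided by the pool's replica count, rounded up to a power of two. The function name and the default of 100 PGs per OSD are assumptions for illustration, and the result is a budget shared by all pools in the cluster. On Nautilus and later the PG autoscaler performs this kind of calculation itself, so treat the output as a sanity check rather than a value to apply blindly.

```python
def suggested_pg_count(num_osds: int, pool_size: int,
                       target_pgs_per_osd: int = 100) -> int:
    """Rule-of-thumb total PG count, rounded up to a power of two.

    pool_size is the replica count for replicated pools, or k + m for
    erasure-coded pools. target_pgs_per_osd is an assumed default.
    """
    if num_osds < 1 or pool_size < 1 or target_pgs_per_osd < 1:
        raise ValueError("all arguments must be positive integers")
    raw = num_osds * target_pgs_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power


if __name__ == "__main__":
    for osds in (12, 100):
        print(osds, "OSDs, size 3:", suggested_pg_count(osds, 3), "PGs")
```

A 12-OSD cluster with 3x replication lands at 512 PGs under this rule, which shows why a default left over from a small pilot becomes undersized once the OSD count grows.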

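Ceph recovery after an OSD failure is I/O intensive: the cluster re-replicates the lost data onto the remaining OSDs, which raises their fill level. If the cluster is already near capacity (more than about 80% full), surviving OSDs can reach the backfillfull ratio (default 0.90), at which point backfill to them stops, or the full ratio (default 0.95), at which point client writes are blocked. Recovery may then not complete before the next failure, risking data loss. Capacity planning must account for failure recovery headroom, not just raw storage needs.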
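
The headroom argument can be put into numbers with a short Python sketch. It assumes equally sized hosts, evenly distributed data, and a host-level failure domain; the function names and the 80% target are illustrative assumptions, not Ceph settings.

```python
def fill_after_host_loss(hosts: int, used_fraction: float,
                         lost_hosts: int = 1) -> float:
    """Fill ratio of the surviving hosts once recovery has finished."""
    if hosts <= lost_hosts:
        raise ValueError("no surviving hosts to recover onto")
    return used_fraction * hosts / (hosts - lost_hosts)


def usable_capacity_tib(hosts: int, raw_tib_per_host: float,
                        replica_size: int = 3, target_fill: float = 0.80,
                        tolerate_host_failures: int = 1) -> float:
    """Usable capacity that keeps the cluster at or below target_fill
    even after the tolerated number of hosts has been lost."""
    if hosts <= tolerate_host_failures:
        raise ValueError("not enough hosts to survive the requested failures")
    surviving_raw = (hosts - tolerate_host_failures) * raw_tib_per_host
    return surviving_raw * target_fill / replica_size


if __name__ == "__main__":
    # A 5-host cluster at 80% has nowhere to put the data of a lost host.
    print(round(fill_after_host_loss(5, 0.80), 3))
    # The same fill level on 10 hosts still ends up close to the limits.
    print(round(fill_after_host_loss(10, 0.80), 3))
    print(round(usable_capacity_tib(5, 100.0), 1), "TiB usable")
```

Small clusters are hit hardest: with five hosts at 80% full, losing one host pushes the survivors to the point where recovery cannot finish, while sizing for the surviving hosts keeps the post-failure fill at the chosen target.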

Common Decisions (ADR Triggers)

Deployment tool — cephadm (official, container-based, Octopus+) vs Rook (Kubernetes operator, used by ODF) vs manual (legacy) — cephadm for standalone, Rook for K8s-integrated
Monitoring stack: built-in vs centralized — cephadm deploys Prometheus, Grafana, Alertmanager, and Node Exporter as containers by default (skip with --skip-monitoring-stack at bootstrap). When a centralized observability stack already exists (common in enterprise environments), running Ceph's built-in stack creates duplicate infrastructure and dashboard fragmentation. Options: (1) disable Ceph's monitoring containers and scrape ceph-exporter from the central Prometheus, importing Ceph dashboards into the central Grafana; (2) keep Ceph's stack isolated for storage team autonomy; (3) federate Ceph's Prometheus into the central instance. Rook deployments expose ServiceMonitor CRDs for Prometheus Operator — no built-in stack to manage. See also: Prometheus/Grafana observability
Replication vs erasure coding: 3x replication (simple, fast reads, 3x raw cost) vs EC 4+2 (1.5x raw cost, higher write latency; RBD and CephFS data on EC pools requires allow_ec_overwrites on BlueStore OSDs plus a replicated pool for metadata). Use replication for RBD/hot data, EC for RGW/cold data.
All-flash vs hybrid — NVMe/SSD-only (high IOPS, predictable latency) vs HDD with NVMe WAL/DB (high capacity, lower cost, variable latency) — depends on workload IOPS requirements
CephFS vs RGW vs RBD — block (RBD for VMs/containers), object (RGW for S3-compatible), file (CephFS for shared POSIX) — often all three from one cluster
Single cluster vs multi-site — single cluster with rack-level failure domains vs multi-site with RGW multi-site or RBD mirroring — latency between sites determines sync vs async replication
Dedicated OSD nodes vs converged — dedicated storage nodes (better performance isolation) vs converged with compute (lower cost, HCI model like Nutanix/Proxmox) — depends on scale and performance requirements
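
The raw-cost figures quoted for replication and erasure coding follow from simple ratios, sketched below in Python. The helper names are hypothetical, and the calculation ignores BlueStore metadata overhead and recovery headroom.

```python
def raw_overhead_replicated(size: int) -> float:
    """Raw bytes stored per usable byte for a replicated pool."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return float(size)


def raw_overhead_ec(k: int, m: int) -> float:
    """Raw bytes stored per usable byte for an erasure-coded pool
    with k data chunks and m coding chunks."""
    if k < 1 or m < 1:
        raise ValueError("k and m must be at least 1")
    return (k + m) / k


def raw_tib_needed(usable_tib: float, overhead: float) -> float:
    """Raw capacity required to store usable_tib of data."""
    return usable_tib * overhead


if __name__ == "__main__":
    usable = 100.0  # TiB of data to store (example value)
    for label, overhead in (("3x replication", raw_overhead_replicated(3)),
                            ("EC 4+2", raw_overhead_ec(4, 2))):
        print(f"{label}: {overhead:.2f}x raw, "
              f"{raw_tib_needed(usable, overhead):.0f} TiB raw for {usable:.0f} TiB")
```

Both layouts in this example survive the loss of two failure domains, but EC 4+2 needs at least six failure domains (k + m) to place its chunks, against three for 3x replication.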

Version Notes

Feature	Luminous (12)	Mimic (13)	Nautilus (14)	Octopus (15)	Pacific (16)	Quincy (17)	Reef (18)	Squid (19)
BlueStore default	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
cephadm orchestrator	—	—	—	GA	GA	GA	GA	GA
PG autoscaling	—	—	GA	GA	GA	GA	GA	GA
Dashboard	Basic	Improved	Full	Full	Full	Full	Full	Full
RBD snapshot-based mirroring	—	—	—	GA	GA	GA	GA	GA
Stretch clusters	—	—	—	—	GA	GA	GA	GA
msgr2 (v2 protocol)	—	—	GA	GA	GA	GA	GA	GA
RGW multi-site sync	GA	GA	GA	GA	Improved	Improved	Improved	Improved
CephFS multi-active MDS	GA	GA	GA	GA	GA	GA	GA	GA
Quincy LTS	—	—	—	—	—	LTS	—	—
Prometheus module	Preview	GA	GA	GA	GA	GA	GA	GA

Monitoring Configuration

Cephadm Built-in Grafana

Cephadm deploys the full monitoring stack (Prometheus, Grafana, Alertmanager, Node Exporter) automatically at bootstrap. If skipped with --skip-monitoring-stack, deploy components individually at any time:

ceph orch apply prometheus
ceph orch apply grafana
ceph orch apply alertmanager
ceph orch apply node-exporter

To change Grafana settings later, edit the service spec and re-apply it:

ceph orch apply -i grafana.yaml

Service spec (grafana.yaml):

service_type: grafana
placement:
  count: 1
spec:
  port: 4200
  protocol: https
  initial_admin_password: <password>
  anonymous_access: False

TLS is managed by cephadm's certificate manager by default. For custom certificates:

ceph orch certmgr cert set --cert-name grafana_ssl_cert --hostname <host> -i certificate.pem
ceph orch certmgr key set --key-name grafana_ssl_key --hostname <host> -i key.pem
ceph orch reconfig grafana

Enable TLS and authentication across all monitoring components:

ceph config set mgr mgr/cephadm/secure_monitoring_stack true

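Dashboard integration is automatic. If users' browsers must reach Grafana at a different address than the one the manager uses (for example, Grafana is in a different DNS zone from users), set both URLs: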

ceph dashboard set-grafana-api-url <backend-grafana-url>
ceph dashboard set-grafana-frontend-api-url <browser-accessible-url>
ceph dashboard set-grafana-api-ssl-verify False  # for self-signed certs

External Prometheus/Grafana Integration

To scrape Ceph from a centralized Prometheus instead of using the built-in stack:

Enable the Prometheus module: ceph mgr module enable prometheus
Configure the external Prometheus to scrape ceph-exporter using cephadm's service discovery:

- job_name: 'ceph-exporter'
  http_sd_configs:
  - url: https://<mgr-ip>:8765/sd/prometheus/sd-config?service=ceph-exporter
    basic_auth:
      username: '<username>'
      password: '<password>'
    tls_config:
      ca_file: '/path/to/ca.crt'

Import Ceph Grafana dashboards from the ceph-mixin dashboards into the centralized Grafana.
Configure "Dashboard1" as the Prometheus data source name in Grafana (required by Ceph dashboard JSON).
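
Before pointing the central Prometheus at the service-discovery URL, it can help to check what the endpoint returns. Prometheus HTTP service discovery expects a JSON list of target groups, each with a "targets" list and optional "labels". The Python sketch below parses such a document and lists the scrape targets; the sample payload, host names, and port are made-up values for illustration, not output captured from a real cluster.

```python
import json


def list_targets(sd_payload: str) -> list[str]:
    """Return the sorted, de-duplicated targets from a Prometheus
    HTTP SD document (a JSON list of target groups)."""
    groups = json.loads(sd_payload)
    if not isinstance(groups, list):
        raise ValueError("HTTP SD payload must be a JSON list of target groups")
    targets = set()
    for group in groups:
        targets.update(group.get("targets", []))
    return sorted(targets)


# Hypothetical sample of what an SD endpoint might return.
SAMPLE_PAYLOAD = """
[
  {"targets": ["ceph-node-02:9926", "ceph-node-01:9926"], "labels": {}},
  {"targets": ["ceph-node-01:9926"], "labels": {}}
]
"""

if __name__ == "__main__":
    for target in list_targets(SAMPLE_PAYLOAD):
        print(target)
```

In practice the payload would be fetched from the service-discovery URL with the same credentials and CA file as in the scrape config; an empty list usually means the requested service is not deployed.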

Reference Architectures

Ceph Documentation — Architecture — official architecture overview covering RADOS, CRUSH, and client protocols
Ceph Hardware Recommendations — official sizing guidance for OSD, MON, MDS, and RGW nodes
Red Hat Ceph Storage Architecture Guide — enterprise deployment patterns and best practices
Rook Ceph Operator — Kubernetes-native Ceph deployment via Rook (used by ODF)
Ceph Monitoring Services (cephadm) — deploying and configuring Prometheus, Grafana, and Alertmanager via cephadm
Ceph Dashboard — Grafana — configuring Grafana integration with the Ceph Dashboard UI