Change Management

Scope

This file covers change management practices for infrastructure and cloud environments: change classification and approval workflows, Change Advisory Board (CAB) processes, maintenance windows, change management integration with CI/CD pipelines, ITSM tooling, risk assessment, post-implementation review, change freezes, and compliance-driven change control. For deployment strategies and rollback procedures, see Deployment. For CI/CD pipeline design and GitOps, see CI/CD. For governance frameworks and policy-as-code, see Governance. For compliance automation and audit controls, see Compliance Automation.

Checklist

Why This Matters

Change is the leading cause of production incidents. Commonly cited industry estimates attribute 60-80% of outages to changes: deployments, configuration updates, infrastructure modifications, and maintenance activities. An effective change management process does not eliminate risk, but it makes risk visible, assessed, and managed before impact reaches customers.

The tension in change management is between control and velocity. Traditional ITIL-style change management, with weekly CAB meetings and multi-day approval cycles, was designed for an era of quarterly releases. Modern cloud environments with daily or hourly deployments cannot tolerate that overhead, but they still need risk assessment, audit trails, and rollback planning. The solution is not to abandon change management but to automate it: pre-approved standard changes for routine deployments, automated risk scoring that routes each change to the appropriate approval level, and CI/CD pipelines that create and close change records without human intervention for low-risk changes.

Organizations subject to compliance frameworks (SOX, PCI-DSS, FedRAMP, HIPAA) face additional pressure. Auditors expect documented change records with approval trails, separation of duties between change requestor and approver, and evidence that changes were tested before production deployment. Failing to produce this evidence during an audit creates findings that can escalate to material weaknesses. The most efficient approach is to build compliance evidence generation into the deployment pipeline itself, so that every deployment automatically produces the change record, approval evidence, test results, and deployment log that auditors need.
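As a minimal sketch of pipeline-generated evidence, the following bundles a change record, its approval, the test outcome, and the deployment log into one record, and rejects it if the requestor approved their own change. The `ChangeEvidence` type, its field names, and `build_evidence` are hypothetical and not tied to any particular pipeline or ITSM product; the digest is only a content checksum and would need to be stored separately to detect later modification.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChangeEvidence:
    """Hypothetical evidence record; field names are illustrative only."""
    change_id: str
    requestor: str
    approver: str
    tests_passed: bool
    deployment_log: str


def build_evidence(record: ChangeEvidence) -> dict:
    """Validate the record and return it as a dict with a SHA-256 digest."""
    # Separation of duties: the requestor must not approve their own change.
    if record.requestor == record.approver:
        raise ValueError("requestor and approver must be different people")
    # Evidence that the change was tested before production deployment.
    if not record.tests_passed:
        raise ValueError("tests must pass before production deployment")
    payload = asdict(record)
    # Canonical JSON (sorted keys) so the same content always hashes the same.
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    payload["sha256"] = hashlib.sha256(canonical).hexdigest()
    return payload
```

A pipeline step might call `build_evidence(...)` after a successful deployment and archive the returned dict alongside the build artifacts.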

Common Decisions (ADR Triggers)

ADR: Change Classification and Approval Model

Context: The organization must balance change velocity with risk management by defining how changes are classified and what approval each type requires.

Options:

- Traditional ITIL model: All changes go through CAB review. Provides maximum oversight but creates bottlenecks. Weekly CAB meetings become a scheduling constraint. Standard changes are pre-approved templates, but creating new templates requires CAB approval.
- Risk-tiered model: Automated risk scoring routes changes to the appropriate approval level. Low-risk changes (standard, pre-approved patterns) proceed with peer review only. Medium-risk changes require team lead approval. High-risk changes (cross-service, database schema, network) require CAB or architecture review. Emergency changes use a streamlined approval with mandatory PIR.
- GitOps-native model: Pull request review and approval serves as the change record and approval. Branch protection rules enforce separation of duties. Automated tests and policy checks replace manual risk assessment. CAB is reserved for architectural changes only.

Decision drivers: Regulatory requirements (SOX, PCI-DSS, and FedRAMP mandate documented approval trails), deployment frequency target, team size and on-call structure, and organizational risk tolerance.
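The risk-tiered option above can be sketched as a small routing function. This is an illustration under stated assumptions rather than a recommended scoring scheme: the `Change` attributes, the `Approval` levels, and the precedence order (emergency first, then high-risk attributes, then pre-approved templates) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Approval(Enum):
    PEER_REVIEW = "peer review"
    TEAM_LEAD = "team lead approval"
    CAB = "CAB or architecture review"
    EMERGENCY = "streamlined approval with mandatory PIR"


@dataclass(frozen=True)
class Change:
    """Hypothetical change attributes a pipeline could derive from a diff."""
    matches_standard_template: bool = False
    cross_service: bool = False
    touches_db_schema: bool = False
    touches_network: bool = False
    emergency: bool = False


def route(change: Change) -> Approval:
    """Map a change to an approval level (precedence order is an assumption)."""
    if change.emergency:
        return Approval.EMERGENCY
    # High-risk attributes override a standard template match.
    if change.cross_service or change.touches_db_schema or change.touches_network:
        return Approval.CAB
    if change.matches_standard_template:
        return Approval.PEER_REVIEW
    # Anything not pre-approved and not high-risk is treated as medium risk.
    return Approval.TEAM_LEAD
```

For example, `route(Change(matches_standard_template=True))` returns `Approval.PEER_REVIEW`, while the same change with `touches_db_schema=True` is routed to `Approval.CAB`.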

ADR: ITSM Platform Integration Strategy

Context: Change records must be tracked in a system of record, and the choice of platform and integration depth affects both compliance posture and developer experience.

Options:

- Full ITSM integration (ServiceNow, BMC Remedy): Bidirectional API integration between CI/CD and ITSM. Change records are created automatically on deployment trigger and closed on successful completion. Provides a comprehensive audit trail. High implementation complexity, vendor lock-in risk, and licensing cost.
- Lightweight ITSM (Jira Service Management, Freshservice): Lower licensing cost, easier API integration, familiar to development teams already using Jira. May lack enterprise ITSM features (CMDB, dependency mapping, advanced workflow).
- GitOps as ITSM: Git history and pull request metadata serve as the change record. Approval via PR review. No separate ITSM platform needed. Lowest friction but may not satisfy auditors who expect a dedicated change management system. Works well for cloud-native organizations without legacy ITSM requirements.

Decision drivers: Existing ITSM investment, auditor expectations, deployment volume (high-volume deployments make manual ITSM workflows impractical), and budget for licensing and integration development.
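The "created on deployment trigger, closed on completion" lifecycle described for full ITSM integration can be sketched as a context manager wrapped around the deployment step. `InMemoryItsm` is a hypothetical stand-in for a real ITSM API client and does not mirror any vendor's SDK; the status strings and the choice to close failed deployments as `"failed"` are assumptions for illustration.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class InMemoryItsm:
    """Hypothetical stand-in for an ITSM API client (not a vendor SDK)."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, str]] = {}

    def open_change(self, summary: str) -> str:
        change_id = f"CHG-{len(self.records) + 1}"
        self.records[change_id] = {"summary": summary, "status": "implementing"}
        return change_id

    def close_change(self, change_id: str, status: str) -> None:
        self.records[change_id]["status"] = status


@contextmanager
def change_record(itsm: InMemoryItsm, summary: str) -> Iterator[str]:
    """Open a change record, run the wrapped deployment, then close it."""
    change_id = itsm.open_change(summary)
    try:
        yield change_id
    except Exception:
        # Assumption: a failed deployment still closes the record, as failed.
        itsm.close_change(change_id, "failed")
        raise
    else:
        itsm.close_change(change_id, "successful")
```

A deployment job would run its steps inside `with change_record(itsm, "deploy api v2") as change_id:` so that the record is closed whether the deployment succeeds or raises.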

ADR: Change Freeze Policy

Context: The organization must define when changes are restricted and how exceptions are handled during freeze periods.

Options:

- Hard freeze with no exceptions: No production changes during freeze windows. Simplest to enforce, eliminates change-related risk during critical periods. Can leave critical vulnerabilities unpatched and block incident remediation.
- Hard freeze with emergency exception process: No routine changes, but emergency changes (security patches, production-down fixes) proceed with executive approval and enhanced monitoring. Requires a clear definition of what constitutes an emergency to prevent abuse.
- Soft freeze with enhanced review: Changes are permitted but require additional approval (manager + CAB chair) and enhanced monitoring during and after deployment. Maintains velocity for critical work but increases approval overhead.

Decision drivers: Business criticality of freeze periods (revenue impact of holiday outage vs. cost of delayed features), regulatory requirements (SOX fiscal close), historical incident rate during freeze-equivalent periods, and team availability during freeze windows.
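A pipeline gate for the "hard freeze with emergency exception process" option might look like the sketch below. The `FreezeWindow` type, the `deployment_allowed` function, and its parameter names are hypothetical; a real gate would also need to record who granted the exception and trigger the enhanced monitoring the policy calls for.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class FreezeWindow:
    """Hypothetical freeze window; use timezone-aware datetimes throughout."""
    name: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def deployment_allowed(
    now: datetime,
    windows: Sequence[FreezeWindow],
    emergency: bool = False,
    executive_approved: bool = False,
) -> bool:
    """Return True if a production deployment may proceed at `now`."""
    in_freeze = any(w.start <= now < w.end for w in windows)
    if not in_freeze:
        return True
    # During a freeze, only approved emergency changes proceed.
    return emergency and executive_approved
```

Passing `emergency=True` without `executive_approved=True` still blocks the deployment, which keeps the exception path from being used as a shortcut.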

ADR: Drift Detection and Remediation Strategy

Context: Infrastructure changes made outside the approved change process (console clicks, manual CLI commands, undocumented automation) create configuration drift that undermines change management.

Options:

- Detect and alert: Monitor for drift using cloud-native tools (AWS Config, Azure Policy) or IaC state comparison (Terraform plan). Alert the responsible team and require remediation through the standard change process. Low friction but relies on team discipline.
- Detect and auto-remediate: Automatically revert unauthorized changes to the declared state (Crossplane, AWS Config auto-remediation, GitOps reconciliation). Prevents drift accumulation but can cause unintended disruption if the declared state is itself incorrect or if emergency changes were made intentionally.
- Prevent and enforce: Use IAM policies and service control policies to prevent console access and manual changes entirely. All changes must flow through IaC pipelines. Strictest control but requires mature IaC coverage and can block incident response if pipeline access is unavailable.

Decision drivers: IaC coverage maturity (cannot auto-remediate resources not managed by IaC), incident response requirements (operators may need console access during outages), compliance strictness, and team cloud maturity.
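The difference between "detect and alert" and "detect and auto-remediate" can be illustrated with a toy comparison of declared and actual configuration. This is a simplified sketch, not how AWS Config, Crossplane, or Terraform work internally: resources are modeled as flat dictionaries, and `detect_drift` and `reconcile` are hypothetical helper names.

```python
from typing import Any, Dict, Tuple

Drift = Dict[str, Tuple[Any, Any]]  # key -> (declared value, actual value)


def detect_drift(declared: Dict[str, Any], actual: Dict[str, Any]) -> Drift:
    """Return every key whose declared and actual values differ.

    A key missing on one side is reported with None for that side, so this
    sketch cannot distinguish a missing key from an explicit None value.
    """
    drift: Drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want, have = declared.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


def reconcile(
    declared: Dict[str, Any],
    actual: Dict[str, Any],
    auto_remediate: bool = False,
) -> Tuple[Dict[str, Any], Drift]:
    """Return the resulting state and the drift that was found.

    With auto_remediate=False the state is left alone and the drift is
    returned for alerting; with auto_remediate=True the declared state wins.
    """
    drift = detect_drift(declared, actual)
    if drift and auto_remediate:
        return dict(declared), drift
    return dict(actual), drift
```

Note that the auto-remediate branch discards every undeclared setting, including one added deliberately during an incident, which is the disruption risk described for that option.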