Cloud Governance
Scope
This file covers organizational governance practices for cloud environments: tagging, naming, account structure, FinOps, policy-as-code, and guardrails. For cost optimization specifics, see general/cost.md. For security controls, see general/security.md.
Checklist
[Critical] Are mandatory tags defined and enforced? (owner, environment, cost-center, project at minimum)
[Critical] Is there a resource naming convention? (documented, consistent across providers, enforced via policy)
[Critical] Is the account/subscription/project structure defined? (landing zones, organizational units, separation of concerns)
[Recommended] Is policy-as-code implemented? (OPA/Gatekeeper, HashiCorp Sentinel, Azure Policy, AWS SCPs)
[Critical] Are Service Control Policies or Organization Policies restricting dangerous actions? (block disabling of logging, public S3 buckets, and use of unapproved regions)
[Critical] Are budget alerts and spending controls in place? (per account, per team, per project)
[Recommended] Is there a FinOps practice? (cost visibility, allocation, optimization cadence)
[Optional] Is there a Cloud Center of Excellence or platform team? (standards, enablement, shared services)
[Recommended] Are guardrails automated rather than gate-based? (prevent vs approve — guardrails scale, gates do not)
[Optional] Is there a service catalog of approved architectures? (pre-approved patterns, self-service provisioning)
[Recommended] Is there a process for requesting exceptions to governance policies?
[Recommended] Are resource lifecycle policies defined? (TTL for dev environments, cleanup automation)
Why This Matters
Without governance, a cloud environment becomes hard to manage within months. Untagged resources make cost allocation impossible: finance cannot attribute spend to teams or projects. Missing naming conventions lead to confusion and accidental deletions. Flat account structures widen the blast radius, so one team's misconfiguration can affect everyone.
The most damaging governance failure is shadow IT at scale: teams provision resources without standards, creating security gaps, cost surprises, and compliance violations that compound over time. Governance is not bureaucracy; it is the operating system for cloud at scale.
Tagging Standards
Mandatory tags (see Checklist):

| Tag Key | Purpose | Example Values |
| --- | --- | --- |
| owner | Team or individual responsible | platform-team, jane.doe@company.com |
| environment | Deployment stage | production, staging, development, sandbox |
| cost-center | Financial allocation | engineering-1234, marketing-5678 |
| project | Business project or product | checkout-service, data-pipeline-v2 |

Additional tags:

| Tag Key | Purpose | Example Values |
| --- | --- | --- |
| managed-by | IaC tool that manages the resource | terraform, cloudformation, pulumi |
| data-classification | Sensitivity level | public, internal, confidential, restricted |
| compliance | Applicable compliance framework | hipaa, pci, sox |
| ttl | Expected resource lifetime | 2025-12-31, ephemeral, permanent |
| backup | Backup policy | daily, weekly, none |
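The ttl tag only helps if cleanup automation can interpret it. The following is a minimal sketch of how the three example value styles (an ISO date, ephemeral, permanent) might be evaluated. The function name `is_expired` and the seven-day default for ephemeral resources are assumptions for illustration, not part of any provider API.

```python
from datetime import date


def is_expired(ttl: str, created: date, today: date,
               ephemeral_max_days: int = 7) -> bool:
    """Return True when the ttl tag says the resource has outlived its lifetime.

    Assumption: "ephemeral" resources expire after ephemeral_max_days;
    the document does not define a specific limit.
    """
    value = ttl.strip().lower()
    if value == "permanent":
        return False
    if value == "ephemeral":
        return (today - created).days > ephemeral_max_days
    # Anything else must be an ISO date such as 2025-12-31. The resource
    # lives through that day; an unparseable value raises ValueError so
    # that bad tags surface instead of being silently ignored.
    return today > date.fromisoformat(value)
```

A cleanup job could call this for each tagged resource and queue expired ones for deletion or owner notification.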
Tag Enforcement
| Provider | Enforcement Mechanism | Capability |
| --- | --- | --- |
| AWS | SCP + AWS Config Rules + Tag Policies | Prevent untagged resource creation, auto-remediate |
| Azure | Azure Policy (deny/append/audit) | Deny resource creation without required tags, inherit tags |
| GCP | Organization Policy + Labels | Audit label presence, restrict resource creation |
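Provider-native enforcement can be complemented by a check earlier in the pipeline, for example against planned resources in CI. The sketch below shows one way to validate the four mandatory tags. `tag_violations` is a hypothetical helper, and the allowed environment values are taken from the example values in the tagging table rather than from any provider rule.

```python
REQUIRED_TAGS = ("owner", "environment", "cost-center", "project")
# Assumption: the allowed environments mirror the example values above.
ALLOWED_ENVIRONMENTS = {"production", "staging", "development", "sandbox"}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means compliant."""
    problems = []
    for key in REQUIRED_TAGS:
        # A tag that is absent or blank counts as missing.
        if not tags.get(key, "").strip():
            problems.append(f"missing required tag: {key}")
    environment = tags.get("environment", "").strip()
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid environment: {environment}")
    return problems
```

Returning a list of problems rather than a boolean lets the pipeline report every violation in one pass.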
Resource Naming Conventions
Recommended Pattern
{provider}-{environment}-{region}-{project}-{resource-type}-{identifier}
Examples
| Resource | Name |
| --- | --- |
| AWS VPC | aws-prod-use1-checkout-vpc-main |
| Azure Resource Group | az-prod-eus-checkout-rg |
| GCP GKE Cluster | gcp-prod-usc1-platform-gke-primary |
| S3 Bucket | aws-prod-use1-checkout-data-lake |
Naming Rules
Lowercase only (avoid case-sensitivity issues across providers)
Hyphens as separators (underscores cause issues in DNS names)
No personal names or temporary designations (test-123, johns-bucket)
Include environment to prevent accidental cross-environment operations
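Keep names to 63 characters or fewer (the DNS label limit); some resource types impose stricter rules, for example Azure storage account names allow only 3 to 24 lowercase letters and digits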
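The pattern and rules above can be sketched as a small name builder. `build_name` is a hypothetical helper, and the identifier is treated as optional because the resource group example omits it. The check covers only the generic rules listed here, not provider-specific constraints.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
MAX_LENGTH = 63  # DNS label limit


def build_name(provider: str, environment: str, region: str, project: str,
               resource_type: str, identifier: str = "") -> str:
    """Assemble a resource name and reject anything that breaks the rules."""
    parts = [provider, environment, region, project, resource_type]
    if identifier:
        parts.append(identifier)
    name = "-".join(parts)
    if len(name) > MAX_LENGTH:
        raise ValueError(f"name exceeds {MAX_LENGTH} characters: {name}")
    # Rejects uppercase, underscores, empty segments, and stray hyphens.
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"name violates naming rules: {name}")
    return name
```

For example, `build_name("aws", "prod", "use1", "checkout", "vpc", "main")` returns `aws-prod-use1-checkout-vpc-main`, matching the first row of the examples table.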
Account / Subscription / Project Structure
Landing Zone Pattern
Organization Root
├── Security OU
│   ├── Log Archive Account (centralized logging)
│   ├── Security Tooling Account (GuardDuty, Security Hub)
│   └── Audit Account (read-only cross-account access)
├── Infrastructure OU
│   ├── Network Hub Account (Transit Gateway, DNS)
│   ├── Shared Services Account (CI/CD, artifact repos)
│   └── Identity Account (SSO, directory services)
├── Workloads OU
│   ├── Production OU
│   │   ├── Team-A Production Account
│   │   └── Team-B Production Account
│   ├── Staging OU
│   │   ├── Team-A Staging Account
│   │   └── Team-B Staging Account
│   └── Development OU
│       ├── Team-A Development Account
│       └── Team-B Development Account
└── Sandbox OU
    ├── Developer Sandbox Accounts (auto-cleanup, spending cap)
    └── Experimentation Accounts
| Provider | Tool | What It Provides |
| --- | --- | --- |
| AWS | Control Tower + Account Factory | Automated account provisioning, guardrails, SSO |
| Azure | Cloud Adoption Framework Landing Zones | Management groups, policy, Deployment Stacks, hub-spoke networking. Azure Blueprints is deprecated and replaced by Azure Deployment Stacks; verify the current retirement date at docs.microsoft.com |
| GCP | Cloud Foundation Toolkit | Organization, folders, projects, shared VPC |
Account Separation Principles
Production is always separate from non-production (blast radius isolation)
Security and logging accounts are separate and restricted (tamper-proof audit trail)
Sandbox accounts have spending caps and auto-cleanup (safe experimentation)
One workload per account is ideal; group only tightly coupled services
Networking hub centralizes connectivity (Transit Gateway, Hub VNet, Shared VPC)
Policy-as-Code
| Tool | Scope | Language | Best For |
| --- | --- | --- | --- |
| OPA / Gatekeeper | Kubernetes, Terraform, CI/CD | Rego | K8s admission control, Terraform plan validation |
| HashiCorp Sentinel | Terraform Enterprise/Cloud | Sentinel | Terraform-native policy enforcement |
| AWS SCPs | AWS Organizations | JSON | Account-level permission guardrails (maximum permissions for an account or OU; distinct from IAM permissions boundaries) |
| Azure Policy | Azure subscriptions | JSON | Resource compliance, auto-remediation |
| GCP Organization Policy | GCP organization/folders | Constraints | Resource restriction, location enforcement |
Essential Policies to Implement
Deny public storage — No public S3 buckets, Azure blob containers, or GCS buckets
Require encryption — All storage and databases must use encryption at rest
Restrict regions — Resources only in approved regions (data sovereignty)
Require logging — CloudTrail, Activity Log, or Audit Log cannot be disabled
Enforce tagging — Resources without mandatory tags are denied
Restrict instance types — Prevent expensive instance types in dev/sandbox
Deny public IPs — Compute instances cannot have direct public IPs (use load balancers)
Require MFA — Privileged actions require multi-factor authentication
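As an illustration of two of these policies (require logging, restrict regions), the sketch below builds an AWS Service Control Policy document in Python. The function name `build_scp` and the short list of exempted global services are assumptions for illustration; a real policy needs a complete exemption list that matches the services the organization uses, and should be tested in a non-production OU first.

```python
import json

# Assumption: an illustrative subset of global services that must stay
# usable regardless of the requested region.
GLOBAL_SERVICE_ACTIONS = [
    "cloudfront:*", "iam:*", "organizations:*",
    "route53:*", "sts:*", "support:*",
]


def build_scp(approved_regions: list[str]) -> dict:
    """Return an SCP that protects CloudTrail and restricts regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Require logging: nobody may stop or delete a trail.
                "Sid": "DenyDisablingCloudTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
            {
                # Restrict regions: deny everything except global services
                # when the request targets a region not on the list.
                "Sid": "DenyUnapprovedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(approved_regions)
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_scp(["us-east-1", "eu-west-1"]), indent=2))
```

Generating the document from code keeps the approved-region list in one reviewable place instead of scattered across hand-edited JSON files.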
Guardrails vs Gates
| Aspect | Guardrails | Gates |
| --- | --- | --- |
| Mechanism | Automated prevention/detection | Manual approval/review |
| Speed | Instant (no human bottleneck) | Hours to days |
| Scalability | Scales to thousands of teams | Does not scale |
| Developer experience | Self-service within boundaries | Ticket-and-wait |
| When to use | Default for all standard controls | High-risk exceptions only |
Prefer guardrails. Gates create bottlenecks and frustration. Guardrails let teams move fast within safe boundaries. Reserve gates for genuinely exceptional requests (new region, new compliance scope, production database schema changes).
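The split between guardrails and gates can be sketched as a simple routing function: standard controls are decided automatically, and only exceptional requests go to a human. Everything named here (`ProvisioningRequest`, `decide`, the approved-region set) is hypothetical and only illustrates the decision shape, not a specific tool.

```python
from dataclasses import dataclass

# Assumption: an example allow-list; real values come from policy.
APPROVED_REGIONS = {"us-east-1", "eu-west-1"}


@dataclass
class ProvisioningRequest:
    region: str
    public_storage: bool = False
    new_compliance_scope: bool = False


def decide(request: ProvisioningRequest) -> str:
    """Return "deny", "manual-review", or "allow" for a request."""
    # Guardrail: a standard control is enforced instantly, no ticket.
    if request.public_storage:
        return "deny"
    # Gate: genuinely exceptional requests are routed to a reviewer.
    if request.new_compliance_scope or request.region not in APPROVED_REGIONS:
        return "manual-review"
    # Everything inside the boundaries is self-service.
    return "allow"
```

Keeping the gate branch small is the point of the design: the more requests that resolve to allow or deny automatically, the less the review queue grows with the number of teams.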
FinOps Practices
FinOps Lifecycle Phases
Inform — Visibility into who is spending what (tagging, cost dashboards, allocation)
Optimize — Act on cost data (rightsizing, reserved instances, spot, waste elimination)
Operate — Continuous governance (budget alerts, anomaly detection, optimization cadence)
Key FinOps Activities
| Activity | Frequency | Owner |
| --- | --- | --- |
| Cost allocation review | Monthly | FinOps team + Finance |
| Rightsizing recommendations | Monthly | Engineering teams |
| Reserved instance / savings plan planning | Quarterly | FinOps team |
| Anomaly investigation | As alerted | Resource owner |
| Unused resource cleanup | Weekly (automated) | Platform team |
| Unit cost tracking (cost per transaction, per user) | Monthly | Product + Engineering |
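Unit cost tracking reduces to simple arithmetic: divide the allocated spend by the business metric, then compare periods. A minimal sketch follows; the function names and the figures in the usage note are made up for illustration.

```python
def unit_cost(total_cost: float, units: int) -> float:
    """Cost per unit of business output (transaction, user, and so on)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


def relative_change(previous: float, current: float) -> float:
    """Fractional change between two periods; 0.10 means a 10% increase."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous
```

With hypothetical numbers, a service that cost 12,000 for 4,000,000 transactions has a unit cost of 0.003 per transaction. If total spend rises the next month but unit cost falls, growth is outpacing spend, which a raw cost chart alone would not show.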
Budget Controls
| Provider | Budget Tool | Alert Capabilities |
| --- | --- | --- |
| AWS | AWS Budgets | Forecasted and actual spend, SNS/email alerts, auto-actions |
| Azure | Cost Management Budgets | Action groups, auto-shutdown, email alerts |
| GCP | Cloud Billing Budgets | Pub/Sub alerts, programmatic responses |
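Whichever provider tool is used, the underlying alert logic is a comparison of actual and forecasted spend against thresholds. The sketch below shows percentage-based thresholds; `budget_alerts` and the 50/80/100 percent defaults are illustrative assumptions, not provider defaults.

```python
def budget_alerts(budget: float, actual: float, forecast: float,
                  thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)) -> list[str]:
    """Return the alerts that should fire for the current period."""
    alerts = []
    for threshold in thresholds:
        # Fire once actual spend reaches the given fraction of the budget.
        if actual >= budget * threshold:
            alerts.append(f"actual spend reached {threshold:.0%} of budget")
    # A forecast alert gives warning before the money is actually spent.
    if forecast > budget:
        alerts.append("forecast exceeds budget")
    return alerts
```

For a budget of 1,000 with 850 spent and 1,200 forecast, this yields the 50% and 80% alerts plus the forecast alert. Who receives each alert is a separate decision, listed under the ADR triggers.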
Cloud Center of Excellence (CCoE)
A CCoE is a cross-functional team that establishes cloud standards and enables adoption. It is not a gate — it is a platform team.
CCoE Responsibilities
Define and maintain reference architectures (pre-approved, well-tested patterns)
Provide self-service infrastructure modules (Terraform modules, CloudFormation templates)
Run enablement programs (training, office hours, architecture reviews)
Manage shared services (CI/CD, observability, networking, security tooling)
Track cloud maturity across teams and drive improvement
CCoE Anti-Patterns
Becoming a bottleneck (approval-based instead of enablement-based)
Building ivory tower standards nobody follows
Not including practitioners from delivery teams
Focusing on control instead of capability
Common Decisions (ADR Triggers)
Tagging strategy — which tags are mandatory, enforcement mechanism, tag inheritance
Account structure — single vs multi-account, OU hierarchy, account provisioning process
Naming convention — pattern, abbreviations, uniqueness requirements
Policy-as-code tool — OPA vs Sentinel vs native provider policies
Guardrails vs gates — what requires automated prevention vs manual approval
FinOps model — centralized FinOps team vs embedded in engineering vs hybrid
Budget alert thresholds — percentage-based vs absolute, who gets notified
CCoE charter — scope, staffing model, relationship to security and platform teams
See Also
general/cost.md — Cost optimization techniques and pricing models
general/security.md — Security controls and compliance mapping
general/identity.md — IAM, SSO, and access management
compliance/soc2.md — SOC 2 governance controls (CC1)
general/compliance-automation.md — Automated compliance enforcement, scanning, and evidence collection
general/change-management.md — Change management practices, CAB processes, and ITSM integration