AWS S3 (Simple Storage Service)

Scope

AWS object storage service. Covers storage classes, lifecycle policies, encryption (SSE-S3/SSE-KMS/SSE-C), replication (CRR/SRR), Object Lock, Access Points, versioning, event notifications, Express One Zone, Conditional Writes, and S3 Tables.

Checklist

Why This Matters

S3 is the foundational storage service in AWS, used by nearly every workload -- application assets, data lake storage, backups, log archives, static websites, and ML training data. It is designed for 99.999999999% (eleven 9s) durability and 99.99% availability for S3 Standard, but misconfigured buckets remain one of the most common sources of data breaches in cloud environments. Public bucket exposure, missing encryption, and absent lifecycle policies lead to security incidents and uncontrolled cost growth.

Storage class selection directly impacts cost. At US East (N. Virginia) list prices for the first pricing tier, per GB-month: Standard is $0.023, Standard-IA is $0.0125, Glacier Instant Retrieval is $0.004, Glacier Flexible Retrieval is $0.0036, and Deep Archive is $0.00099. For large data sets, proper lifecycle policies can reduce storage costs by 60-80%. However, colder classes add retrieval fees and minimum storage duration charges (30 days for Standard-IA, 90 days for Glacier Instant and Flexible Retrieval, 180 days for Deep Archive), so access patterns must be understood before transitioning.
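The savings claim is easy to sanity-check with simple arithmetic. The sketch below is a storage-only model using the per-GB-month prices quoted above; the 100 TB data set and the 10/20/70 hot/IA/Deep Archive split are hypothetical, and request, retrieval, minimum-duration, and per-object metadata charges are deliberately left out.

```python
# Storage-only cost model using the per-GB-month prices quoted in the text.
# Assumption: US East (N. Virginia), first pricing tier; prices vary by region and over time.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,  # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(gb_by_class: dict) -> float:
    """Return USD/month for a {storage_class: GB} mix.

    Ignores request, retrieval, minimum-duration, and per-object metadata charges.
    """
    return sum(PRICE_PER_GB_MONTH[cls] * gb for cls, gb in gb_by_class.items())


if __name__ == "__main__":
    total_gb = 100_000  # hypothetical 100 TB data set
    all_standard = monthly_storage_cost({"STANDARD": total_gb})
    # Hypothetical steady state after lifecycle rules: 10% hot, 20% IA, 70% Deep Archive.
    tiered = monthly_storage_cost({
        "STANDARD": 0.10 * total_gb,
        "STANDARD_IA": 0.20 * total_gb,
        "DEEP_ARCHIVE": 0.70 * total_gb,
    })
    print(f"all Standard: ${all_standard:,.2f}/month")
    print(f"tiered:       ${tiered:,.2f}/month ({1 - tiered / all_standard:.0%} lower)")
```

With this hypothetical mix the model gives $2,300.00 versus $549.30 per month, about 76% lower. Real savings depend on how much data can actually move to colder classes and on the charges the model omits.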

Common Decisions (ADR Triggers)

Storage class strategy -- Standard for frequently accessed data; Intelligent-Tiering for unpredictable access patterns (automatically moves objects between access tiers for a monitoring fee of $0.0025 per 1,000 objects per month; objects under 128 KB are not monitored or auto-tiered); Standard-IA for infrequent access that still needs millisecond retrieval; Glacier Flexible Retrieval for archives retrieved in minutes to hours; Deep Archive for rarely accessed long-term retention (12-48 hour retrieval). Glacier Instant Retrieval bridges the gap with millisecond access at near-archive storage prices, in exchange for higher retrieval fees.
Encryption approach: SSE-S3 vs SSE-KMS -- SSE-S3 is the default for new objects, free, and sufficient for most workloads. SSE-KMS adds key rotation control, key usage audit via CloudTrail, and cross-account key sharing via key policies, and meets compliance requirements that mandate customer-managed keys. SSE-KMS adds KMS API request costs and counts against AWS KMS request-rate quotas, which vary by region; enabling S3 Bucket Keys greatly reduces the number of KMS requests S3 makes.
Replication strategy: CRR vs SRR vs no replication -- Both require versioning on the source and destination buckets. CRR for disaster recovery across regions (RPO depends on replication lag, typically seconds to minutes; S3 Replication Time Control adds an SLA to replicate 99.99% of new objects within 15 minutes), regulatory compliance requiring data in multiple jurisdictions, or latency optimization for global users. SRR for aggregating logs from multiple accounts, maintaining a production replica in the same region, or compliance requiring data copies within the same jurisdiction.
Event-driven processing: S3 Event Notifications vs EventBridge -- S3 Event Notifications are simpler, deliver directly to Lambda, SQS, or SNS, and filter only on object key prefix and suffix. EventBridge adds content-based filtering on event fields (such as key, object size, and bucket name), routing to 20+ target types, event archive and replay, and a schema registry. Use EventBridge for complex event routing or when multiple consumers need different subsets of S3 events.
Object Lock mode: Governance vs Compliance -- Object Lock requires versioning. Governance mode lets users with the s3:BypassGovernanceRetention permission override the lock by explicitly requesting a bypass; suitable for internal policies. Compliance mode is truly immutable -- no user, including the root user, can delete or overwrite a locked object version until its retention period expires. Typically used to meet SEC 17a-4, FINRA, and similar regulations. Compliance mode decisions are irreversible: the retention period can be extended but not shortened, and the mode cannot be changed.
Single bucket with prefixes vs multiple buckets -- Single bucket with prefixes is simpler to manage and allows shared policies. Multiple buckets provide stronger isolation, independent configurations (versioning, lifecycle, replication), and per-bucket cost tracking through cost allocation tags. Use multiple buckets for different environments (dev/staging/prod), different compliance requirements, or different teams with distinct access controls.
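For the Intelligent-Tiering trade-off in the storage class decision, a rough break-even object size can be computed: the point where one month in a cooler tier saves exactly the per-object monitoring fee. This sketch assumes the Frequent Access tier is priced like Standard ($0.023/GB) and the Infrequent Access tier like Standard-IA ($0.0125/GB); both tier prices are assumptions, not figures from the text.

```python
# Monitoring fee from the text; tier prices are ASSUMED equal to Standard and Standard-IA.
MONITORING_FEE_PER_OBJECT_MONTH = 0.0025 / 1000  # USD per monitored object per month
FREQUENT_TIER_PER_GB = 0.023  # assumption: priced like S3 Standard
INFREQUENT_TIER_PER_GB = 0.0125  # assumption: priced like Standard-IA


def breakeven_object_size_gb(hot_price: float, cool_price: float) -> float:
    """Object size (GB) at which a month in the cooler tier saves exactly the monitoring fee."""
    saving_per_gb = hot_price - cool_price
    if saving_per_gb <= 0:
        raise ValueError("the cooler tier must be cheaper than the hotter tier")
    return MONITORING_FEE_PER_OBJECT_MONTH / saving_per_gb


if __name__ == "__main__":
    size_gb = breakeven_object_size_gb(FREQUENT_TIER_PER_GB, INFREQUENT_TIER_PER_GB)
    print(f"break-even object size: about {size_gb * 1_000_000:.0f} KB")
```

Under these assumptions the break-even is roughly 240 KB: larger objects that go 30 days without access save more than the fee, while a workload of small, always-hot objects pays the fee for little benefit. Objects under 128 KB are not monitored, so they are neither charged the fee nor auto-tiered.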
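The SSE-S3 vs SSE-KMS choice ends up as a bucket default encryption setting. The sketch below builds the `ServerSideEncryptionConfiguration` dict accepted by boto3's `put_bucket_encryption`; the key ARN in the test is a hypothetical placeholder. Enabling the Bucket Key alongside SSE-KMS is one way to stay well below KMS request-rate quotas.

```python
from typing import Optional


def default_encryption_config(kms_key_arn: Optional[str] = None) -> dict:
    """Build a ServerSideEncryptionConfiguration for put_bucket_encryption.

    No key ARN -> SSE-S3 (AES256). With a key ARN -> SSE-KMS plus an S3 Bucket Key,
    which lets S3 reuse a bucket-level data key and cuts the number of KMS requests.
    """
    if kms_key_arn is None:
        rule = {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    else:
        rule = {
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_arn,
            },
            "BucketKeyEnabled": True,
        }
    return {"Rules": [rule]}
```

Applied with something like `boto3.client("s3").put_bucket_encryption(Bucket="example-bucket", ServerSideEncryptionConfiguration=default_encryption_config(key_arn))`, where the bucket name and `key_arn` are placeholders.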
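To make the EventBridge filtering difference concrete, here is an event pattern for S3 "Object Created" events limited to one bucket, keys under `raw/`, and non-empty objects. The bucket name and prefix are hypothetical. The `matches` function is a deliberately simplified local stand-in that supports only exact values, `prefix`, and `numeric` conditions. It is not EventBridge's matching engine; it is just enough to test a pattern against sample events offline.

```python
import json

# EventBridge pattern: S3 "Object Created" events for one bucket, keys under raw/,
# non-empty objects only. Bucket name and prefix are hypothetical placeholders.
PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["example-data-lake"]},
        "object": {"key": [{"prefix": "raw/"}], "size": [{"numeric": [">", 0]}]},
    },
}
EVENT_PATTERN_JSON = json.dumps(PATTERN)  # string form for put_rule(EventPattern=...)

_OPS = {
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    "=": lambda a, b: a == b,
}


def _value_matches(value, conditions: list) -> bool:
    """Simplified: a field matches if ANY listed condition matches."""
    for cond in conditions:
        if isinstance(cond, dict) and "prefix" in cond:
            if isinstance(value, str) and value.startswith(cond["prefix"]):
                return True
        elif isinstance(cond, dict) and "numeric" in cond:
            ops = cond["numeric"]  # e.g. [">", 0] or [">", 0, "<=", 5]
            if isinstance(value, (int, float)) and all(
                _OPS[ops[i]](value, ops[i + 1]) for i in range(0, len(ops), 2)
            ):
                return True
        elif value == cond:
            return True
    return False


def matches(pattern: dict, event: dict) -> bool:
    """Simplified: every field named in the pattern must be present and match."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[field], dict) or not matches(expected, event[field]):
                return False
        elif not _value_matches(event[field], expected):
            return False
    return True


if __name__ == "__main__":
    event = {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {"bucket": {"name": "example-data-lake"},
                   "object": {"key": "raw/2024/orders.json", "size": 2048}},
    }
    print(matches(PATTERN, event))  # True
```

In a real deployment the bucket must first have EventBridge delivery turned on (`put_bucket_notification_configuration` with `{"EventBridgeConfiguration": {}}`), and `EVENT_PATTERN_JSON` would be passed to the EventBridge client's `put_rule`. Plain S3 Event Notifications could express the `raw/` prefix but not the size condition.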

Reference Architectures

Static Website Hosting

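Private S3 bucket with static website hosting disabled (OAC works only with the S3 REST endpoint, not the website endpoint) -> CloudFront distribution using Origin Access Control (OAC) -> Route 53 alias record. ACM certificate for HTTPS, issued in us-east-1 as CloudFront requires. CloudFront default root object plus CloudFront Functions for URL rewrites (index.html for SPA routing). Separate bucket for access logs. CI/CD pipeline deploys to S3 and issues a CloudFront invalidation.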
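With OAC, the bucket stays private and the bucket policy grants read access only to the CloudFront service principal, scoped to one distribution. The sketch below builds that policy as a dict; the bucket name, account ID, and distribution ID are hypothetical placeholders.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Bucket policy letting exactly one CloudFront distribution (via OAC) read objects."""
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restricts access to this one distribution; without the condition,
                # requests signed for other distributions would also be allowed.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket, account, and distribution identifiers.
    policy = oac_bucket_policy("example-site-bucket", "111122223333", "EDFDVBD6EXAMPLE")
    print(json.dumps(policy, indent=2))
```

The resulting JSON would be applied with `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`, with S3 Block Public Access left fully enabled since no public access is needed.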

Data Lake Foundation

Raw zone (S3 Standard, versioning enabled) -> processing (Lambda/Glue triggered by S3 events) -> curated zone (S3 Standard, Parquet/ORC format) -> archive zone (Glacier). AWS Glue Data Catalog for schema. Athena for ad-hoc queries. Lake Formation for fine-grained access control. S3 Inventory for asset tracking. Lifecycle policies transition raw data to IA after 90 days.
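The "raw data to IA after 90 days" rule could be expressed as a lifecycle rule dict in the shape boto3's `put_bucket_lifecycle_configuration` accepts. This sketch assumes the raw zone is a `raw/` prefix (hypothetical; it could equally be its own bucket). It enforces S3's 30-day minimum before transitioning to the IA classes, and adds its own sanity check that transitions are listed in increasing age.

```python
# S3 rejects transitions to the IA classes earlier than 30 days after creation.
MIN_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30}


def transition_rule(rule_id: str, prefix: str, transitions: list) -> dict:
    """Build one lifecycle rule from (days, storage_class) pairs."""
    previous = -1
    for days, storage_class in transitions:
        minimum = MIN_DAYS.get(storage_class, 0)
        if days < minimum:
            raise ValueError(f"{storage_class} requires at least {minimum} days")
        # Sanity check added by this sketch: keep transitions in increasing order of age.
        if days <= previous:
            raise ValueError("transitions must be listed in increasing order of days")
        previous = days
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": sc} for d, sc in transitions],
    }


if __name__ == "__main__":
    lifecycle = {"Rules": [transition_rule("raw-to-ia", "raw/", [(90, "STANDARD_IA")])]}
    print(lifecycle)
```

The `lifecycle` dict would be passed as `LifecycleConfiguration=` to `put_bucket_lifecycle_configuration`. Further pairs such as `(365, "GLACIER")` could feed the archive zone if data ages out that way.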

Backup and Compliance Archive

S3 bucket with versioning, Object Lock (Compliance mode, 7-year default retention), and SSE-KMS encryption. CRR to a separate account in another region, with Object Lock enabled on the destination bucket as well, for ransomware protection. Lifecycle policy transitions noncurrent versions to Glacier Flexible Retrieval after 30 days, while current versions stay protected by the Object Lock retention period. S3 Inventory reports for compliance auditing. CloudTrail data events for access logging.
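The retention and noncurrent-version settings of this archive could be expressed as the dicts boto3's `put_object_lock_configuration` and `put_bucket_lifecycle_configuration` accept. This sketch assumes the 7-year retention is enforced through an Object Lock default retention period rather than a lifecycle rule, since lifecycle rules only transition or expire objects and cannot prevent deletion.

```python
def object_lock_config(mode: str = "COMPLIANCE", years: int = 7) -> dict:
    """ObjectLockConfiguration with a bucket-wide default retention period."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if years <= 0:
        raise ValueError("retention years must be positive")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Years": years}},
    }


def noncurrent_to_glacier_rule(days: int = 30) -> dict:
    """Lifecycle rule moving noncurrent versions to Glacier Flexible Retrieval."""
    return {
        "ID": "noncurrent-to-glacier",
        "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
        "Status": "Enabled",
        "NoncurrentVersionTransitions": [{"NoncurrentDays": days, "StorageClass": "GLACIER"}],
    }


if __name__ == "__main__":
    print(object_lock_config())
    print({"Rules": [noncurrent_to_glacier_rule()]})
```

Because Compliance-mode retention cannot be shortened once applied, one cautious approach is to trial the same configuration with `mode="GOVERNANCE"` on a test bucket first.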
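The cross-account CRR leg could look like the `ReplicationConfiguration` dict below, in the shape boto3's `put_bucket_replication` accepts. The role ARN, destination bucket, account ID, and replica KMS key are hypothetical placeholders. The destination account also needs a bucket policy and KMS key policy that admit the replication role, which this sketch does not cover.

```python
def cross_account_replication_config(
    role_arn: str,
    dest_bucket: str,
    dest_account_id: str,
    replica_kms_key_arn: str,
) -> dict:
    """ReplicationConfiguration copying all objects to a bucket in another account."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "crr-to-vault-account",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: replicate the whole bucket
                # Do not replicate delete markers, so a delete in the source account
                # does not hide the object in the vault copy.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # SSE-KMS encrypted source objects must be opted in explicitly.
                "SourceSelectionCriteria": {"SseKmsEncryptedObjects": {"Status": "Enabled"}},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "Account": dest_account_id,
                    # Replicas are owned by the destination account, not the source.
                    "AccessControlTranslation": {"Owner": "Destination"},
                    "EncryptionConfiguration": {"ReplicaKmsKeyID": replica_kms_key_arn},
                },
            }
        ],
    }
```

Keeping delete markers out of the replica, and having the vault account own the copies, means credentials compromised in the source account cannot simply delete or hide the backup copies.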

Multi-Tenant Application Storage

Either one S3 bucket per tenant with a consistent naming convention (strongest isolation, bounded by the account's bucket quota) or a shared bucket with one prefix per tenant. S3 Access Points per tenant with VPC-restricted policies. IAM session policies for prefix-level isolation in the shared-bucket model. Pre-signed URLs for temporary direct client access (upload/download). CloudFront with signed URLs/cookies for content delivery. Transfer Acceleration for global tenants uploading large files.
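For prefix-level isolation in a shared bucket, a session policy passed when assuming a role can narrow the temporary credentials to one tenant's prefix. The sketch below builds such a policy; the bucket name, the `tenant-a` ID, and the allowed tenant-ID format are hypothetical. The ID check exists because a tenant ID containing `*` or `/` would silently widen the resource pattern.

```python
import json
import re

# Hypothetical tenant-ID format: lowercase letters, digits, hyphens; no wildcards or slashes.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def tenant_session_policy(bucket: str, tenant_id: str) -> dict:
    """Session policy limiting temporary credentials to one tenant's prefix in a shared bucket."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    prefix = f"{tenant_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOwnPrefixOnly",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "ReadWriteOwnObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and tenant; the JSON would be passed as Policy= to STS assume_role.
    print(json.dumps(tenant_session_policy("example-shared-bucket", "tenant-a")))
```

Effective permissions are the intersection of the role's own policies and the session policy. A session policy can therefore only narrow access, so the underlying role still needs the broader S3 permissions for the shared bucket.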

Reference Links

Amazon S3 User Guide -- buckets, objects, access control, encryption, replication, and lifecycle management
Amazon S3 Pricing -- storage class pricing, request costs, data transfer, and retrieval fees
Amazon S3 Storage Classes -- Standard, Intelligent-Tiering, IA, Glacier, and Deep Archive comparison