
AWS S3 (Simple Storage Service)

Scope

AWS object storage service. Covers storage classes, lifecycle policies, encryption (SSE-S3/SSE-KMS/SSE-C), replication (CRR/SRR), Object Lock, Access Points, versioning, event notifications, Express One Zone, Conditional Writes, and S3 Tables.

Checklist

  • [Critical] Enable S3 Block Public Access at the account level and confirm it on every bucket; override only for intentionally public buckets (e.g., static website hosting) with explicit justification
  • [Critical] Choose bucket policy (resource-based, preferred for cross-account access and broad rules) vs IAM policy (identity-based, preferred for per-user/role permissions); avoid ACLs (legacy, disabled by default on new buckets since April 2023)
  • [Critical] Enable versioning on buckets containing important data; understand that versioning cannot be disabled once enabled (only suspended), and all versions consume storage and incur costs
  • [Recommended] Configure lifecycle policies to transition objects through storage classes (Standard -> IA after 30 days -> Glacier after 90 days -> Deep Archive after 365 days) and expire old versions/delete markers
  • [Critical] Select encryption strategy: SSE-S3 (default, free, AWS-managed keys), SSE-KMS (auditable via CloudTrail, supports key policies, $0.03/10K requests), or SSE-C (customer-provided keys, you manage key material)
  • [Recommended] Set up replication: Cross-Region Replication (CRR) for disaster recovery and compliance, or Same-Region Replication (SRR) for log aggregation and cross-account backup; requires versioning on both source and destination
  • [Optional] Configure S3 Object Lock for WORM (Write Once Read Many) compliance requirements: Governance mode (removable with special permissions) vs Compliance mode (immutable, not even root can delete until retention expires)
  • [Optional] Use S3 Access Points to simplify managing access for shared datasets; each access point has its own DNS name, access policy, and optional VPC restriction
  • [Optional] Enable S3 Transfer Acceleration for faster long-distance uploads using CloudFront edge locations; typically 50-500% faster for cross-continental transfers at additional $0.04-0.08/GB
  • [Recommended] Enable server access logging or CloudTrail S3 data events for audit trails; access logs are cheaper but less structured; CloudTrail data events provide richer detail at higher cost
  • [Recommended] Configure S3 Event Notifications to trigger Lambda, SQS, SNS, or EventBridge on object creation, deletion, or other events; EventBridge integration provides filtering and routing capabilities
  • [Optional] Use S3 Inventory reports for auditing encryption status, replication status, and storage class distribution across large buckets (daily or weekly CSV/ORC/Parquet reports)
  • [Optional] Evaluate S3 Select and S3 Object Lambda for server-side data filtering and transformation to reduce data transfer costs and latency; S3 Select queries CSV, JSON, or Parquet objects, but AWS has closed it to new customers, so confirm availability in your account before designing around it
  • [Optional] Evaluate S3 Express One Zone for single-digit millisecond latency on frequently accessed data; uses a directory bucket type in a single AZ with S3 Express endpoints; ideal for ML training, financial modeling, and real-time analytics, with up to 10x faster data access than S3 Standard; request costs are lower than Standard, but per-GB storage cost is higher and data is not replicated across AZs
  • [Optional] Use S3 Conditional Writes (If-None-Match, If-Match headers) for optimistic concurrency control on PutObject and CompleteMultipartUpload; prevents accidental overwrites without requiring external locking mechanisms like DynamoDB
  • [Optional] Evaluate S3 Tables for Apache Iceberg-compatible managed table storage; provides automatic compaction, snapshot management, and query optimization for analytics workloads accessed via Athena, EMR, or other Iceberg-compatible engines
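
The lifecycle item in the checklist above can be sketched as a configuration document. This is a minimal sketch, not a definitive policy: the rule ID and the 90-day noncurrent-version expiry are assumptions chosen for illustration, and only the 30/90/365-day transitions come from the checklist. The dict follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`; the example only builds and prints it, so it runs without AWS credentials.

```python
import json


def build_lifecycle_configuration(noncurrent_days: int = 90) -> dict:
    """Build a lifecycle configuration dict for tiering and version cleanup.

    noncurrent_days is a hypothetical default, not a value from the checklist.
    """
    rule = {
        "ID": "tier-and-expire",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: the rule covers every object
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
        # Permanently delete versions once they have been noncurrent this long.
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        # Remove delete markers that no longer have any versions behind them.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }
    return {"Rules": [rule]}


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_configuration(), indent=2))
```

Keeping the rule as plain data makes it easy to review in a pull request and to reuse across buckets before it is applied.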

Why This Matters

S3 is the foundational storage service in AWS, used by nearly every workload -- application assets, data lake storage, backups, log archives, static websites, and ML training data. Its design targets for durability (99.999999999%, eleven 9s) and availability (99.99% for Standard) make it reliable, but misconfigured buckets remain one of the most common sources of data breaches in cloud environments. Public bucket exposure and missing encryption lead to security incidents, and absent lifecycle policies lead to uncontrolled cost growth.

Storage class selection directly impacts cost: Standard is $0.023/GB/month, Infrequent Access is $0.0125/GB, Glacier Instant Retrieval is $0.004/GB, Glacier Flexible Retrieval is $0.0036/GB, and Deep Archive is $0.00099/GB. For large data sets, proper lifecycle policies can reduce storage costs by 60-80%. However, retrieval fees and minimum storage duration charges apply to colder classes, so access patterns must be understood before transitioning.
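
The arithmetic behind the savings range can be checked with a short calculation. This is a rough sketch using the per-GB prices quoted above; the 10/20/70 split across classes and the 100,000 GB total are hypothetical, and the model deliberately ignores request, retrieval, transition, and minimum-duration charges.

```python
# Per-GB monthly prices as quoted in the paragraph above.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(gb_by_class: dict) -> float:
    """Sum the storage-only cost for a mapping of storage class to GB."""
    return sum(PRICE_PER_GB_MONTH[cls] * gb for cls, gb in gb_by_class.items())


def savings_vs_standard(gb_by_class: dict) -> float:
    """Fraction saved compared with keeping every GB in Standard."""
    total_gb = sum(gb_by_class.values())
    baseline = PRICE_PER_GB_MONTH["STANDARD"] * total_gb
    return 1 - monthly_storage_cost(gb_by_class) / baseline


if __name__ == "__main__":
    # Hypothetical distribution after lifecycle rules have taken effect.
    tiered = {"STANDARD": 10_000, "STANDARD_IA": 20_000, "GLACIER": 70_000}
    print(f"tiered cost: ${monthly_storage_cost(tiered):,.2f}/month")
    print(f"savings vs all-Standard: {savings_vs_standard(tiered):.0%}")
```

With this split the storage bill drops from about $2,300 to about $732 per month, roughly 68% lower, which falls inside the 60-80% range. A workload that reads archived data often would see much of that saving consumed by retrieval fees.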

Common Decisions (ADR Triggers)

  • Storage class strategy -- Standard for frequently accessed data, Intelligent-Tiering for unpredictable access patterns (automatically moves objects between tiers, $0.0025/1K objects monitoring fee), IA for infrequent but rapid access, Glacier for archival with retrieval needs (minutes to hours), Deep Archive for long-term retention rarely accessed (12-48 hour retrieval). Glacier Instant Retrieval bridges the gap with millisecond access at archive pricing.
  • Encryption approach: SSE-S3 vs SSE-KMS -- SSE-S3 is the default, free, and sufficient for most workloads. SSE-KMS enables key rotation control, key usage audit via CloudTrail, cross-account key sharing via key policies, and compliance requirements mandating customer-managed keys. SSE-KMS adds API call costs and is subject to KMS request rate limits (5,500-30,000 requests/second depending on region).
  • Replication strategy: CRR vs SRR vs no replication -- CRR for disaster recovery across regions (RPO depends on replication lag, typically seconds to minutes), regulatory compliance requiring data in multiple jurisdictions, or latency optimization for global users. SRR for aggregating logs from multiple accounts, maintaining a production replica in the same region, or compliance requiring data copies within the same jurisdiction.
  • Event-driven processing: S3 Events vs EventBridge -- S3 Event Notifications are simpler with direct Lambda/SQS/SNS integration. EventBridge provides content-based filtering on event fields such as bucket name, object key, and object size, routing to 20+ targets, archive/replay of events, and schema registry. Use EventBridge for complex event routing or when multiple consumers need different subsets of S3 events.
  • Object Lock mode: Governance vs Compliance -- Governance mode allows users with s3:BypassGovernanceRetention permission to override the lock; suitable for internal policies. Compliance mode is truly immutable -- no user, including root, can delete or modify the object until the retention period expires. Required for SEC 17a-4, FINRA, and similar regulations. Compliance mode decisions are irreversible.
  • Single bucket with prefixes vs multiple buckets -- Single bucket with prefixes is simpler to manage and allows shared policies. Multiple buckets provide stronger isolation, independent configurations (versioning, lifecycle, replication), and separate billing. Use multiple buckets for different environments (dev/staging/prod), different compliance requirements, or different teams with distinct access controls.
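
For the EventBridge option above, a content-based filter might look like the following. It is a sketch under assumptions: the bucket name and the `raw/` prefix are hypothetical, and the bucket is assumed to have EventBridge delivery turned on. The pattern uses the `aws.s3` source, the `Object Created` detail type, and a prefix match on the object key; the code only builds and serializes the pattern, which could then be supplied as the `EventPattern` string of an EventBridge rule.

```python
import json


def object_created_pattern(bucket: str, key_prefix: str) -> dict:
    """Build an EventBridge event pattern for new objects under a key prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            # Prefix matching lets each consumer subscribe to its own subset.
            "object": {"key": [{"prefix": key_prefix}]},
        },
    }


if __name__ == "__main__":
    # "example-data-lake" and "raw/" are placeholder values.
    pattern = object_created_pattern("example-data-lake", "raw/")
    print(json.dumps(pattern, indent=2))
```

Several rules with different prefixes can coexist on the same bucket, which is the case where EventBridge tends to be easier to manage than per-bucket notification configurations.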

Reference Architectures

Static Website Hosting

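Private S3 bucket (Block Public Access on, served through the REST endpoint) -> CloudFront distribution (OAC for private bucket access) -> Route 53 alias record. OAC works only with the S3 REST endpoint, so leave the static website hosting feature disabled; if website-endpoint behavior such as S3-side redirects is required, the bucket must be configured as a custom origin instead and cannot use OAC. ACM certificate for HTTPS. CloudFront Functions for URL rewrites (index.html for SPA routing). Separate bucket for access logs. CI/CD pipeline deploys to S3 with CloudFront invalidation.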
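
The OAC arrangement depends on a bucket policy that lets only one CloudFront distribution read objects. A minimal sketch of that policy follows; the bucket name, account ID, and distribution ID are placeholders, and a real deployment may need additional statements (for example, for KMS-encrypted objects).

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Build a bucket policy granting read access to one CloudFront distribution."""
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontReadViaOAC",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restrict the service principal to a single distribution.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }


if __name__ == "__main__":
    # All three arguments are placeholder values.
    policy = oac_bucket_policy("example-site-bucket", "111122223333", "EDFDVBD6EXAMPLE")
    print(json.dumps(policy, indent=2))
```

Because the grant goes to the CloudFront service principal with a source ARN condition, the bucket can keep Block Public Access fully enabled.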

Data Lake Foundation

Raw zone (S3 Standard, versioning enabled) -> processing (Lambda/Glue triggered by S3 events) -> curated zone (S3 Standard, Parquet/ORC format) -> archive zone (Glacier). AWS Glue Data Catalog for schema. Athena for ad-hoc queries. Lake Formation for fine-grained access control. S3 Inventory for asset tracking. Lifecycle policies transition raw data to IA after 90 days.

Backup and Compliance Archive

S3 bucket with versioning, Object Lock (Compliance mode), SSE-KMS encryption. CRR to a separate account in another region for ransomware protection. Lifecycle policy: retain current versions for 7 years, transition non-current versions to Glacier after 30 days. S3 Inventory reports for compliance auditing. CloudTrail data events for access logging.
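
The retention and archival settings for this architecture can be sketched as two configuration documents. Mapping the 7-year retention to an Object Lock default retention rule is an assumption made for illustration, as is the rule ID; the 30-day noncurrent transition comes from the description above. The dicts follow the shapes boto3 accepts for `put_object_lock_configuration` and `put_bucket_lifecycle_configuration`, and the example only builds and prints them.

```python
import json


def object_lock_configuration(years: int = 7) -> dict:
    """Default retention applied to every new object version in the bucket."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


def noncurrent_archive_lifecycle(noncurrent_days: int = 30) -> dict:
    """Move versions to Glacier once they have been noncurrent long enough."""
    return {
        "Rules": [
            {
                "ID": "archive-noncurrent-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": noncurrent_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(object_lock_configuration(), indent=2))
    print(json.dumps(noncurrent_archive_lifecycle(), indent=2))
```

Compliance-mode retention cannot be shortened after it is applied, so it is worth validating a configuration like this in Governance mode on a test bucket first.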

Multi-Tenant Application Storage

One S3 bucket per tenant with a consistent naming convention. S3 Access Points per tenant with VPC-restricted policies. IAM session policies for prefix-level isolation within a bucket. Pre-signed URLs for temporary direct client access (upload/download). CloudFront with signed URLs/cookies for content delivery. Transfer Acceleration for global tenants uploading large files.
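
Prefix-level isolation with a session policy could look like the sketch below. The `tenants/<tenant_id>/` key layout, the bucket name, and the statement IDs are assumptions for illustration. A policy like this can be passed as the `Policy` parameter when assuming a role through STS; a session policy can only narrow what the role already allows, never widen it.

```python
import json


def tenant_session_policy(bucket: str, tenant_id: str) -> dict:
    """Build a session policy limiting access to one tenant's key prefix."""
    prefix = f"tenants/{tenant_id}/"  # hypothetical key layout
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is a bucket-level action, so scope it by prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "ReadWriteOwnObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-app-data" and "acme" are placeholder values.
    print(json.dumps(tenant_session_policy("example-app-data", "acme"), indent=2))
```

Generating the policy per request keeps a single role reusable across tenants while each session sees only its own prefix.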


See Also

  • general/data.md -- General data architecture including storage tier selection
  • providers/aws/cloudfront-waf.md -- CloudFront with S3 origins for content delivery
  • providers/aws/lambda-serverless.md -- Lambda triggered by S3 event notifications
  • providers/aws/iam.md -- Bucket policies, IAM policies, and cross-account S3 access
  • providers/aws/storage.md -- AWS block, file, and hybrid storage services (EBS, EFS, FSx, Storage Gateway, DataSync)