AWS AI and ML Services¶

Scope¶

AWS managed AI and ML services. Covers Amazon Bedrock (foundation model API — Anthropic Claude, Amazon Titan, Meta Llama, Mistral, Cohere, Stability AI; knowledge bases for RAG, agents, guardrails, model evaluation, fine-tuning), Amazon SageMaker (training, hosting, pipelines, feature store, Canvas no-code ML, JumpStart model hub, HyperPod for distributed training), Amazon Q (enterprise AI assistant — Q Business for enterprise search, Q Developer for code assistance), Amazon Comprehend (NLP — entity extraction, sentiment, PII detection), Amazon Rekognition (image/video analysis), Amazon Textract (document extraction — forms, tables, queries), Amazon Transcribe and Polly (speech-to-text, text-to-speech), Amazon Kendra (enterprise search), Amazon Personalize (recommendations), AWS Trainium and Inferentia (custom AI accelerators). Does not cover GPU instance types — see patterns/ai-ml-infrastructure.md.

Checklist¶

Why This Matters¶

AWS has the broadest AI service portfolio among hyperscalers. The Bedrock vs SageMaker decision is the most consequential — Bedrock is the right choice for 80% of generative AI use cases (chatbots, document processing, code generation) because it eliminates infrastructure management, but SageMaker is required when custom training, fine-tuning with proprietary data, or hosting open-source models with specific hardware requirements is needed. Choosing SageMaker when Bedrock would suffice adds significant operational overhead.

Bedrock pricing can be surprising at scale — a customer-facing chatbot using Claude 3.5 Sonnet with 8K context averaging 1,000 requests/hour generates roughly $3,000-8,000/month in token costs alone. Provisioned throughput eliminates throttling risk and can reduce per-token cost for sustained workloads, but requires accurate capacity planning — overprovisioning wastes committed spend. Guardrails are not optional for production; without content filtering and PII redaction, regulated industries cannot deploy Bedrock.

Common Decisions (ADR Triggers)¶

Bedrock vs SageMaker -- API consumption vs custom training/hosting. Bedrock is appropriate when consuming foundation models without custom training, workloads are inference-only, and serverless operation is preferred. SageMaker is appropriate when custom model training or fine-tuning is required, open-source models need specific hosting configurations, or full control over model artifacts and infrastructure is needed.
Bedrock model selection -- Claude (strongest reasoning), Titan (AWS-native, cheapest embeddings), Llama (open-source, fine-tunable), Mistral (cost-effective for simpler tasks). Model choice affects cost, latency, quality, and vendor lock-in.
On-demand vs provisioned throughput -- On-demand for development, experimentation, and low-volume production. Provisioned throughput for sustained production workloads needing guaranteed capacity and predictable latency. Batch inference for offline processing where 50% cost savings outweigh async latency.
Bedrock Knowledge Bases vs custom RAG pipeline -- Knowledge Bases for managed simplicity (automatic chunking, embedding, vector store). Custom RAG for control over chunking strategies, embedding model selection, hybrid search, metadata filtering, and reranking. Custom pipelines add complexity but enable domain-specific optimization.
Textract + Comprehend vs Bedrock for document processing -- Textract for structured extraction (forms with known fields, tables, specific queries). Bedrock for unstructured understanding (summarization, classification, free-form Q&A over documents). Many production pipelines combine both — Textract extracts structured data, Bedrock handles reasoning over extracted content.
SageMaker real-time vs async vs batch inference -- Real-time endpoints for low-latency interactive workloads (sub-second response). Async inference with SQS queue for workloads tolerating minutes of latency (document processing, image generation). Batch transform for offline processing of large datasets.

Reference Links¶

Amazon Bedrock User Guide -- foundation model access, knowledge bases, agents, and guardrails
Amazon SageMaker Developer Guide -- custom model training, hosting, pipelines, and MLOps
Amazon Bedrock Pricing -- on-demand, provisioned throughput, and batch inference pricing