Azure AI and ML Services

Scope

Covers Azure managed AI and ML services:

Azure OpenAI Service -- GPT-4o, GPT-4, GPT-4 Turbo, o1/o3-mini reasoning models, DALL-E 3, Whisper, text-embedding-ada-002/text-embedding-3; provisioned throughput units (PTU), content filtering, fine-tuning, Assistants API
Azure AI Services (formerly Cognitive Services) -- Vision, Speech, Language, Document Intelligence, Custom Vision, Translator
Azure AI Search -- vector search, hybrid search, semantic ranking, integrated vectorization, skillsets for document cracking
Azure Machine Learning -- training compute, managed endpoints, pipelines, prompt flow, responsible AI dashboard
Azure AI Studio -- unified AI development portal, playground, deployments, evaluations
Azure AI Document Intelligence -- form extraction, layout analysis, custom models, prebuilt models for invoices/receipts/IDs
Copilot Studio -- low-code AI agents and chatbots

Does not cover GPU VM sizes; see patterns/ai-ml-infrastructure.md.

Checklist

Why This Matters

Azure OpenAI Service is the primary enterprise entry point for generative AI because of Microsoft's enterprise relationships, Entra ID integration, and data residency guarantees. The PTU vs Standard deployment decision is the most consequential cost choice. Standard pricing is simple (pay-per-token) but subject to throttling under load, while PTU guarantees latency and throughput but requires capacity planning and a minimum monthly commitment. For example, a production chatbot processing 500K tokens/hour on GPT-4o (roughly 365M tokens/month) costs approximately $750/month on Standard, or requires 6+ PTUs (~$6,000/month) for guaranteed performance. Overprovisioning PTUs is the most common cost mistake.
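The Standard vs PTU arithmetic above can be sketched as a small calculator. This is a rough illustration, not a pricing tool: the two rates are assumptions back-derived from the example figures in the paragraph (about $750 for 365M tokens, and about $6,000 for 6 PTUs), and the function names are hypothetical. Substitute current list prices before using it for a real decision.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)

# Assumed rates, back-derived from the example figures in the text above.
BLENDED_USD_PER_MILLION_TOKENS = 2.05  # ~$750 / 365M tokens (assumption)
USD_PER_PTU_MONTH = 1_000.0            # ~$6,000 / 6 PTUs (assumption)


def standard_monthly_cost(tokens_per_hour: float, usd_per_million: float) -> float:
    """Pay-per-token cost for a steady hourly token volume."""
    monthly_tokens = tokens_per_hour * HOURS_PER_MONTH
    return monthly_tokens / 1_000_000 * usd_per_million


def ptu_monthly_cost(ptu_count: int, usd_per_ptu_month: float) -> float:
    """Fixed cost of a provisioned deployment, independent of usage."""
    return ptu_count * usd_per_ptu_month


def break_even_tokens_per_hour(
    ptu_count: int, usd_per_ptu_month: float, usd_per_million: float
) -> float:
    """Steady hourly volume at which Standard spend equals the PTU cost."""
    ptu_cost = ptu_monthly_cost(ptu_count, usd_per_ptu_month)
    monthly_tokens = ptu_cost / usd_per_million * 1_000_000
    return monthly_tokens / HOURS_PER_MONTH


standard = standard_monthly_cost(500_000, BLENDED_USD_PER_MILLION_TOKENS)
ptu = ptu_monthly_cost(6, USD_PER_PTU_MONTH)
break_even = break_even_tokens_per_hour(
    6, USD_PER_PTU_MONTH, BLENDED_USD_PER_MILLION_TOKENS
)

print(f"Standard: ${standard:,.2f}/month")
print(f"PTU:      ${ptu:,.2f}/month")
print(f"Break-even volume: {break_even:,.0f} tokens/hour")
```

Under these assumed rates the break-even volume is roughly 4M tokens/hour, about eight times the example workload. On cost alone the example chatbot belongs on Standard; PTU is only justified there if the latency and throughput guarantee is worth the premium.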

Azure AI Search has become the default RAG backend in the Azure ecosystem because of integrated vectorization: it can automatically chunk documents, generate embeddings via Azure OpenAI, and serve hybrid search queries without a separate vector database. However, indexes that store vector embeddings are substantially larger than keyword-only indexes (3-10x), which makes tier selection critical.

Content filtering on Azure OpenAI is mandatory and cannot be fully disabled. Organizations that need unfiltered model access for specific use cases (security research, medical content) must apply to Microsoft for modified content filtering.
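The index-size point above can be made concrete with a back-of-the-envelope estimate of raw vector storage. This is a minimal sketch under stated assumptions: vectors are stored as 32-bit floats, the corpus and chunking parameters are invented for illustration, and the function names are hypothetical. It counts only the embedding payload; any additional overhead the service adds for its vector index structures is not modeled. The 1,536-dimension figure is the output size of text-embedding-ada-002.

```python
import math


def estimate_chunks(
    num_documents: int,
    avg_tokens_per_document: int,
    chunk_tokens: int,
    overlap_tokens: int = 0,
) -> int:
    """Approximate chunk count for fixed-size chunking with overlap."""
    stride = chunk_tokens - overlap_tokens
    if stride <= 0:
        raise ValueError("overlap_tokens must be smaller than chunk_tokens")
    # Each chunk after the first advances by `stride` tokens.
    per_doc = max(1, math.ceil((avg_tokens_per_document - overlap_tokens) / stride))
    return num_documents * per_doc


def vector_storage_bytes(
    num_chunks: int, dimensions: int, bytes_per_component: int = 4
) -> int:
    """Raw size of the embedding vectors alone (float32 assumed by default)."""
    return num_chunks * dimensions * bytes_per_component


# Illustrative corpus (assumed values, not from the source):
chunks = estimate_chunks(
    num_documents=100_000,
    avg_tokens_per_document=2_000,
    chunk_tokens=512,
    overlap_tokens=128,
)
raw_bytes = vector_storage_bytes(chunks, dimensions=1536)

print(f"{chunks:,} chunks, {raw_bytes / 1e9:.2f} GB of raw vectors")
```

Comparing this figure against the vector storage quota of each candidate tier, before any documents are indexed, is one way to avoid discovering the limit mid-ingestion.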

Common Decisions (ADR Triggers)

Standard vs PTU vs Global Standard deployment -- for Azure OpenAI production workloads
Azure AI Search vs dedicated vector database -- Qdrant, Weaviate, or pgvector for RAG
Azure OpenAI vs Azure ML managed endpoints -- managed API vs custom model hosting
Azure AI Document Intelligence vs Azure OpenAI -- structured extraction vs LLM understanding for document processing
Content filtering configuration -- default filters vs modified filtering (requires Microsoft approval)
Single-region vs multi-region Azure OpenAI deployment -- for availability and latency
Copilot Studio vs custom development -- for conversational AI scenarios
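For the single-region vs multi-region decision above, the following sketch shows one possible client-side pattern: try deployments in priority order and fall through to the next region when one is throttled. Everything here is hypothetical. `ThrottledError` stands in for whatever a real client raises on HTTP 429, `send` is a caller-supplied function, and the region names are only examples. A gateway-level approach (for example, load balancing in front of the deployments) is an equally valid design that this sketch does not cover.

```python
from typing import Callable, Optional, Sequence, Tuple


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from a deployment."""


def call_with_failover(
    regions: Sequence[str], send: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each region in order; return (region, response) from the first success."""
    last_error: Optional[Exception] = None
    for region in regions:
        try:
            return region, send(region)
        except ThrottledError as exc:
            last_error = exc  # remember the failure and try the next region
    raise RuntimeError("all regions throttled") from last_error


def fake_send(region: str) -> str:
    """Simulated deployment call: the primary region is throttled."""
    if region == "eastus":
        raise ThrottledError("429 from eastus")
    return f"ok from {region}"


region, response = call_with_failover(["eastus", "swedencentral"], fake_send)
print(region, response)
```

The ordering of `regions` encodes the trade-off named in the decision: listing the nearest region first favors latency, while adding more regions favors availability at the cost of managing quota in each one.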

Reference Links

Azure OpenAI Service documentation -- deployment models, quotas, content filtering, fine-tuning, and Assistants API
Azure AI Search documentation -- vector search, hybrid search, semantic ranking, and integrated vectorization
Azure Machine Learning documentation -- training compute, managed endpoints, pipelines, and responsible AI
Azure AI Studio documentation -- unified AI development portal, model catalog, and prompt flow
Azure AI Document Intelligence documentation -- form extraction, layout analysis, and prebuilt models
Microsoft Copilot Studio documentation -- low-code AI agents and chatbot development