SaaS Multi-Tenant Architecture¶
Scope¶
Covers multi-tenant SaaS architecture patterns including tenant isolation models (silo, pool, bridge), data isolation strategies, noisy neighbor prevention, tenant-aware routing, onboarding automation, billing and metering, and SLA tiering. Applicable when building a SaaS product that serves multiple customers from shared infrastructure.
Overview¶
Multi-tenant SaaS architectures serve multiple customers (tenants) from shared infrastructure. The central challenge is balancing isolation (security, performance, compliance) against efficiency (cost, operational simplicity). Every architectural decision must consider the tenant dimension.
Checklist¶
- [Critical] What is the tenant isolation model? (silo, pool, or bridge — decision depends on compliance, cost, and customer requirements)
- [Critical] How is per-tenant data isolated? (database-per-tenant, schema-per-tenant, row-level security, or hybrid)
- [Critical] Is noisy neighbor prevention implemented? (rate limiting, resource quotas, throttling per tenant)
- [Critical] How is tenant-aware routing handled? (subdomain, header, JWT claim, path prefix)
- [Recommended] Is tenant onboarding automated? (provisioning, configuration, data seeding, DNS)
- [Critical] How is billing and metering implemented? (usage tracking, plan enforcement, overage handling)
- [Recommended] Are SLA tiers supported? (different isolation, performance, or availability per pricing tier)
- [Recommended] Is data residency configurable per tenant? (regional deployment, data sovereignty compliance)
- [Critical] How is tenant admin separated from platform admin? (RBAC scoping, tenant-scoped operations)
- [Recommended] Is tenant-level monitoring and logging in place? (metrics per tenant, cost attribution)
- [Recommended] How are tenant-specific customizations handled? (feature flags, configuration, not code branches)
- [Critical] Is cross-tenant data leakage prevented at every layer? (API, database, cache, logs, error messages)
- [Critical] Are regulatory isolation requirements met per tenant? (HIPAA may require silo model for PHI, PCI DSS requires network isolation for cardholder data, GDPR requires data residency controls for EU personal data — the isolation model must match the most restrictive tenant's compliance requirements, or the bridge model must support per-tenant isolation upgrades)
- [Recommended] Is tenant offboarding and data deletion automated? (GDPR right to erasure, data retention policies)
Why This Matters¶
Multi-tenancy failures are catastrophic: a data leak between tenants is a company-ending event. A noisy neighbor incident degrades service for all customers simultaneously. Without proper metering, pricing cannot cover costs. Without automated onboarding, growth requires linear operations staff.
The isolation model is the most consequential architectural decision in SaaS — it affects every layer of the stack and is extremely expensive to change after launch. Getting this wrong means either over-spending on infrastructure (full silo for all tenants) or under-delivering on security (full pool without adequate isolation controls).
Tenant Isolation Models¶
Silo Model (Separate Infrastructure Per Tenant)¶
Tenant A Tenant B
┌─────────────────┐ ┌─────────────────┐
│ Load Balancer │ │ Load Balancer │
│ App Servers │ │ App Servers │
│ Database │ │ Database │
│ Cache │ │ Cache │
│ (Own VPC/VNet) │ │ (Own VPC/VNet) │
└─────────────────┘ └─────────────────┘
Pros: Strongest isolation, simplest compliance story, per-tenant scaling, per-tenant customization Cons: Highest cost, operational complexity scales linearly, onboarding is slow without automation Best for: Regulated industries (healthcare, finance), enterprise customers demanding dedicated infrastructure, tenants with vastly different scale requirements
Pool Model (Shared Infrastructure, Logical Isolation)¶
Shared Infrastructure
┌──────────────────────────────────┐
│ Load Balancer │
│ App Servers (tenant context │
│ in every request) │
│ Shared Database (RLS or │
│ schema-per-tenant) │
│ Shared Cache (tenant-prefixed │
│ keys) │
│ (Single VPC/VNet) │
└──────────────────────────────────┘
▲ ▲ ▲
Tenant A Tenant B Tenant C
Pros: Lowest cost per tenant, simplest operations, fastest onboarding, best resource utilization Cons: Noisy neighbor risk, compliance complexity, blast radius affects all tenants, harder per-tenant customization Best for: High-volume SaaS with many small tenants, B2C or SMB-focused products, self-serve onboarding
Bridge Model (Hybrid)¶
Shared Control Plane
┌──────────────────────────────────┐
│ Routing / API Gateway │
│ Authentication / Tenant Registry │
│ Billing / Metering │
│ Admin Console │
└──────────┬───────────────────────┘
│
┌──────┴──────┐
▼ ▼
Pool Tier Silo Tier
┌──────────┐ ┌──────────┐
│ Shared │ │ Dedicated │
│ compute + │ │ compute + │
│ shared DB │ │ own DB │
│ │ │ │
│ Free/SMB │ │ Enterprise│
│ tenants │ │ tenants │
└──────────┘ └──────────┘
Pros: Cost-efficient for small tenants, premium offering for enterprise, matches pricing tiers to isolation Cons: Two operational models to maintain, complex routing logic, harder to test Best for: SaaS products spanning SMB to enterprise, products with tiered pricing
Data Isolation Strategies¶
| Strategy | Isolation Level | Cost | Complexity | Compliance |
|---|---|---|---|---|
| Database-per-tenant | Highest | Highest | Medium (connection management) | Easiest to audit |
| Schema-per-tenant | High | Medium | Medium (migration coordination) | Good — clear boundaries |
| Row-level security (RLS) | Medium | Lowest | High (must be bulletproof) | Requires careful audit |
| Hybrid (RLS for pool, DB-per for silo) | Variable | Optimized | Highest | Matches tier requirements |
Database-Per-Tenant¶
- Each tenant gets an isolated database instance or cluster
- Connection routing maps
tenant_id→ database endpoint - Schema migrations must run across all tenant databases (automation critical)
- Consider connection pooling — 1,000 tenants × 10 connections = 10,000 connections to manage
- Provider options: RDS per tenant, Aurora with separate clusters, Azure SQL per DB
Schema-Per-Tenant¶
- Single database server, separate schema (namespace) per tenant
- Cheaper than database-per-tenant, stronger isolation than RLS
- Schema migration runs per-schema (can be parallelized)
- Works well in PostgreSQL (
SET search_path), SQL Server (schemas), MySQL (databases as schemas) - Risk: Operational limit on number of schemas per server (typically hundreds, not thousands)
Row-Level Security (RLS)¶
- Single database, single schema,
tenant_idcolumn on every table - Database enforces row filtering via RLS policies (PostgreSQL RLS, SQL Server RLS)
- Application-level filtering is not sufficient — one missed WHERE clause leaks data
- Always use database-native RLS, not just application ORM filters
- Test extensively: direct SQL access, reporting queries, bulk operations, joins
-- PostgreSQL RLS example
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.current_tenant')::uuid);
Noisy Neighbor Prevention¶
| Layer | Technique | Implementation |
|---|---|---|
| API | Per-tenant rate limiting | API Gateway throttling per API key/tenant ID |
| Compute | Resource quotas | K8s ResourceQuotas per namespace, container limits |
| Database | Connection limits per tenant | PgBouncer per-pool limits, max_connections per schema |
| Cache | Key-space quotas | Eviction policies, memory limits per tenant prefix |
| Storage | Quota enforcement | S3 bucket policies, Azure Blob quotas per container |
| Queue | Per-tenant queue or rate limiting | Separate queues per tenant, or message rate caps |
Rate Limiting Strategy¶
Tier 1 (Free): 100 requests/minute, 1,000/hour, 10,000/day
Tier 2 (Pro): 1,000 requests/minute, 10,000/hour, 100,000/day
Tier 3 (Enterprise): Custom, negotiated per contract
Tenant-Aware Routing¶
| Method | Pattern | Pros | Cons |
|---|---|---|---|
| Subdomain | tenant-a.app.com |
Clean separation, easy SSL with wildcard | DNS management, wildcard certs |
| Path prefix | app.com/tenant-a/... |
Simple routing, single domain | Pollutes URL namespace |
| Header | X-Tenant-ID: tenant-a |
Clean URLs, flexible | Requires client cooperation |
| JWT claim | {"tenant_id": "tenant-a"} |
Secure, no URL/header manipulation | Requires auth on every request |
Recommendation: Use subdomain for customer-facing access and JWT claim for API access. The tenant context should be established once at the edge (API gateway) and propagated through all internal services.
Tenant Onboarding Automation¶
Onboarding Steps (Automate All)¶
- Create tenant record in tenant registry (ID, name, plan, region)
- Provision data store (create database/schema/RLS policy)
- Configure DNS (subdomain routing if applicable)
- Seed initial data (default settings, sample data, admin user)
- Set quotas and limits based on plan tier
- Enable monitoring (tenant-scoped dashboards, alerts)
- Send welcome communications (admin credentials, getting started guide)
Target: Self-serve onboarding in under 60 seconds for pool-tier tenants.¶
Billing and Metering¶
Metering Architecture¶
Application → Usage Events → Event Stream → Metering Service → Billing System
(Kafka/SQS) (aggregate, (Stripe,
deduplicate, custom)
rate)
Common Metering Dimensions¶
| Dimension | Examples |
|---|---|
| API calls | Requests per month, by endpoint |
| Data storage | GB stored, GB transferred |
| Compute | Execution time, vCPU-seconds |
| Users | Active users, seats provisioned |
| Features | Premium feature usage counts |
Billing Models¶
- Flat-rate: Fixed price per tier per month (simple, predictable)
- Usage-based: Pay per unit consumed (fair, but unpredictable for customers)
- Hybrid: Base fee + usage overage (most common in practice)
- Per-seat: Price per user (common for B2B, easy to understand)
Data Residency Per Tenant¶
For tenants with data sovereignty requirements (GDPR, data localization laws):
- Regional deployments — Deploy application stack in multiple regions
- Tenant-to-region mapping — Route tenant traffic to the region where their data resides
- Cross-region replication — Only for operational data, never replicate tenant data to non-approved regions
- Regional database isolation — Tenant data stays in-region; global metadata can be centralized
Tenant Admin vs Platform Admin¶
| Capability | Tenant Admin | Platform Admin |
|---|---|---|
| User management | Own tenant users only | All users, all tenants |
| Configuration | Own tenant settings | Global settings, tenant overrides |
| Data access | Own tenant data only | Cross-tenant (with audit logging) |
| Billing | View own invoices, update payment | Set pricing, view all billing |
| Support | Submit tickets | Manage all tickets, impersonate tenants |
Critical rule: Platform admin cross-tenant access must be audited, time-limited, and require justification (break-glass access pattern).
SLA Tiers¶
| Tier | Availability | Support | Isolation | Customization |
|---|---|---|---|---|
| Free | Best effort | Community/docs | Pool (shared everything) | None |
| Pro | 99.9% | Email, 24h response | Pool with guaranteed quotas | Feature flags |
| Business | 99.95% | Priority, 4h response | Bridge (dedicated compute) | Configuration |
| Enterprise | 99.99% | Dedicated CSM, 1h response | Silo (dedicated everything) | Custom integrations |
Common Decisions (ADR Triggers)¶
- Isolation model — silo vs pool vs bridge; the foundational decision
- Data isolation strategy — database-per-tenant vs schema-per-tenant vs RLS
- Routing mechanism — subdomain vs path vs header vs JWT
- Billing model — flat-rate vs usage-based vs hybrid vs per-seat
- Noisy neighbor strategy — rate limiting approach, quota enforcement
- Data residency — single region vs multi-region, tenant-to-region mapping
- Onboarding model — self-serve vs sales-assisted vs white-glove
- Customization approach — feature flags vs configuration vs custom code (avoid custom code)
- SLA tier mapping — which tiers to offer, what isolation level per tier
See Also¶
general/security.md— Security controls applicable to multi-tenant systemsgeneral/data.md— Data architecture and storage patternsgeneral/identity.md— Authentication and authorization (tenant-scoped RBAC)general/api-design.md— API design including tenant context propagationpatterns/microservices.md— Service decomposition with tenant awareness