Kyndryl Bridge

Scope

This file covers Kyndryl Bridge as the delivery, observability, and AIOps layer for Kyndryl managed-services engagements -- not the underlying infrastructure (KPC) that Kyndryl may also provide. Topics: the Integrate/Observe/Orchestrate model, the Kyndryl Bridge service catalog (100+ technology integrations, 190+ services), knowledge-graph assembly from customer telemetry, the AI-driven insight and recommendation workflow, agentic automation and certified playbooks, Intelligent Recovery Service (KIRS) integration, the boundary between Bridge and customer-owned ITSM/observability/automation stacks, SLA reporting source-of-truth, and multi-MSP scenarios where Bridge is one pane among several. For Kyndryl infrastructure, see providers/kyndryl/private-cloud.md. For ITSM integration patterns generally, see general/itsm-integration.md. For managed-services scope boundaries, see general/managed-services-scoping.md.

Checklist

Why This Matters

Kyndryl Bridge is the customer-facing layer on most Kyndryl managed-services engagements, and the architectural decisions that matter are almost never about Bridge itself -- they are about the boundary between Bridge and the customer's own stack. The single most common mistake is treating Bridge as the default single pane of glass without asking what the customer's ITSM, observability, and automation teams are going to do with it. If the customer already has ServiceNow with an established SLA reporting discipline, Bridge as the primary ticketing system creates a parallel incident stream with competing SLA math. If the customer has Datadog as its observability standard, duplicating metric ingestion into Bridge wastes budget and confuses on-call. The decision is not "use Bridge" or "don't"; it is "what does Bridge own, what does the customer own, and where is the integration point."

The knowledge graph is Bridge's key differentiator against raw MSP dashboards -- it assembles an asset-and-dependency map from telemetry across the estate, and the AI recommendations are only as good as the graph's coverage. Onboarding validation is where that coverage is either confirmed or silently skipped. A graph that covers 60% of the estate produces recommendations that look authoritative but miss dependencies on the uncovered 40%, and the failure mode is invisible to anyone who does not know what the graph should look like. Validating coverage at onboarding -- explicit inventory comparison, dependency spot-checks -- is what turns the graph from a marketing claim into an operational asset.
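The inventory comparison and dependency spot-check described above can be sketched as plain set arithmetic. This is a minimal illustration, not a Bridge API: it assumes the customer's asset inventory and the graph's nodes and edges have already been exported as collections of identifiers, and the names `validate_graph_coverage` and `CoverageReport` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageReport:
    covered: frozenset          # assets present in both inventory and graph
    missing: frozenset          # in the inventory, absent from the graph
    unexpected: frozenset       # in the graph, absent from the inventory
    missing_edges: frozenset    # spot-checked dependencies the graph lacks

    @property
    def coverage_ratio(self) -> float:
        total = len(self.covered) + len(self.missing)
        return len(self.covered) / total if total else 1.0


def validate_graph_coverage(inventory, graph_nodes, graph_edges, expected_edges):
    """Compare an exported graph against a trusted inventory.

    All four arguments are assumed inputs: iterables of asset IDs, and
    iterables of (source, target) dependency pairs.
    """
    inventory = frozenset(inventory)
    graph_nodes = frozenset(graph_nodes)
    return CoverageReport(
        covered=inventory & graph_nodes,
        missing=inventory - graph_nodes,
        unexpected=graph_nodes - inventory,
        missing_edges=frozenset(expected_edges) - frozenset(graph_edges),
    )


# Illustrative data only: five inventoried assets, three known to the graph.
report = validate_graph_coverage(
    inventory={"web-01", "app-01", "db-01", "mq-01", "cache-01"},
    graph_nodes={"web-01", "app-01", "db-01"},
    graph_edges={("web-01", "app-01")},
    expected_edges={("web-01", "app-01"), ("app-01", "db-01")},
)
print(f"{report.coverage_ratio:.0%}")  # 60%
print(sorted(report.missing))          # ['cache-01', 'mq-01']
```

The spot-checked edges matter as much as the node count: a graph can list every asset and still miss the dependency that makes a recommendation unsafe.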

SLA reporting source-of-truth is the ambiguity that produces the most post-contract disputes. Bridge measures its own view of MTTR, availability, and incident volume; the customer's ITSM measures its own; the two never agree exactly, because pause conditions, clock-start rules, and incident-linking definitions differ. Designating one system as authoritative for contractual SLA reporting -- and having the other reconcile to it -- is cheaper than arguing about whose number is right during a QBR. The ServiceNow hold_reason / JSM pause-condition discussion (see those files) applies here with a twist: Bridge's pause semantics need to align with whichever system is authoritative, not the other way around.
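A small worked example shows how the same incident yields two different resolution times when the two systems honor different pause conditions. This is a sketch under stated assumptions: the pause-reason labels are hypothetical (not actual ServiceNow, JSM, or Bridge values), pauses are assumed not to overlap each other, and `resolution_time` is an illustrative helper rather than either system's real calculation.

```python
from datetime import datetime, timedelta


def resolution_time(opened, resolved, pauses, honored_reasons):
    """Elapsed time minus the pauses whose reason this system honors.

    `pauses` is a list of (reason, start, end) tuples, assumed non-overlapping.
    """
    elapsed = resolved - opened
    for reason, start, end in pauses:
        if reason in honored_reasons:
            # Clamp the pause to the incident window before subtracting it.
            start = max(start, opened)
            end = min(end, resolved)
            if end > start:
                elapsed -= end - start
    return elapsed


opened = datetime(2024, 1, 10, 9, 0)
resolved = datetime(2024, 1, 10, 15, 0)
pauses = [
    ("awaiting_customer", datetime(2024, 1, 10, 10, 0), datetime(2024, 1, 10, 12, 0)),
    ("awaiting_vendor", datetime(2024, 1, 10, 13, 0), datetime(2024, 1, 10, 14, 0)),
]

# Hypothetical rule sets: one system stops the clock for both reasons,
# the other only while waiting on the customer.
system_a = resolution_time(opened, resolved, pauses, {"awaiting_customer", "awaiting_vendor"})
system_b = resolution_time(opened, resolved, pauses, {"awaiting_customer"})

print(system_a)             # 3:00:00
print(system_b)             # 4:00:00
print(system_b - system_a)  # 1:00:00, the gap a reconciliation report must explain
```

Whichever system is designated authoritative, the reconciliation report reduces to listing the pauses one side honored and the other did not.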

Agentic automation scope matters because the default -- ~200M automations/month across ~8,000 certified playbooks at the platform level -- is not what every customer wants running in its environment. Production-change automation in particular should have explicit approval gates, documented rollback authority, and a defined escalation path when the automation and the customer team disagree about the right action. "Certified" does not mean "approved for this customer"; it means "Kyndryl has validated the playbook." The gap between those two is where operational-risk decisions live.
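The distinction between "certified" and "approved for this customer" can be made concrete as a gate evaluated before any playbook runs. The sketch below is one possible policy, not how Bridge implements approvals: the `Playbook` fields, the `Decision` values, and the customer allowlist are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_EXECUTE = "auto_execute"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


@dataclass(frozen=True)
class Playbook:
    playbook_id: str
    certified: bool            # validated by the provider
    changes_production: bool   # modifies a production system
    has_rollback: bool         # a documented rollback exists


def gate(playbook: Playbook, customer_approved_ids: set) -> Decision:
    """Decide whether a playbook may run in this customer's environment."""
    # Certification alone is not enough: the customer must have approved it too.
    if not playbook.certified or playbook.playbook_id not in customer_approved_ids:
        return Decision.BLOCK
    if playbook.changes_production:
        # Production changes without a rollback path never run.
        if not playbook.has_rollback:
            return Decision.BLOCK
        return Decision.REQUIRE_APPROVAL
    return Decision.AUTO_EXECUTE


approved = {"restart-agent", "patch-os"}
print(gate(Playbook("restart-agent", True, False, True), approved))  # Decision.AUTO_EXECUTE
print(gate(Playbook("patch-os", True, True, True), approved))        # Decision.REQUIRE_APPROVAL
print(gate(Playbook("resize-db", True, True, True), approved))       # Decision.BLOCK
```

A `REQUIRE_APPROVAL` result is where the escalation path plugs in: someone with documented authority either approves the action or overrides it.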

Common Decisions (ADR Triggers)

Bridge as primary console vs one input to a customer-owned single pane -- Bridge as primary is the simplest path for customers without a pre-existing single-pane strategy, and gets the most value out of the AI recommendations and the knowledge graph. Bridge as input is appropriate when the customer already has ServiceNow + Splunk + Datadog as its operational standards; Bridge feeds those, does not replace them. The choice is organizational, not technical: if the customer's operations team will not adopt Bridge as their daily-driver console, forcing it produces shelfware. Decide this before telemetry flows, not after.
SLA reporting source-of-truth: Bridge vs customer ITSM -- Bridge is authoritative when Kyndryl's contract defines the SLA in Bridge's terms (Bridge's pause conditions, Bridge's incident definitions); this is cleaner for the Kyndryl team but requires customer acceptance that Bridge's math wins. Making the customer ITSM authoritative is cleaner for the customer but requires Bridge to reconcile, and it produces friction when Bridge recommendations are not traceable through the customer's ticket history. Decide per-contract; document the reconciliation path regardless.
Observability scope: full ingestion to Bridge vs Bridge as aggregator of existing tools -- Full ingestion gives Bridge a richer dataset for its AI and cleaner correlation across the estate but duplicates cost with the customer's existing observability spend. Aggregator mode lets Bridge read from Datadog / Dynatrace / Splunk via their APIs without re-ingesting raw data; the correlation is shallower but the cost is lower and the customer keeps its observability investment. Default to aggregator unless the customer is standardizing on Bridge for observability going forward.
Automation execution: Bridge playbooks vs customer-owned automation -- Bridge playbooks are the right choice for operations Kyndryl has clear accountability for (patching managed infrastructure, standard change templates, Kyndryl-owned monitoring remediation). Customer-owned automation is the right choice for operations the customer's team is accountable for (application deployments, business-process automation, custom runbooks). The gray area -- incident response where both teams have a role -- needs explicit approval gating rather than one side auto-executing and the other finding out afterward.
KIRS adoption for cyber recovery -- Kyndryl Intelligent Recovery Service is the right choice when coordinated cross-system recovery is a regulatory or business requirement (DORA, NIS2, financial-services resilience) and the customer does not already have a mature cyber-recovery orchestration platform. It is overkill when per-system backup tooling (Veeam, Rubrik, Cohesity) plus a customer-owned runbook is sufficient. Evaluate against the actual recovery scenarios the customer needs to demonstrate, not against the general "cyber resilience" marketing frame.
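Several of the decisions above carry a condition that can be checked mechanically once the choices are written down. The sketch below records them in a small structure and flags the combinations the text warns against. Every field name and string value is a hypothetical convention for illustration, not a Bridge configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryDecisions:
    primary_console: str                  # "bridge" or "customer"
    ops_team_will_adopt_bridge: bool
    sla_source_of_truth: str              # "bridge" or "customer_itsm"
    reconciliation_documented: bool
    observability_mode: str               # "full_ingestion" or "aggregator"
    standardizing_on_bridge: bool         # for observability going forward
    shared_incident_response_gated: bool  # approval gate where both teams act


def review(d: BoundaryDecisions) -> list:
    """Return the warnings that apply to a recorded set of decisions."""
    findings = []
    if d.primary_console == "bridge" and not d.ops_team_will_adopt_bridge:
        findings.append("primary console: ops team will not adopt Bridge (shelfware risk)")
    if not d.reconciliation_documented:
        findings.append(
            f"SLA: {d.sla_source_of_truth} is authoritative but no reconciliation path is documented"
        )
    if d.observability_mode == "full_ingestion" and not d.standardizing_on_bridge:
        findings.append("observability: full ingestion duplicates existing spend")
    if not d.shared_incident_response_gated:
        findings.append("automation: shared incident response lacks approval gating")
    return findings


decisions = BoundaryDecisions(
    primary_console="bridge",
    ops_team_will_adopt_bridge=False,
    sla_source_of_truth="customer_itsm",
    reconciliation_documented=True,
    observability_mode="full_ingestion",
    standardizing_on_bridge=False,
    shared_incident_response_gated=True,
)
for finding in review(decisions):
    print(finding)
```

An empty result does not mean the design is right; it only means none of the known failure combinations were chosen.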

Reference Links

Kyndryl Bridge platform overview -- Integrate / Observe / Orchestrate model, service catalog, AI insight volumes, customer outcomes
Kyndryl Bridge service catalog -- the 190+ services available through Bridge
Kyndryl Bridge documentation -- getting started, onboarding, technical architecture
Kyndryl Agentic Service Management -- 2026 maturity model and blueprint for agentic transformation
Kyndryl Intelligent Recovery Service -- KIRS integrated with Bridge for coordinated recovery
Kyndryl Bridge launch announcement (2022) -- original positioning and platform goals