Kyndryl Bridge

Scope

This file covers Kyndryl Bridge as the delivery, observability, and AIOps layer for Kyndryl managed-services engagements -- not the underlying infrastructure (Kyndryl Private Cloud, KPC) that Kyndryl may also provide. Topics: the Integrate/Observe/Orchestrate model, the Kyndryl Bridge service catalog (100+ technology integrations, 190+ services), knowledge-graph assembly from customer telemetry, the AI-driven insight and recommendation workflow, agentic automation and certified playbooks, Kyndryl Intelligent Recovery Service (KIRS) integration, the boundary between Bridge and customer-owned ITSM/observability/automation stacks, SLA reporting source-of-truth, and multi-MSP scenarios where Bridge is one pane among several. For Kyndryl infrastructure, see providers/kyndryl/private-cloud.md. For ITSM integration patterns generally, see general/itsm-integration.md. For managed-services scope boundaries, see general/managed-services-scoping.md.

Checklist

  • [Critical] Is the role of Kyndryl Bridge in the engagement explicitly defined -- is it the primary managed-services console (incident view, SLA reporting, recommendations) or one input to a customer-owned single pane (ServiceNow, JSM, Datadog)? Ambiguity here produces duplicate incident streams and unclear SLA ownership.
  • [Critical] Is the telemetry flow into Bridge documented per source system -- cloud providers (AWS, Azure, GCP), on-prem (vCenter, Nutanix Prism), observability (Datadog, Dynatrace, Splunk), ITSM (ServiceNow, JSM) -- with the integration mechanism (API pull, event push, agent, connector) and credential ownership specified?
  • [Critical] Is the ITSM integration pattern chosen intentionally -- Bridge as the primary ticketing system, Bridge forwarding to the customer's ITSM, or Bridge receiving from the customer's ITSM -- with bidirectional sync behavior, comment visibility, and ticket-state mapping documented?
  • [Critical] Is the SLA reporting source-of-truth defined -- Bridge calculates Kyndryl's contractual SLAs against its own measurements, the customer's ITSM calculates against its own, or one is authoritative with the other reconciling -- and are the measurement definitions (clock-start, pause conditions, stop) aligned between systems?
  • [Recommended] Is the knowledge-graph completeness validated at onboarding -- assets discovered, dependencies mapped, coverage gaps identified -- rather than assuming full discovery from the integrations alone? Incomplete graphs produce blind spots in recommendations and incident correlation.
  • [Recommended] Is the AI recommendation triage workflow defined -- who reviews capacity / cost / security / automation-candidate recommendations, on what cadence, and what is the acceptance criterion for acting -- rather than letting recommendations accumulate in the console unread?
  • [Recommended] Are Kyndryl Bridge certified playbooks inventoried and reviewed for scope -- what automation runs, what triggers it, what approval gates exist, what the customer's rollback authority is -- before they execute in the customer environment?
  • [Recommended] Is Bridge's observability scope coordinated with the customer's existing observability stack -- agreed which metrics/logs/traces flow to Bridge, which remain in Datadog/Dynatrace/Splunk only, and how alerting is de-duplicated so the same incident does not fire from two platforms?
  • [Recommended] Is Intelligent Recovery Service (KIRS) evaluated if cyber recovery or coordinated multi-system restoration is in scope? KIRS is integrated with Bridge and provides recovery orchestration across the customer estate, distinct from per-system backup tooling.
  • [Recommended] Is multi-tenant / multi-MSP scope documented? When Kyndryl manages part of the estate and a different MSP or customer team manages the rest, the Bridge knowledge graph and incident view reflect only the Kyndryl-managed scope, which is a common source of "the console doesn't show X" surprises.
  • [Optional] Is agentic automation scope (currently ~200M automations/month across ~8,000 certified playbooks at the platform level) understood in terms of what runs automatically vs what requires approval? The default balance may not match the customer's operational-risk tolerance, particularly for production change.
  • [Optional] Is Agentic Service Management adoption assessed -- Kyndryl's 2026 framework for transitioning from traditional service ops to autonomous workflows -- including the maturity model, assessment, and blueprint phase before committing to agentic transformation?
  • [Optional] Is Bridge API access evaluated for the customer's own automation -- where the customer wants to read Bridge's state (incidents, recommendations, asset graph) into its own tooling rather than relying on the Bridge UI?
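
The telemetry-flow item in the checklist (integration mechanism and credential ownership per source system) can be tracked as a simple per-source record and checked mechanically. The following is a minimal sketch under assumptions: the record format, the field names, and the `find_gaps` helper are hypothetical conventions for an engagement's own documentation, not part of any Bridge API.

```python
from dataclasses import dataclass
from typing import Optional

# Integration mechanisms named in the checklist.
MECHANISMS = {"api_pull", "event_push", "agent", "connector"}
# Assumed set of parties that can own a credential (hypothetical labels).
OWNERS = {"customer", "kyndryl"}


@dataclass(frozen=True)
class TelemetrySource:
    name: str                        # e.g. "AWS", "vCenter", "Datadog"
    mechanism: Optional[str]         # one of MECHANISMS, or None if undocumented
    credential_owner: Optional[str]  # one of OWNERS, or None if undecided


def find_gaps(sources):
    """Return {source name: [problems]} for sources with missing or invalid fields."""
    gaps = {}
    for s in sources:
        problems = []
        if s.mechanism not in MECHANISMS:
            problems.append("mechanism undocumented")
        if s.credential_owner not in OWNERS:
            problems.append("credential owner undecided")
        if problems:
            gaps[s.name] = problems
    return gaps


# Illustrative inventory; the values are made up for the example.
sources = [
    TelemetrySource("AWS", "api_pull", "customer"),
    TelemetrySource("vCenter", "agent", None),
    TelemetrySource("ServiceNow", None, "customer"),
]
gaps = find_gaps(sources)
for name, problems in gaps.items():
    print(f"{name}: {', '.join(problems)}")
```

Any source that appears in the output is a checklist gap to close before telemetry starts flowing.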

Why This Matters

Kyndryl Bridge is the customer-facing layer on most Kyndryl managed-services engagements, and the architectural decisions that matter are almost never about Bridge itself -- they are about the boundary between Bridge and the customer's own stack. The single most common mistake is treating Bridge as the default single pane of glass without asking what the customer's ITSM, observability, and automation teams are going to do with it. If the customer already has ServiceNow with an established SLA reporting discipline, Bridge as the primary ticketing system creates a parallel incident stream with competing SLA math. If the customer has Datadog as its observability standard, duplicating metric ingestion into Bridge wastes budget and confuses on-call. The decision is not "use Bridge" or "don't"; it is "what does Bridge own, what does the customer own, and where is the integration point."

The knowledge graph is Bridge's key differentiator against raw MSP dashboards -- it assembles an asset-and-dependency map from telemetry across the estate, and the AI recommendations are only as good as that graph is complete. Onboarding validation is where completeness is either confirmed or silently skipped. A graph that covers 60% of the estate produces recommendations that look authoritative but miss dependencies on the uncovered 40%, and the failure mode is invisible to anyone who does not know what the graph should look like. Validating coverage at onboarding -- explicit inventory comparison, dependency spot-checks -- is what turns the graph from a marketing claim into an operational asset.
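
The explicit inventory comparison described above reduces to a set difference between the customer's asset inventory and the assets present in the graph. A minimal sketch, assuming both lists are available as plain asset identifiers; the identifiers are invented for illustration, and how the graph's asset list is exported from Bridge is not specified here.

```python
def graph_coverage(inventory_ids, graph_ids):
    """Compare an authoritative inventory against the assets found in the graph.

    Returns (coverage fraction, assets missing from the graph,
    graph assets absent from the inventory).
    """
    inventory = set(inventory_ids)
    graph = set(graph_ids)
    missing = sorted(inventory - graph)
    unexpected = sorted(graph - inventory)
    coverage = len(inventory & graph) / len(inventory) if inventory else 0.0
    return coverage, missing, unexpected


# Hypothetical asset identifiers for illustration only.
cmdb = ["app-01", "app-02", "db-01", "db-02", "lb-01"]
bridge_graph = ["app-01", "db-01", "lb-01", "cache-09"]

coverage, missing, unexpected = graph_coverage(cmdb, bridge_graph)
print(f"coverage: {coverage:.0%}")
print("missing from graph:", missing)
print("in graph but not in inventory:", unexpected)
```

The `missing` list is the blind spot to chase at onboarding. The `unexpected` list is worth reviewing too, since it may indicate a stale inventory or assets outside the agreed scope. An ID comparison says nothing about dependency edges, so the dependency spot-checks remain a separate manual step.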

SLA reporting source-of-truth is the ambiguity that produces the most post-contract disputes. Bridge measures its own view of MTTR, availability, and incident volume; the customer's ITSM measures its own view; and the two rarely agree exactly, because pause conditions, clock-start rules, and incident-linking definitions differ. Designating one system as authoritative for contractual SLA reporting -- and having the other reconcile to it -- is cheaper than arguing about whose number is right during a quarterly business review (QBR). The ServiceNow hold_reason / JSM pause-condition discussion (see providers/servicenow/itsm-operations.md and providers/atlassian/jsm-operations.md) applies here with a twist: Bridge's pause semantics need to align with whichever system is authoritative, not the other way around.
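
A small worked example shows how pause conditions alone make two systems report different resolution times for the same incident. This is an illustrative sketch, not the calculation either Bridge or any ITSM product actually performs; the timestamps, the helper names, and the "awaiting customer" pause are assumptions.

```python
from datetime import datetime, timedelta


def elapsed_excluding_pauses(start, stop, pauses):
    """Time between start and stop, minus paused intervals.

    Each pause is clipped to the [start, stop] window.
    Assumes the pause intervals do not overlap one another.
    """
    total = stop - start
    for pause_start, pause_end in pauses:
        lo = max(start, pause_start)
        hi = min(stop, pause_end)
        if hi > lo:
            total -= hi - lo
    return total


def at(hhmm):
    """Build a timestamp on an arbitrary example day."""
    return datetime.strptime("2025-01-15 " + hhmm, "%Y-%m-%d %H:%M")


opened, resolved = at("09:00"), at("13:00")
awaiting_customer = (at("10:00"), at("11:30"))

# System A pauses the clock while waiting on the customer; system B does not.
with_pause = elapsed_excluding_pauses(opened, resolved, [awaiting_customer])
without_pause = elapsed_excluding_pauses(opened, resolved, [])

print("system A:", with_pause)
print("system B:", without_pause)
```

The same incident resolves in 2 h 30 min under one definition and 4 h under the other. If the contractual target sat between those two figures, the systems would disagree on whether the SLA was met, which is why the measurement definitions need to be aligned up front.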

Agentic automation scope matters because the default -- ~200M automations/month across ~8,000 certified playbooks at the platform level -- is not what every customer wants running in its environment. Production-change automation in particular should have explicit approval gates, documented rollback authority, and a defined escalation path when the automation and the customer team disagree about the right action. "Certified" does not mean "approved for this customer"; it means "Kyndryl has validated the playbook". The gap between those two is where operational-risk decisions live.
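
The distinction between "certified" and "approved for this customer" can be expressed as an explicit gate. The sketch below is a hypothetical policy, not Bridge's actual execution model: the `Playbook` fields, the playbook names, and the three modes are assumptions chosen to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:
    name: str
    certified: bool           # validated by the provider
    touches_production: bool  # makes changes in production


def execution_mode(playbook, customer_approved):
    """Decide how a playbook may run: 'blocked', 'needs_approval', or 'auto'.

    customer_approved is the set of playbook names the customer has
    explicitly accepted for its environment.
    """
    if not playbook.certified or playbook.name not in customer_approved:
        return "blocked"
    if playbook.touches_production:
        return "needs_approval"
    return "auto"


# Hypothetical playbooks and approval list for illustration.
customer_approved = {"restart-agent", "patch-os"}
playbooks = [
    Playbook("restart-agent", certified=True, touches_production=False),
    Playbook("patch-os", certified=True, touches_production=True),
    Playbook("resize-db", certified=True, touches_production=True),
]
modes = {p.name: execution_mode(p, customer_approved) for p in playbooks}
for name, mode in modes.items():
    print(f"{name}: {mode}")
```

In this sketch certification is necessary but not sufficient: `resize-db` is certified yet blocked because the customer has not approved it, and `patch-os` is approved yet still gated because it changes production.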

Common Decisions (ADR Triggers)

  • Bridge as primary console vs one input to a customer-owned single pane -- Bridge as primary is the simplest path for customers without a pre-existing single-pane strategy, and gets the most value out of the AI recommendations and the knowledge graph. Bridge as input is appropriate when the customer already has ServiceNow + Splunk + Datadog as its operational standards; in that case Bridge feeds those tools rather than replacing them. The choice is organizational, not technical: if the customer's operations team will not adopt Bridge as its daily-driver console, forcing it produces shelfware. Decide this before telemetry flows, not after.
  • SLA reporting source-of-truth: Bridge vs customer ITSM -- Bridge is authoritative when Kyndryl's contract defines the SLA in Bridge's terms (Bridge's pause conditions, Bridge's incident definitions); this is cleaner for the Kyndryl team but requires customer acceptance that Bridge's math wins. Customer ITSM authoritative is cleaner for the customer but requires Bridge to reconcile and produces friction when Bridge recommendations are not traceable through the customer's ticket history. Decide per-contract; document the reconciliation path regardless.
  • Observability scope: full ingestion to Bridge vs Bridge as aggregator of existing tools -- Full ingestion gives Bridge a richer dataset for its AI and cleaner correlation across the estate but duplicates cost with the customer's existing observability spend. Aggregator mode lets Bridge read from Datadog / Dynatrace / Splunk via their APIs without re-ingesting raw data; the correlation is shallower but the cost is lower and the customer keeps its observability investment. Default to aggregator unless the customer is standardizing on Bridge for observability going forward.
  • Automation execution: Bridge playbooks vs customer-owned automation -- Bridge playbooks are the right choice for operations Kyndryl has clear accountability for (patching managed infrastructure, standard change templates, Kyndryl-owned monitoring remediation). Customer-owned automation is the right choice for operations the customer's team is accountable for (application deployments, business-process automation, custom runbooks). The gray area -- incident response where both teams have a role -- needs explicit approval gating rather than one side auto-executing and the other finding out afterward.
  • KIRS adoption for cyber recovery -- Kyndryl Intelligent Recovery Service is the right choice when coordinated cross-system recovery is a regulatory or business requirement (DORA, NIS2, financial-services resilience) and the customer does not already have a mature cyber-recovery orchestration platform. It is overkill when per-system backup tooling (Veeam, Rubrik, Cohesity) plus a customer-owned runbook is sufficient. Evaluate against the actual recovery scenarios the customer needs to demonstrate, not against the general "cyber resilience" marketing frame.

See Also

  • providers/kyndryl/private-cloud.md -- Kyndryl Private Cloud (KPC), the infrastructure layer Bridge observes when Kyndryl also provides infrastructure
  • general/managed-services-scoping.md -- managed-services scope boundaries; Bridge changes the boundary discussion when Kyndryl is the MSP
  • general/itsm-integration.md -- general ITSM integration patterns; applies to Bridge-to-customer-ITSM integration
  • providers/servicenow/itsm-operations.md -- ServiceNow SLA mechanics when ServiceNow is the source-of-truth against which Bridge reconciles
  • providers/atlassian/jsm-operations.md -- JSM SLA mechanics in the same reconciliation scenario
  • general/disaster-recovery.md -- recovery architecture context for evaluating KIRS adoption
  • compliance/dora.md -- DORA operational-resilience requirements that may drive KIRS evaluation
  • compliance/nis2.md -- NIS2 incident-reporting and resilience obligations that interact with Bridge's managed-services reporting