Why “great AI pilots” stall at scale

A product team ships a brilliant churn model in eight weeks. A second team launches a support copilot that agents love. Leadership celebrates—until the questions start piling up: Who owns model monitoring? Why do data definitions differ across business units? Why do two teams buy overlapping tools? Why did Legal only learn about a new vendor after production launch?

This is the moment organisations discover that AI success is not just use-case delivery. It’s a repeatable system for selecting, building, deploying, governing, and improving AI—across many teams and risk profiles. Right now, the pressure is higher than ever: faster model cycles, more third-party AI services, stricter regulatory expectations, and heightened scrutiny around privacy, bias, and security.

That’s what an AI operating model addresses. It’s the practical “how” that turns strategy into execution—without losing control.

The core vocabulary: operating model, patterns, and what “governance” really is

An operating model is the set of structures, processes, decision mechanisms, and ways of working that determine how value gets delivered repeatedly. For AI, this includes how your organisation manages the full lifecycle: ideation, data access, model development, deployment, monitoring, incident response, and retirement. It also includes how AI work integrates with product delivery, IT operations, security, and compliance.

An AI operating model pattern is a recurring organisational design used to run AI—most commonly differing by where AI capabilities sit (centralised vs distributed), how standards are enforced, and how teams coordinate. Patterns matter because the “best” design depends on your context: scale, talent distribution, regulatory risk, technical maturity, and how product teams are organised.

AI governance is often misunderstood as a “review committee.” In practice, governance is a system of guardrails: policies, standards, controls, monitoring, escalation paths, and accountability that keep AI aligned with business goals and within risk tolerance. Good governance enables speed by making decisions predictable and reusable, rather than forcing each team to reinvent risk management from scratch.

A useful analogy: think of AI like a growing city. You need builders (delivery teams), a planning department (standards and architecture), utilities (platform and data), and safety inspectors (risk and compliance). Without city planning, you still build—just not sustainably.

The four operating model patterns you’ll see in real organisations

Centralised “AI Center of Excellence” (CoE): fast coherence, slower throughput

In a centralised pattern, a single AI CoE (or a small set of central teams) owns most of the AI talent: data scientists, ML engineers, and often the platform. Business units submit requests; the CoE prioritises them, then builds and deploys solutions. This pattern usually appears when AI capability is scarce, governance requirements are high, or leadership wants fast standardisation.

The big advantage is consistency. A central team can enforce shared definitions, reusable components, documentation discipline, model risk controls, and a common toolchain. That coherence is especially valuable when you’re introducing MLOps, data access patterns, and security controls for the first time. Centralisation also concentrates hard-to-hire expertise—like model risk specialists or prompt security practitioners—so they can be deployed where needed.

The trade-off is throughput and proximity to the business. A central queue creates delays, and AI teams can become disconnected from day-to-day workflows. Another common failure mode is “solution dumping”: the CoE delivers a model, but the receiving team lacks ownership for adoption, monitoring, and process change. Centralisation can also unintentionally encourage projects that look impressive technically but have weak product fit.

Misconception to avoid: centralised does not automatically mean “governed.” If the CoE is treated like a feature factory without clear risk gates, documentation requirements, and operational accountability, you end up with a powerful team producing opaque systems—and governance becomes performative rather than real.

Federated “hub-and-spoke”: shared standards with distributed delivery

Federated models keep a central hub that defines standards, platforms, and governance mechanisms, while spokes (embedded AI teams in business units or product domains) deliver use cases. The hub typically owns reference architectures, approved tooling, reusable feature stores and evaluation frameworks, and the model risk playbook. The spokes sit close to business processes, ship iteratively, and own adoption.

This pattern often balances speed and control better than pure centralisation. The hub reduces duplication and raises baseline quality, while product-aligned spokes ensure solutions fit real operations and user needs. It also scales talent: rather than a single bottleneck, multiple teams can deliver in parallel. Over time, the spokes become more capable, and the hub moves from “builder” to “enabler.”

However, federated models fail when the hub is either too weak or too authoritarian. If the hub can’t enforce minimum standards—like logging, evaluation, data lineage, or incident response—then “federation” becomes fragmentation. If the hub dictates every technical decision, the spokes become demotivated and slow. The sweet spot is clear, enforced guardrails with flexibility inside them.

Misconception to avoid: federation is not “everyone does their own thing.” Federation succeeds when shared services are genuinely useful—platforms that reduce effort, templates that speed compliance, and governance that is predictable rather than adversarial.

Embedded-in-product “you build it, you run it”: maximum ownership, high risk of inconsistency

In an embedded-in-product model, AI is treated like any other product capability: the product teams own delivery and operations end-to-end, including monitoring, retraining triggers, and user experience. This can be powerful for organisations with strong product engineering maturity and robust platform foundations. It tends to accelerate iteration because the team that feels user pain is the team that can fix it fastest.

The upside is deep integration with workflows. AI features rarely succeed as standalone artifacts; they succeed when embedded into decision points, tools, and incentives. When product teams own the full loop—data quality, model feedback, human escalation paths—they can tune the system continuously. This model also reduces handoffs, which are a major cause of “model in production but not used.”

The downside is predictable: duplication and uneven risk controls. Without strong shared standards and a platform that bakes in compliance and observability, teams will reinvent pipelines, choose incompatible tools, and measure success in inconsistent ways. In regulated environments, the risk is sharper: one team skipping documentation or evaluation can create enterprise-wide exposure.

Misconception to avoid: “ownership” doesn’t mean “no oversight.” Embedded models still need enterprise guardrails—especially around data access, third-party models, security testing, and model change control. The difference is that governance becomes part of the product lifecycle, not a detached review.

Separate “AI product/platform organisation”: treating AI as a reusable internal product

Some organisations create a distinct AI platform or AI product organisation that builds shared AI capabilities as internal products: model serving, monitoring, evaluation tooling, prompt management, data access layers, and compliance automation. Delivery teams then assemble use cases faster using these building blocks. This often emerges when AI adoption accelerates and the cost of duplication becomes obvious.

The key idea is leverage: build once, reuse many times. A well-run AI platform team reduces cognitive load for delivery teams by providing paved roads—secure defaults, standard telemetry, approved model registries, and repeatable deployment patterns. Governance becomes easier because controls can be embedded into the platform: for example, enforced logging, automated evaluation reports, and mandatory model cards before deployment.
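To make “controls embedded into the platform” concrete, here is a minimal Python sketch of a deployment gate that refuses a release until its required artifacts exist. The ModelRelease fields and the specific checks are illustrative assumptions, not any particular platform’s API.

```python
from dataclasses import dataclass

@dataclass
class ModelRelease:
    """Hypothetical record a platform might keep per model version."""
    model_name: str
    version: str
    model_card_url: str | None = None
    evaluation_report_url: str | None = None
    logging_enabled: bool = False

def deployment_gate(release: ModelRelease) -> list[str]:
    """Return unmet requirements; an empty list means the release may proceed."""
    missing = []
    if not release.model_card_url:
        missing.append("model card is required before deployment")
    if not release.evaluation_report_url:
        missing.append("automated evaluation report has not been attached")
    if not release.logging_enabled:
        missing.append("request/response logging must be enabled")
    return missing

# Example: this release is blocked until a model card is published.
release = ModelRelease(
    "churn-scorer", "1.4.0",
    evaluation_report_url="https://registry.example.internal/eval/1.4.0",
    logging_enabled=True,
)
print(deployment_gate(release))  # ['model card is required before deployment']
```

The point of the sketch is that the control runs at deployment time by default, so teams never need to remember a separate governance step.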

The main pitfall is building a “platform in search of users.” If the platform team does not treat internal teams as customers, it can drift into gold-plating or become too rigid. Another risk is organisational siloing: platform teams can become disconnected from real product constraints, leading to tooling that is technically elegant but operationally inconvenient.

Misconception to avoid: “platform-first” isn’t the same as “value-first.” The platform should be driven by concrete adoption needs—reducing time-to-production, reducing incident rates, accelerating compliance, and improving maintainability—rather than building every possible feature up front.

Comparing the patterns so leaders can actually decide

Different patterns optimise for different constraints. Most organisations end up with a hybrid, but clarity on the dominant pattern prevents confusion about ownership and escalation.

Speed to scale delivery
Centralised CoE: Medium. Fast start, then bottlenecked by a queue and scarce specialists; works best when demand is still limited.
Federated hub-and-spoke: High. Parallel delivery across domains once spokes are trained and standards are stable; requires coordination discipline.
Embedded in product: High if teams are mature, low if they need to invent everything. Scales through autonomy, but can create chaos without guardrails.
AI platform org (internal product): Medium–High once adopted. Accelerates delivery via reuse, but needs time to become useful and trusted.

Consistency & standards
Centralised CoE: High. One team can enforce tooling, documentation, and measurement; risk of “one-size-fits-all” standards.
Federated hub-and-spoke: High if enforced. The hub sets non-negotiables while spokes adapt locally; without enforcement, consistency decays quickly.
Embedded in product: Variable. Depends on platform maturity and governance integration; often uneven across teams.
AI platform org (internal product): High if the platform is the paved road. Standards can be built into services and workflows instead of policy documents.

Risk control & compliance
Centralised CoE: High potential. Easier to audit a central team, but risks pile up if it becomes overloaded; governance may become a gate that slows work.
Federated hub-and-spoke: High with good design. Shared model risk controls plus local accountability; works well when risk tiers are clear.
Embedded in product: Medium–High if controls are embedded, low if each team implements controls ad hoc. Requires strong security and compliance enablement.
AI platform org (internal product): High when controls are automated. Logging, access control, evaluation, and change tracking can be standardised and enforced.

Fit to business workflows
Centralised CoE: Medium. The central team may lack deep context and adoption can lag; needs strong product partnership.
Federated hub-and-spoke: High. Spokes are close to operations and users; still benefit from hub expertise for tricky edge cases.
Embedded in product: Very high. Teams own end-to-end outcomes and iterate in the workflow.
AI platform org (internal product): Medium–High. The platform enables others; workflow fit depends on delivery teams using the platform flexibly.

Typical failure mode
Centralised CoE: Becomes a ticket queue; delivers models without adoption ownership; “AI theatre” dashboards without impact.
Federated hub-and-spoke: The hub is too weak (fragmentation) or too controlling (slow, demotivating); confusing lines between hub and spokes.
Embedded in product: Tool sprawl, duplicated pipelines, inconsistent evaluation; incidents handled differently team by team.
AI platform org (internal product): Over-engineered platform; low adoption; a “platform says no” culture rather than enablement.

After you choose a pattern, you still need the mechanisms that make it real: lifecycle processes, governance controls, and metrics that steer behaviour.

How governance and execution fit together (without turning into bureaucracy)

Treat AI governance as a set of risk-based controls that scale with impact. Not every use case needs the same maturity: a marketing copy generator and a credit decision model cannot share a single approval path. The most resilient operating models define tiers—lightweight checks for low-risk use cases, deeper validation for high-risk ones—and make those checks repeatable.

A practical way to think about it is through lifecycle “moments” that deserve explicit ownership and evidence. These moments typically include: approving data access, validating training data quality, evaluating performance and safety, approving deployment, monitoring drift, and managing incidents. Governance becomes operational when it is tied to artifacts like evaluation reports, data lineage, model cards, and change logs—rather than verbal assurances.
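One way to make those moments repeatable is to write the tier-to-evidence mapping down as data rather than debate it case by case. The sketch below is illustrative only: the tier names, gate names, and artifact lists are assumptions, and a real mapping would come from your own risk taxonomy.

```python
# Illustrative only: tier names, gates, and evidence items are assumptions,
# not a standard taxonomy.
REQUIRED_EVIDENCE = {
    "low": {
        "deployment_approval": ["evaluation summary", "owner sign-off"],
        "monitoring": ["basic usage and error metrics"],
    },
    "high": {
        "data_access_approval": ["data lineage record", "privacy assessment"],
        "deployment_approval": ["evaluation report", "model card", "change log entry"],
        "monitoring": ["drift indicators", "incident escalation path"],
    },
}

def evidence_for(tier: str, gate: str) -> list[str]:
    """Look up the artifacts a team must produce at a given lifecycle gate."""
    return REQUIRED_EVIDENCE.get(tier, {}).get(gate, [])

print(evidence_for("high", "deployment_approval"))
# ['evaluation report', 'model card', 'change log entry']
```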

Common pitfalls show up when governance is bolted on late. If teams must rewrite documentation at the end, they either delay release or generate low-quality paperwork. The better approach is “governance by design”: templates, automated evidence capture, and platform defaults that produce required artifacts as a byproduct of normal engineering.
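As a sketch of what “automated evidence capture” can mean, the snippet below assumes a hypothetical helper that teams already call when evaluating a model; it simply persists the results in a form governance can reuse later. The file layout and metric names are assumptions for illustration.

```python
# Sketch of evidence capture as a byproduct of normal engineering: the helper a
# team already calls after evaluation also writes the artifact governance needs.
import datetime
import json
from pathlib import Path

def evaluate_and_record(model_version: str, metrics: dict, out_dir: str = "reports") -> Path:
    """Persist evaluation results with enough metadata (what, when) to double as evidence."""
    report = {
        "model_version": model_version,
        "generated_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "metrics": metrics,
    }
    path = Path(out_dir) / f"evaluation_{model_version}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2))
    return path

# The same call a developer makes during routine testing.
evaluate_and_record("1.4.0", {"accuracy": 0.91, "groundedness": 0.87})
```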

A final misconception to address: governance is not the enemy of innovation. Poorly designed governance is. Well-designed governance reduces uncertainty, accelerates delivery through reusable guardrails, and prevents the kind of incidents that freeze investment and damage trust.

Two applied examples: what pattern choice looks like in practice

Example 1: Retail bank launching GenAI for customer support (high regulatory and reputational risk)

A retail bank wants a GenAI assistant for contact-center agents: summarising calls, drafting responses, and recommending next-best actions. The bank has strong compliance requirements, strict data handling rules, and low tolerance for hallucinated financial advice. A purely embedded approach would let each product team move fast, but the risk of inconsistent controls is significant.

A common operating model fit is federated hub-and-spoke with strong central governance enablement. Step-by-step, it looks like this. First, the hub defines non-negotiables: approved model providers, data access patterns, redaction standards, logging requirements, and a standard evaluation suite (accuracy, groundedness, toxicity, and privacy leakage checks). Second, spokes in the contact-center domain build the agent experience and workflow integration, because they understand how agents actually work and what “good output” means in context.

Third, governance is implemented as lifecycle gates tied to evidence. Before production, the team produces a documented evaluation report using the hub’s framework, including known failure modes and human fallback behaviours. Monitoring is then standardised: drift indicators, flagged-response review queues, and incident escalation to a defined owner. The benefit is speed with control: spokes ship improvements weekly while risk controls remain consistent.
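To make the evaluation gate tangible, here is a small sketch using the four check categories from the hub’s suite. The threshold values and metric names are assumptions for illustration, not recommended targets.

```python
# Illustrative thresholds for the hub's standard evaluation gate. The categories
# mirror the suite described above; the numbers and metric names are assumptions.
EVALUATION_THRESHOLDS = {
    "accuracy_min": 0.85,        # factual correctness on a curated Q&A set
    "groundedness_min": 0.90,    # answers supported by retrieved policy documents
    "toxicity_rate_max": 0.01,   # share of responses flagged by the toxicity check
    "privacy_leakage_max": 0.0,  # no customer identifiers in sampled outputs
}

def passes_gate(scores: dict) -> bool:
    """Compare measured scores with the hub's thresholds; direction depends on the metric."""
    return (
        scores["accuracy"] >= EVALUATION_THRESHOLDS["accuracy_min"]
        and scores["groundedness"] >= EVALUATION_THRESHOLDS["groundedness_min"]
        and scores["toxicity_rate"] <= EVALUATION_THRESHOLDS["toxicity_rate_max"]
        and scores["privacy_leakage_rate"] <= EVALUATION_THRESHOLDS["privacy_leakage_max"]
    )

print(passes_gate({"accuracy": 0.88, "groundedness": 0.93,
                   "toxicity_rate": 0.004, "privacy_leakage_rate": 0.0}))  # True
```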

Limitations remain. This pattern can still slow down if the hub becomes a bottleneck for approvals or if spokes treat governance as “someone else’s problem.” The bank mitigates this by making the hub’s services self-serve, automating evidence capture, and holding spokes accountable for adoption and operational outcomes.

Example 2: Manufacturer scaling predictive maintenance across plants (distributed operations, mixed maturity)

A manufacturer wants predictive maintenance models across 40 plants. The use case is similar everywhere—forecasting failures and scheduling maintenance—but data quality varies widely by site, and local engineering teams have different levels of analytics capability. A central team building every plant model would be overwhelmed, but fully autonomous local teams would likely create incompatible pipelines and inconsistent KPIs.

A strong fit here is an AI platform organisation paired with a federated delivery model. Step-by-step: the platform team builds a standard data ingestion and feature framework, with clear data contracts for sensor streams and maintenance logs. They provide model templates, model serving, and monitoring so local teams don’t reinvent the basics. This is the paved road: if a plant uses the platform, it gets standardised telemetry, versioning, and incident workflows by default.
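A data contract here can be as small as an agreed record shape that every plant’s ingestion must satisfy before readings enter the shared feature framework. The sketch below is a rough illustration; the field names, units, and sensor vocabulary are assumptions.

```python
# A minimal data-contract sketch for one sensor stream. Field names, units, and
# the agreed sensor vocabulary are illustrative assumptions, not a real schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class SensorReading:
    plant_id: str       # e.g. "plant-17"
    machine_id: str     # stable asset identifier from the maintenance system
    sensor: str         # from an agreed vocabulary, e.g. "vibration_rms"
    timestamp_utc: str  # ISO 8601, always UTC
    value: float        # numeric reading in the unit agreed for this sensor
    unit: str           # e.g. "mm/s" for vibration_rms

def satisfies_contract(record: dict) -> bool:
    """Shallow check: the record has exactly the agreed fields and non-empty identifiers."""
    try:
        reading = SensorReading(**record)
    except TypeError:  # missing or unexpected fields
        return False
    return bool(reading.plant_id and reading.machine_id and reading.sensor)

print(satisfies_contract({
    "plant_id": "plant-17", "machine_id": "press-03", "sensor": "vibration_rms",
    "timestamp_utc": "2025-05-01T06:00:00Z", "value": 2.4, "unit": "mm/s",
}))  # True
```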

Then, a small set of regional “spoke” teams adapts models to local conditions—different machines, different failure patterns—while using the same evaluation and deployment backbone. Governance becomes lighter than the bank example but still structured: plants must meet minimum data quality thresholds before deployment, and monitoring metrics are consistent so leadership can compare performance across sites.
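Those minimum thresholds can likewise live as a small shared check rather than a policy paragraph, which is also what makes results comparable across sites. The values below are illustrative assumptions a platform team would tune per fleet.

```python
# Sketch of a plant-level readiness gate. The thresholds are illustrative
# assumptions, not recommended values.
QUALITY_THRESHOLDS = {
    "sensor_coverage_min": 0.95,  # share of machines with usable sensor history
    "missing_rate_max": 0.05,     # fraction of missing readings in the training window
    "labelled_failures_min": 30,  # recorded failure events needed for evaluation
}

def plant_ready_for_deployment(stats: dict) -> bool:
    """A plant must clear every threshold before its model is promoted."""
    return (
        stats["sensor_coverage"] >= QUALITY_THRESHOLDS["sensor_coverage_min"]
        and stats["missing_rate"] <= QUALITY_THRESHOLDS["missing_rate_max"]
        and stats["labelled_failures"] >= QUALITY_THRESHOLDS["labelled_failures_min"]
    )

print(plant_ready_for_deployment(
    {"sensor_coverage": 0.97, "missing_rate": 0.03, "labelled_failures": 42}))  # True
```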

The impact is leverage and comparability. The organisation reduces duplicated engineering, improves time-to-deployment, and can roll up risk signals across plants. The limitation is adoption: if the platform is hard to use or doesn’t fit local constraints (connectivity, legacy systems), teams will bypass it. Successful implementations treat platform usability and local onboarding as first-class product work, not an afterthought.

Closing: choosing a pattern is choosing trade-offs on purpose

AI operating model patterns are not maturity badges; they’re design choices shaped by your constraints. Centralised models buy coherence early, federated models scale delivery with shared guardrails, embedded models maximise ownership and workflow fit, and platform-led models create reuse and enforceable standards through “paved roads.”

The practical test is simple: can your organisation deliver more AI use cases next quarter without increasing risk, duplication, or operational fragility? The right operating model makes that possible by setting clear structures and repeatable mechanisms—not just aspirations.

Next, we'll build on this by exploring Roles & Decision Rights [35 minutes].
