What is Apple's System Orchestrator in iOS 27? On-device vs cloud AI routing

Apple's new System Orchestrator decides whether each Siri request runs on-device, on Private Cloud Compute, or on Google's cloud. How iOS 27 AI routing works.

By Manu Shukla

Founder & Director, 25 June 2026

On-device, private cloud, or Google's cloud: the orchestrator decides per request.

On this page · 11 sections

The four ingredients the orchestrator coordinates
How on-device versus cloud routing works
The model family underneath: AFM 3
Why route instead of using one model
Private Cloud Compute and the privacy model
The Google and Gemini relationship, accurately
What it means for developers
What it means for enterprises and India
FAQ
How eCorpIT can help
References

Summary. At WWDC 2026 on 8 June 2026, Apple introduced the System Orchestrator, the part of iOS 27 that decides, for every Siri AI request, whether the work runs on your device, on Apple's Private Cloud Compute, or on Google Cloud hardware. It coordinates Apple Intelligence across four ingredients: personal context, world knowledge, app actions, and on-screen awareness. Underneath it sits a new family of five Apple Foundation Models. They range from two on-device models, a 3-billion-parameter model and a 20-billion-parameter model that activates 1 to 4 billion parameters per request, up to a server model that Apple runs on NVIDIA GPUs in Google Cloud for the heaviest reasoning. The routing has three tiers, and an ACM conference paper in June 2026 independently confirmed Apple's three core Private Cloud Compute privacy claims. Apple reportedly pays Google around $1 billion a year under a model deal announced on 12 January 2026, yet Apple says its models are its own: trained by Apple, refined using Google's Gemini, and containing no Gemini code. For developers and enterprise architects weighing Apple Intelligence privacy, the orchestrator is the piece that matters, because it is what decides where your data goes. This guide explains what it is and how the routing works.

The short definition: the System Orchestrator is a router and coordinator. When you ask Siri something, it works out what the request needs, pulls the right context, picks the right model, and calls the right app tool, then assembles the answer. Apple gave a concrete example at WWDC. Ask "what drinks would pair well with those two dishes," and the orchestrator already holds the context of which two dishes you meant from earlier in the conversation, but determines it needs world knowledge to answer, so it routes that part to the cloud. The point is that no single model does everything. The orchestrator is the traffic controller that makes a set of models and app tools behave like one assistant.

This matters most for the two audiences deciding whether to trust it: iOS developers building on the new frameworks, and enterprise architects and CTOs evaluating Apple Intelligence privacy before they allow it on managed devices. For both groups, the orchestrator's routing logic is the privacy boundary.

The four ingredients the orchestrator coordinates

Apple describes the orchestrator as coordinating Apple Intelligence across four ingredients. Each one is a different source of capability, and each has its own place where the work happens.

Ingredient	What it does	Where it mostly runs
Personal context	Finds relevant photos, notes, messages, files, and app data via the Spotlight semantic index	On device
World knowledge	Goes to the web for current information	Private Cloud Compute
App actions	Uses tools that apps expose through the App Toolbox and App Intents	On device, then the app
On-screen awareness	Understands what is on screen and acts in the moment	On device

The first, third, and fourth ingredients lean on technologies that work entirely on device: the Spotlight semantic index and the App Toolbox. That is deliberate. Personal context, the most sensitive ingredient, is assembled locally, so the assistant can reason about your photos and messages without those leaving the phone. Only when a request needs broad world knowledge or heavier reasoning does the orchestrator reach for the cloud.

How on-device versus cloud routing works

The orchestrator routes by how much capability a request needs, across three tiers. Most requests never leave the device. Harder ones step up to Apple's servers. Only the heaviest reach Google's hardware.

Tier	Handles	Where it runs
On device	Everyday tasks, personal context, on-screen actions	Apple Foundation Models on the iPhone
Apple Private Cloud Compute	Moderately complex requests, world knowledge	Apple's own models on Apple silicon servers
Google Cloud tier	The heaviest reasoning and agentic tool use	Apple's Cloud Pro model on NVIDIA GPUs in Google Cloud

The escalation is driven by the capacity of the on-device model. When a request exceeds what the roughly 3-billion-parameter on-device model can handle, Apple's privacy architecture sends it first to Private Cloud Compute on Apple silicon, and only the hardest reasoning goes on to the top tier. As Apple chief executive Tim Cook framed the design, "We'll continue to run on the device and run in Private Cloud Compute, and maintain our industry-leading privacy standard." The practical takeaway is that the default is local, and each step away from the device is a deliberate escalation, not the norm.
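The three-tier escalation can be pictured as a small decision function. The sketch below is an illustration only, not Apple's routing logic: the `Request` fields `needs_world_knowledge` and `needs_heavy_reasoning` are hypothetical flags invented here to stand in for whatever signals the orchestrator actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on device"
    PRIVATE_CLOUD = "Apple Private Cloud Compute"
    GOOGLE_CLOUD = "Private Cloud Compute on Google Cloud hardware"


@dataclass(frozen=True)
class Request:
    text: str
    needs_world_knowledge: bool = False  # hypothetical signal
    needs_heavy_reasoning: bool = False  # hypothetical signal


def route(request: Request) -> Tier:
    """Return the lowest tier that can serve the request; local is the default."""
    if request.needs_heavy_reasoning:
        return Tier.GOOGLE_CLOUD
    if request.needs_world_knowledge:
        return Tier.PRIVATE_CLOUD
    return Tier.ON_DEVICE


timer = route(Request("set a timer for ten minutes"))
pairing = route(Request("what drinks pair with those two dishes",
                        needs_world_knowledge=True))
```

Here `timer` resolves to the on-device tier and `pairing` to Private Cloud Compute. The design point is the ordering of the checks: the function has to find a reason to leave the device, otherwise it stays local.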

The model family underneath: AFM 3

On 8 June 2026, Apple published the third generation of Apple Foundation Models, AFM 3, a family of five models that the orchestrator chooses between. Knowing the family helps you understand what runs where.

Model	Size and type	Where it runs
AFM 3 Core	3-billion-parameter dense, on-device	iPhone and other Apple devices
AFM 3 Core Advanced	20-billion-parameter sparse, activates 1 to 4 billion at a time	On device, most powerful local model
AFM 3 Cloud	Server workhorse for speed and efficiency	Apple silicon, Private Cloud Compute
ADM 3 Cloud (Image)	Image generation and editing	Apple silicon, Private Cloud Compute
AFM 3 Cloud Pro	Most capable, complex reasoning and agentic tool use	NVIDIA GPUs in Google Cloud, under Private Cloud Compute

Four of the five models, including the multimodal AFM 3 Core Advanced and the server workhorse AFM 3 Cloud, are purpose-built for Apple silicon and run either on the device or inside Apple's own data centres. Only AFM 3 Cloud Pro, the top of the stack, runs elsewhere: Apple worked with Google and NVIDIA to extend Private Cloud Compute onto NVIDIA GPUs in Google Cloud, while keeping the same privacy guarantees. So even the Google Cloud tier is, in Apple's framing, Private Cloud Compute on borrowed hardware, not a handoff to Google's own service.
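One way to keep the family straight is to hold the table as data and query it. This is a minimal sketch: the model names and placements come from the table above, while the `Model` class, the `hardware` labels, and the helper functions are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    role: str
    hardware: str    # "apple_silicon" or "nvidia_gpu" (labels invented for this sketch)
    on_device: bool


AFM3_FAMILY = (
    Model("AFM 3 Core", "3-billion-parameter dense", "apple_silicon", True),
    Model("AFM 3 Core Advanced", "20-billion-parameter sparse", "apple_silicon", True),
    Model("AFM 3 Cloud", "server workhorse", "apple_silicon", False),
    Model("ADM 3 Cloud (Image)", "image generation and editing", "apple_silicon", False),
    Model("AFM 3 Cloud Pro", "complex reasoning and agentic tool use", "nvidia_gpu", False),
)


def on_device_models(family):
    """Names of the models that run locally."""
    return [m.name for m in family if m.on_device]


def off_apple_silicon(family):
    """Names of the models that run on hardware other than Apple silicon."""
    return [m.name for m in family if m.hardware != "apple_silicon"]
```

Calling `off_apple_silicon(AFM3_FAMILY)` returns a single entry, AFM 3 Cloud Pro, which is the "four of five" point in data form.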

Why route instead of using one model

A reasonable question is why Apple builds a router and five models rather than one large model that does everything. Three pressures explain it, and the orchestrator design optimises for all three. The first is latency. A 3-billion-parameter on-device model can answer in the time the cloud takes just to acknowledge a request, so keeping common tasks local makes the assistant feel instant. The second is privacy. Every request that stays on the device is one that never travels, which is why Apple assembles the most sensitive ingredient, personal context, locally through the Spotlight semantic index. The third is cost and reliability: on-device inference runs offline at no per-request cost, so features keep working on a plane or a weak network, and Apple does not pay for cloud compute to do work a phone can handle.

The orchestrator turns these pressures into one decision per request: use the smallest, closest model that can do the job, and escalate only when the task genuinely needs more. The larger models apply the same logic internally. AFM 3 Core Advanced holds 20 billion parameters but activates only 1 to 4 billion for a given request, spending compute in proportion to difficulty rather than running at full size every time. The result is a system that behaves like one assistant while quietly matching each request to the cheapest capable resource, on the device wherever possible.
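The sparse-activation figures are easier to judge as a share of the whole model. The short calculation below uses only the numbers quoted above (20 billion total, 1 to 4 billion active); the function name is our own.

```python
TOTAL_PARAMS_BILLION = 20          # AFM 3 Core Advanced, as stated above
ACTIVE_RANGE_BILLION = (1, 4)      # parameters activated per request


def active_fraction(active_billion, total_billion=TOTAL_PARAMS_BILLION):
    """Share of the model's parameters used for one request."""
    return active_billion / total_billion


low, high = (active_fraction(a) for a in ACTIVE_RANGE_BILLION)
summary = f"{low:.0%} to {high:.0%} of parameters active per request"
print(summary)  # 5% to 20% of parameters active per request
```

So even the largest local model touches at most a fifth of its weights on any one request, which is what "spending compute in proportion to difficulty" means in practice.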

Private Cloud Compute and the privacy model

Private Cloud Compute is the reason Apple can send some requests to servers without breaking its privacy promise. It extends the security model of an Apple device into the cloud. The compute is stateless and ephemeral: no user data is retained after a query resolves. Personally identifiable information is stripped before content reaches the inference layer, and queries are anonymized and tokenized so that neither Apple staff nor Google can link a request back to an individual user. When Private Cloud Compute handles a request, Apple states that the personal data is neither stored nor made accessible to Apple or anyone else.
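To make "stateless, stripped, and tokenized" concrete, here is a deliberately simple sketch of the idea. It is not how Apple implements Private Cloud Compute: the regular expressions, the function names, and the reading of "tokenized" as a random per-request identifier are all assumptions made for illustration.

```python
import re
import secrets

# Crude patterns for two kinds of identifier; real PII detection is far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def strip_identifiers(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)


def handle_statelessly(query: str, infer) -> str:
    """Run one query through `infer` without keeping anything afterwards."""
    payload = {
        "token": secrets.token_hex(16),  # random per request, not derived from the user
        "content": strip_identifiers(query),
    }
    return infer(payload)  # no module-level state is written, so nothing persists


cleaned = strip_identifiers("mail me at a.b@example.com or +91 98765 43210")
```

In this sketch `cleaned` becomes "mail me at [email] or [phone]". Because each call draws a fresh random token, two requests from the same person carry nothing that links them, which is the property the paragraph above describes.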

Two facts make this more than a marketing claim. First, an ACM conference paper presented in June 2026 independently confirmed Apple's three core Private Cloud Compute privacy claims. Second, Apple's contract with Google bars Google from using Apple users' Siri queries to train future Gemini models. For an enterprise evaluating the system, those two points, external verification and a contractual training ban, are the load-bearing ones.

The Google and Gemini relationship, accurately

The reporting around WWDC 2026 ranged from "Siri now runs on Gemini" to "there is no Gemini inside Apple's models," so it is worth stating the relationship precisely. A joint Apple and Google statement on 12 January 2026 described the AFM family as built in collaboration with Google and based on its Gemini technology and cloud infrastructure. Apple's own position, set out in its machine learning research, is narrower: Apple's models are trained by Apple and refined using outputs from Gemini frontier models, but they do not use Gemini's infrastructure for inference, with the single exception of the top tier running on Google Cloud hardware under Private Cloud Compute.

In plain terms, Apple licensed Google's help to build better models, reportedly for around $1 billion a year, and uses Google Cloud's NVIDIA hardware for the heaviest tier, but Apple runs the inference, controls the privacy boundary, and says its models contain no Gemini code. Both the "powered by Gemini" headlines and the "no Gemini at all" rebuttals are half right. The accurate version is that Gemini shaped the models and Google supplies hardware for one tier, while Apple keeps control of where data is processed.

What it means for developers

For developers, the orchestrator is reachable through two frameworks. The Foundation Models framework is a native Swift API that gives your app direct access to the same on-device model that powers Apple Intelligence, and those on-device features work offline at no cost per request. In iOS 27 the framework also lets you call other language models that conform to a common Language Model protocol, including cloud models such as Claude and Gemini, so you are not locked to Apple's models alone.
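The value of a common protocol is that calling code does not change when the model behind it does. The real framework is a Swift API, so the Python below is only an analogy of the pattern: `LanguageModel`, `LocalStubModel`, `RemoteStubModel`, and `summarise` are hypothetical names, not Apple's types.

```python
from typing import Callable, Protocol


class LanguageModel(Protocol):
    """Anything with a name and a respond() method counts as a model."""
    name: str

    def respond(self, prompt: str) -> str: ...


class LocalStubModel:
    """Stands in for an on-device model: answers with no network call."""
    name = "local-stub"

    def respond(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class RemoteStubModel:
    """Stands in for a cloud model: delegates to an injected transport."""
    name = "remote-stub"

    def __init__(self, send: Callable[[str], str]):
        self._send = send

    def respond(self, prompt: str) -> str:
        return self._send(prompt)


def summarise(model: LanguageModel, text: str) -> str:
    """Feature code depends on the protocol, never on a concrete model."""
    return model.respond(f"Summarise: {text}")


local_answer = summarise(LocalStubModel(), "meeting notes")
remote_answer = summarise(RemoteStubModel(lambda p: p.upper()), "meeting notes")
```

The same `summarise` call works against either stub. In an app, that is what lets a team start with the free, offline on-device model and swap in a cloud model only for the features that need one.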

The App Intents framework is how your app joins the orchestrator's world. By contributing entity schemas, you add your app's content to the Spotlight semantic index so the orchestrator can use it as personal context. By exposing intent schemas, you let the system invoke your app's actions through natural language. The pattern to internalise is that the orchestrator does not call your app directly; it draws on the context and actions you publish to the system. Build for those schemas and your app becomes something Siri AI can reason about and act on.

A concrete example makes the split clear. A recipe app that contributes its saved recipes as entity schemas lets the orchestrator surface "the two dishes I saved last night" as personal context, entirely on device. The same app exposing a "start cooking mode" action as an intent schema lets the user trigger it by voice through Siri AI, with the orchestrator matching the spoken request to the published intent. Neither requires the app to ship its own model or call the cloud; the app supplies content and actions, and the orchestrator supplies the intelligence. That division is the main design lesson of iOS 27 for developers. For the strategic view of building on platform AI, see our note on generative AI enterprise strategy for 2026.
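The recipe-app split can be modelled in a few lines. This is a toy stand-in, not the App Intents API: `SystemIndex` and its methods are invented names, the dish titles are made up, and the word-subset match replaces whatever language understanding the system really applies.

```python
class SystemIndex:
    """Toy stand-in for the system-side store of published entities and intents."""

    def __init__(self):
        self._entities = []   # (app, kind, title) tuples
        self._intents = {}    # trigger phrase -> handler

    def contribute_entity(self, app, kind, title):
        """An app publishes content that can serve as personal context."""
        self._entities.append((app, kind, title))

    def expose_intent(self, phrase, handler):
        """An app publishes an action the system may invoke."""
        self._intents[phrase.lower()] = handler

    def personal_context(self, kind):
        """Titles of all published entities of one kind."""
        return [title for _, k, title in self._entities if k == kind]

    def invoke(self, utterance):
        """Run the first intent whose phrase words all appear in the utterance."""
        spoken = set(utterance.lower().split())
        for phrase, handler in self._intents.items():
            if set(phrase.split()) <= spoken:
                return handler()
        return None


index = SystemIndex()
index.contribute_entity("RecipeApp", "recipe", "Paneer tikka")
index.contribute_entity("RecipeApp", "recipe", "Dal makhani")
index.expose_intent("start cooking mode", lambda: "cooking mode started")

dishes = index.personal_context("recipe")
outcome = index.invoke("Please start cooking mode")
```

The app never calls a model here. It only publishes two recipes and one action, and the index, playing the orchestrator's role, does the lookup and the dispatch.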

What it means for enterprises and India

For an enterprise architect or CTO, the evaluation is more favourable than the "Siri runs on Google" headlines suggest, but it is not automatic. The default behaviour keeps most work on the device or on Apple's Private Cloud Compute, with no data retention and external verification of the privacy claims. The open questions are which categories of request escalate to the Google Cloud tier, what your mobile-device-management policy permits, and whether the on-device-first default is enough for your risk profile.

In the Indian context, this maps onto the Digital Personal Data Protection Act 2023 and the DPDP Rules 2025. An on-device-first design with stateless cloud compute aligns well with data-minimisation and purpose-limitation duties, because the most sensitive processing, personal context, happens locally. The diligence task for an Indian enterprise is to document the data flows for the cloud tiers, confirm that no personal data is retained, and decide, per use case, whether requests that could reach external hardware are acceptable for the data involved. The privacy story is strong by design, but a regulated enterprise still has to verify it against its own obligations rather than take the default on trust. A practical first step is to enumerate the Apple Intelligence features your users will actually use, mark which tier each one reaches, and gate the cloud tiers through mobile-device-management policy where the data warrants it.
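That first step can start as a simple inventory before it becomes policy. The sketch below assumes a hypothetical mapping of features to tiers and of data classes to the highest tier allowed. None of these names are real MDM keys, and which features actually reach which tier is exactly what the diligence has to establish.

```python
from enum import IntEnum


class Tier(IntEnum):
    ON_DEVICE = 0
    PRIVATE_CLOUD_COMPUTE = 1
    GOOGLE_CLOUD_HARDWARE = 2


# Hypothetical inventory: replace with the features and tiers you verify yourself.
FEATURE_TIER = {
    "notification summaries": Tier.ON_DEVICE,
    "web-backed answers": Tier.PRIVATE_CLOUD_COMPUTE,
    "multi-step agent tasks": Tier.GOOGLE_CLOUD_HARDWARE,
}

# Hypothetical policy: the highest tier each data class may reach.
MAX_TIER_FOR_DATA = {
    "public": Tier.GOOGLE_CLOUD_HARDWARE,
    "internal": Tier.PRIVATE_CLOUD_COMPUTE,
    "regulated": Tier.ON_DEVICE,
}


def is_permitted(feature: str, data_class: str) -> bool:
    """True when the feature's tier does not exceed the data class's ceiling."""
    return FEATURE_TIER[feature] <= MAX_TIER_FOR_DATA[data_class]


def features_to_gate(data_class: str) -> list[str]:
    """Features that policy should block for devices handling this data class."""
    return sorted(f for f in FEATURE_TIER if not is_permitted(f, data_class))


gated_for_regulated = features_to_gate("regulated")
```

With these example values, devices handling regulated data would have both cloud-reaching features gated, while devices handling only public data would have none. The output is a list a security team can review and translate into actual device-management restrictions.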

How eCorpIT can help

eCorpIT is a CMMI Level 5, senior-led technology organisation in Gurugram that builds iOS apps and advises enterprises on mobile AI. We integrate the Foundation Models and App Intents frameworks so your app contributes context and actions to the System Orchestrator, and we help architecture and security teams map Apple Intelligence data flows against DPDP and internal privacy requirements before rollout. If you are building on iOS 27 or evaluating Apple Intelligence for managed devices, talk to our team or read more about how we work.

References

Apple Newsroom, Apple introduces Siri AI, a profoundly more capable and personal assistant

Apple Newsroom, Apple Intelligence brings powerful AI capabilities into everyday experiences

Apple Machine Learning Research, introducing the third generation of Apple's Foundation Models

Apple Security, Private Cloud Compute: a new frontier for AI privacy in the cloud

Apple Developer, WWDC26 Apple Intelligence guide

9to5Mac, Apple's third-generation Foundation Models explained

9to5Mac, Apple confirms Gemini-powered Siri will use Private Cloud Compute

MacRumors, Apple reveals new AI architecture built around Google Gemini models

AppleInsider, Apple's new foundation models don't contain a drop of Gemini

Business Standard, Apple says Gemini-powered Siri will run on its private cloud: what it means

SiliconANGLE, Apple debuts Siri AI as a more personal assistant built on Gemini

Apple Developer, App Intents documentation

_Last updated: 25 June 2026._

Frequently asked

Quick answers.

01 What is Apple's System Orchestrator?

The System Orchestrator is the iOS 27 component that coordinates Apple Intelligence and Siri AI across four ingredients: personal context, world knowledge, app actions, and on-screen awareness. For each request, it decides what needs to happen and routes the work to the right model, on-device or in the cloud, and the right app tool.

02 How does Apple decide whether AI runs on-device or in the cloud?

It routes by complexity. Simple tasks stay on the device using Apple's own on-device models. Moderately complex requests go to Apple's Private Cloud Compute servers on Apple silicon. The heaviest reasoning routes to Apple's most capable cloud model running on NVIDIA GPUs in Google Cloud, still under Apple's privacy guarantees.

03 Does Siri now run on Google Gemini?

Not exactly. Apple's foundation models are trained by Apple and refined using outputs from Google's Gemini, but they are Apple's own models. Most requests use them on-device or on Apple silicon. Only the heaviest tier, Apple's Cloud Pro model, runs on Google Cloud hardware, and Apple says it contains no Gemini code.

04 What is Private Cloud Compute and is it private?

Private Cloud Compute is Apple's system for running AI on servers with device-level privacy. It uses stateless, ephemeral compute, so no user data is stored after a query resolves, and personal identifiers are stripped before content reaches the model. An ACM paper in June 2026 independently confirmed Apple's three core privacy claims.

05 Can Google see my Siri requests?

Apple says no. Queries are anonymized and tokenized so neither Apple staff nor Google can link a request to an individual user, and Apple's contract bars Google from using Siri queries to train future Gemini models. Even on the Google Cloud tier, processing stays inside Apple's Private Cloud Compute guarantees.

06 What can developers do with the Foundation Models framework?

The Foundation Models framework is a Swift API that gives apps direct access to the on-device model behind Apple Intelligence, working offline at no cost per request. In iOS 27 it also lets you call other language models, including cloud models like Claude and Gemini, through one Language Model protocol.

07 What does this mean for enterprises evaluating privacy?

For enterprises, the default is favourable: most Apple Intelligence work stays on-device or on Apple's Private Cloud Compute, with no data retention. The questions to ask are which requests reach the Google Cloud tier, what your mobile-device-management policy allows, and how this maps to your obligations under laws like India's DPDP Act.

08 Which iOS 27 AI features run offline?

Anything the on-device models handle. The 3-billion-parameter AFM 3 Core and the 20-billion-parameter AFM 3 Core Advanced run locally, so features built on them, including many writing, summarisation, and Siri tasks, work without a network connection. Requests that need world knowledge or the heaviest reasoning still go to the cloud.

About the author

Manu Shukla

Founder & Director

Founder of eCorpIT. Hands-on engineer leading senior-only delivery for AI apps, custom software, and cloud systems for global clients.
