Summary. At WWDC on 9 June 2026, Apple announced its third generation of Apple Foundation Models, a family of five, and confirmed that its most capable cloud model, AFM 3 Cloud Pro, now runs on Nvidia Blackwell GPUs hosted inside Google Cloud while keeping Private Cloud Compute privacy guarantees. The model is custom-built for Apple silicon, trained on Apple data, and refined using outputs from Google's Gemini frontier models, a distillation relationship rather than wholesale adoption. Apple's software chief Craig Federighi was blunt about how much Gemini ships in the product: "The amount of the Google Assistant we use is none." Separately, Bloomberg reported Apple pays Google around $1 billion a year for a custom 1.2-trillion-parameter Gemini model to help the new Siri. For enterprise architects, the interesting part is not the gossip; it is the architecture. Apple built a privacy-preserving, multi-cloud, multi-model stack that runs its own model on a rival's cloud and a third party's chips. That is a template worth studying. This is what AFM Cloud Pro versus Gemini frontier means for your stack.
Apple spent a decade telling the world it would not send your data to someone else's servers. In June 2026 it extended Private Cloud Compute to Google Cloud and Nvidia hardware for the first time, and still claimed the same privacy bar. How it pulled that off, and the model choices behind it, is the most instructive enterprise AI case study of the year.
What Apple actually announced
Two things, on two dates, that are easy to conflate. In January 2026, CNBC reported Apple would license a custom 1.2-trillion-parameter Gemini model to help power a revamped Siri, with Bloomberg's Mark Gurman putting the price near $1 billion a year. Then at WWDC in June, Apple detailed its own AFM 3 models and the infrastructure behind them, and went out of its way to minimise how much Gemini reaches users.
The reconciliation: Google's Gemini plays two roles. It refines Apple's models through distillation, and it provides frontier capability that Apple's most capable model is measured against. Apple's public position, stated by Federighi at the keynote, is that the AFM 3 models contain none of the Gemini models deployed to Google's own customers, none of Google's client-side code, and no Google Search infrastructure. Google's contribution is training-time refinement and hosting, not the model that answers your question.
Why Apple chose Google, and what it passed over
Apple did not default to Google. It evaluated proposals from OpenAI and Anthropic before selecting Gemini, according to reporting on the deal. The custom Gemini model is about eight times larger than Apple's previous 150-billion-parameter cloud model and uses a mixture-of-experts design, so it keeps the knowledge of a 1.2-trillion-parameter system while activating only a fraction of it per request, which is what lets the new Siri answer in close to real time. The lesson for an enterprise is the bake-off, not the winner. Apple treated frontier models as substitutable inputs and chose on capability, latency and commercial terms, not loyalty. Your model layer should be swappable the same way, so that switching providers is a configuration change, not a rebuild.
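To make "a configuration change, not a rebuild" concrete, here is a minimal Python sketch of a swappable model layer. The provider names, the `ModelConfig` class and the two stand-in functions are hypothetical, invented for illustration; nothing here reflects how Apple or any vendor wires its stack.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A provider is any callable that maps a prompt to a completion string.
Provider = Callable[[str], str]


def owned_model(prompt: str) -> str:
    # Stand-in for a privately hosted, distilled model (hypothetical).
    return f"[owned] {prompt}"


def frontier_api(prompt: str) -> str:
    # Stand-in for a call to a third-party frontier API (hypothetical).
    return f"[frontier] {prompt}"


# Hypothetical registry: application code never imports a vendor SDK directly.
PROVIDERS: Dict[str, Provider] = {
    "owned": owned_model,
    "frontier": frontier_api,
}


@dataclass(frozen=True)
class ModelConfig:
    provider: str  # the only value that changes when you switch vendors


def complete(config: ModelConfig, prompt: str) -> str:
    """Dispatch a prompt to whichever provider the configuration names."""
    try:
        provider = PROVIDERS[config.provider]
    except KeyError:
        raise ValueError(f"unknown provider: {config.provider!r}") from None
    return provider(prompt)
```

Because callers depend only on `complete`, a bake-off between vendors becomes a matter of registering another provider and changing one configuration value.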
The five AFM 3 models, plus Gemini
| Model | Size and design | Where it runs | Best for |
|---|---|---|---|
| AFM 3 Core | 3B dense | On device | Fast, private, offline tasks |
| AFM 3 Core Advanced | 20B sparse, 1-4B active | On device | Multimodal, dictation, voices |
| AFM 3 Cloud | Server model | Private Cloud Compute | Latency-optimised cloud tasks |
| ADM 3 Cloud | Server model | Private Cloud Compute | Image generation and editing |
| AFM 3 Cloud Pro | Most capable server model | Nvidia GPUs in Google Cloud | Agentic tool use, complex reasoning |
| Gemini frontier | Google's frontier family | Google Cloud | Refines AFM 3 Cloud Pro; reported Siri assist |
The on-device standout is AFM 3 Core Advanced. It holds 20 billion parameters but activates only 1 to 4 billion per request through a sparse design Apple calls Instruction-Following Pruning, a technique it published a year before shipping it. The effect is frontier-style capacity at the compute cost of a much smaller model, which is why it can run on a phone. That same sparse-activation idea, getting large-model quality at small-model cost, is the single most portable lesson here for cost-conscious enterprise teams.
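The arithmetic behind that claim is simple enough to sketch. The parameter counts below come from the figures above; the assumption that per-token compute scales roughly linearly with active parameters is a common rule of thumb, not something Apple has stated, and the function names are illustrative.

```python
TOTAL_PARAMS = 20_000_000_000  # total parameters held by the model
ACTIVE_MIN = 1_000_000_000     # fewest parameters active per request
ACTIVE_MAX = 4_000_000_000     # most parameters active per request


def active_fraction(active_params: int, total_params: int = TOTAL_PARAMS) -> float:
    """Share of the model's parameters that take part in one request."""
    if not 0 < active_params <= total_params:
        raise ValueError("active_params must be in (0, total_params]")
    return active_params / total_params


def compute_saving(active_params: int, total_params: int = TOTAL_PARAMS) -> float:
    """Rough compute reduction versus a dense model of the same total size.

    Assumption: per-token compute is proportional to active parameters.
    """
    return 1.0 / active_fraction(active_params, total_params)


if __name__ == "__main__":
    for active in (ACTIVE_MIN, ACTIVE_MAX):
        share = active_fraction(active)
        saving = compute_saving(active)
        print(f"{active:,} active: {share:.0%} of weights, ~{saving:.0f}x less compute")
```

Under that assumption, the model touches between a twentieth and a fifth of its weights per request, which is the gap between dense-model cost and what a phone can afford.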
AFM Cloud Pro versus Gemini frontier, head to head
The headline matchup is Apple's most capable owned model against the frontier model that helped refine it. They are not really competitors in Apple's stack; they are layers. But for an enterprise choosing between an owned, private model and a frontier API, the contrast is exactly the decision you face.
| Dimension | AFM 3 Cloud Pro | Gemini frontier |
|---|---|---|
| Controlled by | Apple (owned weights) | Google (API access) |
| Runs on | Nvidia Blackwell in Google Cloud, under PCC | Google Cloud |
| Privacy model | Confidential computing, stateless, attested | Standard cloud API terms |
| Best for | Agentic tool use, reasoning, with privacy | Maximum raw capability |
| Enterprise analogue | A distilled in-house model you run privately | A frontier model you call as a service |
Apple's choice tells you its priorities. It accepted a dependency on Google's cloud and Nvidia's chips to get capability and speed, but it refused to let user data leave a verifiable trust boundary. AFM 3 Cloud Pro is, in effect, a private, owned model with quality "similar to Gemini frontier models," per Apple's own framing, that it can run without handing Google the data. For a deeper look at this private-cloud pattern, see our analysis of Apple's AFM Cloud Pro and Private Cloud Compute.
How Apple kept the privacy promise on someone else's cloud
This is the part enterprise architects should copy. Apple required Nvidia's GPUs to be configured so they could not read the contents of Apple's servers, and built a layered hardware trust stack to enforce it. Per Nvidia and Apple's Private Cloud Compute expansion post, the design combines three hardware roots of trust: Nvidia Confidential Computing on Blackwell GPUs, Intel CPUs with Trust Domain Extensions, and Google's Titan security chip.
The mechanism is a trusted execution environment plus cryptographic attestation. Before any data is sent, the client verifies that the server is running the expected, untampered software; the workload is isolated so the cloud operator, in this case Google, cannot see inside it; and the system holds no state after the request. Apple extended this to third-party infrastructure for the first time, with the rollout reaching full capacity by the end of summer 2026. The principle is verifiable privacy: not "trust us," but "check the attestation." Enterprises moving sensitive workloads to GPU clouds can demand the same, and our notes on privacy-first AI architecture walk through how.
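The "check the attestation" step can be sketched as a client-side gate. This is a deliberately simplified illustration, not Apple's protocol: real attestation uses asymmetric signatures rooted in hardware, whereas this sketch substitutes an HMAC with a shared key so it runs on the standard library alone. `AttestationReport`, `verify` and `send_if_attested` are hypothetical names.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class AttestationReport:
    measurement: str  # hex SHA-256 of the software image the server claims to run
    signature: bytes  # stand-in for a hardware-rooted signature over the measurement


def measure(image: bytes) -> str:
    """Hash a software image into a measurement."""
    return hashlib.sha256(image).hexdigest()


def sign(measurement: str, key: bytes) -> bytes:
    # Simplification: an HMAC stands in for a signature from a hardware root of trust.
    return hmac.new(key, measurement.encode(), hashlib.sha256).digest()


def verify(report: AttestationReport, trusted: Set[str], key: bytes) -> bool:
    """Accept only a correctly signed report for a known-good software image."""
    expected = sign(report.measurement, key)
    if not hmac.compare_digest(expected, report.signature):
        return False
    return report.measurement in trusted


def send_if_attested(
    report: AttestationReport,
    trusted: Set[str],
    key: bytes,
    payload: str,
    send: Callable[[str], str],
) -> str:
    """Release the payload only after the server's attestation checks out."""
    if not verify(report, trusted, key):
        raise PermissionError("attestation failed; payload not sent")
    return send(payload)
```

The design point is the ordering: verification happens on the client, before any data leaves it, so a failed check costs a request rather than a disclosure.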
The orchestration and developer layer
What makes five models behave like one product is the orchestrator. It decides which tier handles which part of a request, keeping a simple query on an on-device model and escalating reasoning or tool use to AFM 3 Cloud Pro. Apple exposes this to developers through an expanded Foundation Models framework, so apps can build multimodal, agentic and model-flexible features and let Siri discover their capabilities through App Intents. Apple's stated infrastructure goal was to use Nvidia's newest hardware while extending Private Cloud Compute to a third-party cloud for the first time, handling demanding workloads such as agentic tool use and complex reasoning. For enterprises, the orchestrator is the part to build first. The models will change every few months; the routing logic that matches each task to the right tier, with cost and data sensitivity as inputs, is the durable asset that survives those swaps.
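A routing policy of that kind can start very small. The sketch below is an enterprise-flavoured illustration, not Apple's orchestrator: the three tiers, the complexity scale, the thresholds and the per-request costs are all hypothetical values chosen to show the shape of the logic.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = 0      # small local model
    PRIVATE_CLOUD = 1  # owned model under confidential computing
    FRONTIER_API = 2   # third-party frontier model


# Hypothetical per-request costs, in cents.
COST_CENTS = {Tier.ON_DEVICE: 0.0, Tier.PRIVATE_CLOUD: 0.2, Tier.FRONTIER_API: 1.0}


@dataclass(frozen=True)
class Task:
    complexity: int      # 1 (trivial) to 5 (hard); hypothetical scale
    needs_tools: bool    # whether the task requires agentic tool use
    sensitive: bool      # whether the payload contains regulated data
    budget_cents: float  # ceiling the caller will pay for this request


def route(task: Task) -> Tier:
    """Pick the cheapest tier that satisfies capability, sensitivity and budget."""
    if task.complexity >= 4 or task.needs_tools:
        # Sensitive data never leaves the attested boundary.
        wanted = Tier.PRIVATE_CLOUD if task.sensitive else Tier.FRONTIER_API
    elif task.complexity >= 2:
        wanted = Tier.PRIVATE_CLOUD
    else:
        wanted = Tier.ON_DEVICE

    # Step down a tier at a time while the choice exceeds the budget.
    level = wanted.value
    while level > 0 and COST_CENTS[Tier(level)] > task.budget_cents:
        level -= 1
    return Tier(level)
```

Because the policy is a pure function of the task, it can be unit-tested and audited independently of whichever models sit behind each tier.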
The privacy promise, questioned
The move drew scrutiny. Commentators asked whether running Apple Intelligence on Google and Nvidia hardware weakens the privacy promise Apple spent years building. Apple's answer is that the promise was never about who owns the data centre; it was about whether anyone other than the user can read the data, and attestation lets a device verify that no one can before it sends anything. That reframing, from "trust the provider" to "verify the infrastructure," is the part worth importing into your own vendor reviews. It changes the question from where a workload runs to what can be proven about how it runs.
What this means for your enterprise AI stack
Strip away the Apple branding and you have a reference architecture for regulated, privacy-sensitive AI in 2026.
| Pattern | What Apple did | Enterprise takeaway |
|---|---|---|
| Tiered model routing | On-device, to private cloud, to most-capable model | Route by task sensitivity, complexity and cost |
| Distill from a frontier model | Refined AFM with Gemini outputs | Use a frontier model to train a smaller owned one |
| Confidential computing | Nvidia CC, Intel TDX, Titan, attestation | Make attestation a procurement requirement |
| Multi-cloud, multi-vendor | Apple model on Google Cloud and Nvidia | Avoid single-model and single-cloud lock-in |
| Sparse activation | 1-4B active of 20B parameters | Cut inference cost without losing capability |
The strategic message is that owning your model and renting your capability are no longer mutually exclusive. Apple distilled a frontier model into a smaller one it controls, ran it on the best available hardware regardless of vendor, and wrapped the whole thing in attestable privacy. A bank, hospital or government contractor can follow the same three moves: distill a specialised model from a frontier API, host it under confidential computing, and route only the hardest, least sensitive queries to the frontier model directly. The orchestration layer that decides which tier handles which request is where this architecture lives, much like the enterprise AI agent governance layers that route and constrain agent actions.
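The first of those moves, distilling from a frontier API, usually begins with collecting teacher outputs as training data. The sketch below shows that data-collection step only; it assumes the teacher is reachable as a plain prompt-to-text function and says nothing about how Apple performed its own refinement. `build_distillation_set` and `to_jsonl` are hypothetical helper names.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_set(
    prompts: Iterable[str],
    teacher: Callable[[str], str],
    min_chars: int = 1,
) -> List[Dict[str, str]]:
    """Query the teacher once per unique prompt and keep usable completions."""
    records: List[Dict[str, str]] = []
    seen = set()
    for prompt in prompts:
        if prompt in seen:
            continue  # avoid paying for the same prompt twice
        seen.add(prompt)
        completion = teacher(prompt).strip()
        if len(completion) < min_chars:
            continue  # drop empty or too-short teacher answers
        records.append({"prompt": prompt, "completion": completion})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialise records as JSON Lines, a common fine-tuning input format."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

In a regulated setting, the prompts fed to the teacher should themselves be non-sensitive or synthetic, since this is the one step where data crosses to the frontier provider.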
There is a caution in the case study too. Apple now depends on Google and Nvidia for its flagship feature, a concentration of suppliers that did not exist a year ago. Multi-vendor architecture reduces model lock-in but adds operational dependency. Price that dependency into your own build-versus-buy math.
India-specific considerations
For Indian enterprises, two points carry over directly. First, confidential computing and attestation give a concrete, technical answer to data-residency and privacy duties under the DPDP regime: a workload that runs in a verifiable trusted execution environment, holds no state, and can prove what code processed the data is far easier to defend than a plain cloud API call. Second, the distillation pattern fits Indian cost structures well. Training or fine-tuning a smaller, owned model from a frontier API, then serving it on confidential GPU instances, controls both inference cost in rupees and data exposure, which matters more where budgets are tighter and data-localisation expectations are rising. The Apple stack is a blueprint, not a product you buy, and the moves scale down to a mid-size Indian SaaS or fintech team.
FAQ
How eCorpIT can help
eCorpIT is a Gurugram-based technology organisation with senior-led engineering teams that design privacy-first, multi-model AI stacks for enterprises. We help architects choose between owned and frontier models, build distillation and tiered-routing pipelines, and deploy workloads under confidential computing with attestation, the same pattern Apple used to extend Private Cloud Compute to Google and Nvidia. Founded in 2021 and assessed at CMMI Level 5, we design applications aligned with DPDP and sectoral privacy requirements. To architect your cloud AI model strategy, contact our team.
References
_Last updated: 26 June 2026._