Summary. On June 8, 2026, Apple did something it had never done: it ran its private AI inference in someone else's data center. AFM 3 Cloud Pro, the most capable server model in the third generation of Apple Foundation Models at roughly 1.2 trillion parameters, runs on Nvidia Blackwell GPUs inside Google Cloud, wrapped in Apple's Private Cloud Compute (PCC). Apple introduced PCC in 2024 on its own silicon; this is the first time those privacy commitments extend to third-party data centers, per Apple Security Research. The model family ranges from on-device models, including one of roughly 20 billion parameters, up to AFM 3 Cloud Pro for agentic tool use and complex reasoning. Apple built these models with Google, using the technology behind Gemini, under a reported $1-billion-a-year deal, yet kept its own privacy guarantees intact. The implementation combines Nvidia Confidential Computing, Intel CPUs with TDX, and Google's Titan chip. For cloud architects and CTOs evaluating private-cloud AI inference, the five decisions below form a reference design for running sensitive inference on hardware you do not own.
You will not rebuild Apple's scale. But the patterns (confidential computing, statelessness, attestation, model tiering, and verifiable transparency) are exactly the ones any private-AI stack now has to weigh.
What Apple actually shipped
The short version: Apple needed more compute than its own data centers could supply for the hardest Apple Intelligence tasks, and it refused to lower its privacy bar to get it. So it extended PCC to Google Cloud on Nvidia GPUs, per the Apple Security Research blog and Nvidia. Apple keeps complete control of the PCC software, and Apple devices will only trust PCC software that Apple has cryptographically approved, regardless of whose data center it runs in. The rollout ramps through a summer 2026 preview.
The table maps the five architecture decisions this article covers.
| Decision | What Apple did | Why it matters for your stack |
|---|---|---|
| 1. Extend the trust boundary | Ran PCC on Google Cloud, same guarantees | Borrow capacity without lowering security |
| 2. Root trust in confidential computing | Nvidia CC, Intel TDX, Google Titan | Encrypt data in use, not just at rest |
| 3. Distrust the platform by default | Whole stack in the trusted computing base | Assume confidential computing alone is not enough |
| 4. Make inference stateless and ephemeral | No retention, short time-to-live, isolation | Reduce the blast radius of any breach |
| 5. Tier models and stay verifiable | On-device plus cloud, public binaries | Route by task and prove your claims |
Decision 1: extend the trust boundary without lowering it
The first decision is the headline one. Apple needed Nvidia GPU capacity at a scale its own data centers could not meet for AFM 3 Cloud Pro, and the obvious path, just renting cloud GPUs, would have meant trusting a third party with user data. Apple's answer was to extend Private Cloud Compute itself to Google Cloud, carrying its guarantees with it, rather than accepting a weaker posture in exchange for capacity. As Apple put it, this extends "our industry-leading PCC privacy commitments to third-party data centers for the first time."
The architectural lesson for a CTO is the framing, not the scale. When you outgrow your own infrastructure for sensitive AI inference, the choice is not "build versus rent." It is whether you can carry your security model onto rented hardware unchanged. Apple's bet is that you can, if the trust is rooted in hardware and verified cryptographically rather than assumed from a vendor contract. Apple also retains complete control of the PCC software stack, so location changed while ownership of the security boundary did not.
Decision 2: root trust in confidential computing, not the contract
The second decision is what makes the first one possible. Instead of trusting Google's operational controls, PCC on Google Cloud roots its trust in hardware confidential computing. The implementation uses Nvidia Confidential Computing on Nvidia Blackwell GPUs, Intel CPUs with TDX, and Google's Titan chip, per Apple Security Research. Nvidia Confidential Computing isolates the workload in a trusted execution environment and protects data while it is being processed. It also uses remote attestation, so the system can cryptographically verify that the hardware has not been tampered with before any sensitive data is sent, as Nvidia describes.
| Layer | Component | Role |
|---|---|---|
| GPU | Nvidia Blackwell with Confidential Computing | Isolates inference in a trusted execution environment |
| CPU | Intel CPUs with TDX | Extends confidential computing to the host |
| Security chip | Google Titan | Hardware root of trust in the data center |
| Attestation | Two independent roots of trust | Verifies the platform before keys are released |
| Hardware ledger | Append-only, cryptographic | Tracks every machine in the PCC fleet |
The takeaway is that "encrypted at rest and in transit" is no longer the bar for sensitive inference. Data has to be protected while it is in use, inside the GPU and CPU, and the platform has to prove its integrity by attestation before it ever sees the data. Confidential computing moved from a niche feature to the foundation of the design.
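To make "prove integrity before seeing the data" concrete, here is a minimal sketch of attestation-gated key release. It is an illustration under stated assumptions, not Apple's or Nvidia's protocol: the `Report` type, `sign_report`, `release_key`, and `VENDOR_KEY` are hypothetical names, and an HMAC stands in for the asymmetric, certificate-chained signature that real hardware attestation would use.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional, Set

# ASSUMPTION: a shared HMAC key stands in for the hardware vendor's signing key.
# Real attestation uses asymmetric signatures chained to a vendor certificate.
VENDOR_KEY = b"hypothetical-vendor-key"


@dataclass(frozen=True)
class Report:
    measurement: str  # hash of the software stack the node booted
    nonce: bytes      # challenge chosen by the verifier, proves freshness
    signature: bytes  # vendor signature over measurement + nonce


def _mac(measurement: str, nonce: bytes) -> bytes:
    return hmac.new(VENDOR_KEY, measurement.encode() + nonce, hashlib.sha256).digest()


def sign_report(measurement: str, nonce: bytes) -> Report:
    """Simulate the hardware producing a signed attestation report."""
    return Report(measurement, nonce, _mac(measurement, nonce))


def release_key(report: Report, expected_nonce: bytes, approved: Set[str]) -> Optional[bytes]:
    """Return a fresh data key only if every check passes, otherwise None."""
    if not hmac.compare_digest(_mac(report.measurement, report.nonce), report.signature):
        return None  # report was not signed by the trusted root
    if not hmac.compare_digest(report.nonce, expected_nonce):
        return None  # stale or replayed report
    if report.measurement not in approved:
        return None  # node is running software we have not approved
    return secrets.token_bytes(32)
```

The ordering is the design point: the key that decrypts the prompt does not exist on the node until the signature, the freshness check, and the measurement allowlist have all passed.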
Decision 3: distrust the platform by default
The third decision is the most instructive, because it assumes the second one is not enough. Apple states plainly that it does "not rely solely on confidential computing technologies" to stop attacks that use privileged access outside a confidential VM, including side-channel attacks. Instead, it treats every component, from firmware through the host and guest operating systems to the application code, as part of its trusted computing base, all subject to verifiable transparency and no-privileged-access guarantees.
Two mechanisms back this up. To blunt supply-chain attacks, Apple keeps a cryptographically verifiable, append-only ledger of all Google Cloud hardware in the PCC fleet, and roots its software attestation in at least two separate roots of trust from independent vendors. The engineering judgement is worth copying: confidential computing is a strong primitive, but a design that trusts it alone has simply moved the single point of failure. Resilience comes from layering: assume the host is hostile and verify everything anyway. For teams building this discipline into their own AI systems, our AI delivery lessons for 2026 cover the same defense-in-depth mindset.
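Both mechanisms can be sketched in a few lines. The example below is a simplified illustration, not Apple's ledger format: `HardwareLedger`, `entry_hash`, and `attested_by_all` are hypothetical names, and a plain hash chain stands in for whatever verifiable structure the real ledger uses (production transparency logs typically use Merkle trees with a publicly witnessed head).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


class HardwareLedger:
    """Append-only list of hardware records, chained by hash."""

    def __init__(self):
        self._entries = []  # list of (record, hash) tuples

    def append(self, record: dict) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS
        digest = entry_hash(prev, record)
        self._entries.append((dict(record), digest))
        return digest  # the new head; publishing it lets outsiders detect rewrites

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self._entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


def attested_by_all(measurement: str, roots: dict) -> bool:
    """roots maps a vendor name to the set of measurements that vendor vouches for.

    Trust requires at least two independent roots, and every one must agree.
    """
    return len(roots) >= 2 and all(measurement in ok for ok in roots.values())
```

Requiring agreement from every root, rather than any one of them, is what removes the single point of failure: compromising one vendor's signing chain is not enough to get a node trusted.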
Decision 4: make inference stateless and ephemeral
The fourth decision is about limiting damage. PCC's core requirements did not change when it moved to Google Cloud: stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency. The new environment had to satisfy all five, not a subset.
| PCC requirement | What it means | How it shows up in the design |
|---|---|---|
| Stateless computation | No user data persists after a request | Request data is not retained on the server |
| Enforceable guarantees | Promises are technical, not policy | Rooted in hardware and cryptography |
| No privileged runtime access | No operator can reach live data | Whole stack in the trusted computing base |
| Non-targetability | A specific user cannot be singled out | Requests are not routable to a chosen node |
| Verifiable transparency | Outsiders can check the claims | All binaries published for inspection |
In practice, Apple parses each request's initial network data in a dedicated process inside its own namespace, recycles shared inference software on a short time-to-live, and holds attested keys in a separate confidential VM isolated from external inputs. The pattern for any architect is the same: assume a breach will happen, and design so that when it does, there is almost nothing to steal, because nothing is retained, nothing is targetable, and every component is short-lived.
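The "short-lived and stateless" part of that pattern is easy to prototype. The sketch below is an assumption-laden illustration, not PCC's implementation: `EphemeralWorker` and `WorkerPool` are hypothetical names, the uppercase transform is a stand-in for inference, and the injectable `clock` exists only so the recycling behaviour can be tested without sleeping.

```python
import time


class EphemeralWorker:
    """Handles requests without keeping any of their data."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._clock = clock
        self._born = clock()
        self._ttl = ttl_seconds

    def expired(self) -> bool:
        return self._clock() - self._born >= self._ttl

    def handle(self, prompt: str) -> str:
        # Stand-in for inference. The prompt lives only in this call's
        # local scope; nothing is written to the instance or to disk.
        return prompt.upper()


class WorkerPool:
    """Replaces the worker once it outlives its time-to-live."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._worker = EphemeralWorker(ttl_seconds, clock)
        self.recycled = 0

    def infer(self, prompt: str) -> str:
        if self._worker.expired():
            # Drop the old worker entirely rather than trying to clean it.
            self._worker = EphemeralWorker(self._ttl, self._clock)
            self.recycled += 1
        return self._worker.handle(prompt)
```

Replacing the worker instead of scrubbing it is deliberate: a fresh process cannot carry over state that a cleanup routine forgot about.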
Decision 5: tier the models, and keep verifiable transparency
The fifth decision is the system design around the model. Apple did not build one giant cloud model and route everything to it. The third generation of Apple Foundation Models is a family that ranges from on-device models, including a roughly 20-billion-parameter model that runs on the iPhone, up to AFM 3 Cloud Pro in the cloud for the most demanding agentic and reasoning tasks, as 9to5Mac and Wccftech detail. Cheap, private, low-latency work stays on the device; only the hard tasks travel to the cloud, where the privacy machinery above protects them.
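A tiering policy like this usually reduces to a small routing function. The following is a minimal sketch under assumptions: the `Task` fields, the `Tier` names, and both thresholds are hypothetical and chosen only for illustration; the source does not describe how Apple decides which requests leave the device.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Task:
    needs_tools: bool      # agentic tool use
    reasoning_steps: int   # rough estimate of multi-step reasoning depth
    context_tokens: int    # size of the prompt plus attached context


# ASSUMPTION: illustrative limits, not published figures.
MAX_ON_DEVICE_STEPS = 3
MAX_ON_DEVICE_TOKENS = 4096


def route(task: Task) -> Tier:
    """Keep work on the device unless it exceeds what the local model can do."""
    if task.needs_tools:
        return Tier.CLOUD
    if task.reasoning_steps > MAX_ON_DEVICE_STEPS:
        return Tier.CLOUD
    if task.context_tokens > MAX_ON_DEVICE_TOKENS:
        return Tier.CLOUD
    return Tier.ON_DEVICE
```

The default matters more than the thresholds: the function falls through to `ON_DEVICE`, so data leaves the device only when a rule explicitly says it must.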
Two more choices complete the design. Transparency is enforced, not promised: Apple publishes all PCC binaries for public inspection and gives security researchers access to live PCC nodes in research mode through its bounty program. And ownership is retained despite the collaboration. Apple built the models with Google's Gemini technology under a reported $1-billion-a-year arrangement, but kept the stack its own. "The amount of the Google Assistant we use is none," said Craig Federighi, Apple's senior vice president of Software Engineering, per AppleInsider. The lesson is that you can use a partner's technology and third-party hardware and still own your security boundary, if you keep control of the software and prove it publicly.
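Published binaries only help if someone can check that what runs matches what was published. As a rough sketch of that check, assuming the published record is simply a set of SHA-256 digests (the source does not specify the format, and `digest` and `is_published` are hypothetical helpers):

```python
import hashlib


def digest(binary: bytes) -> str:
    """SHA-256 of a release artifact, as a hex string."""
    return hashlib.sha256(binary).hexdigest()


def is_published(binary: bytes, published: frozenset) -> bool:
    """True only if this exact binary appears in the public record."""
    return digest(binary) in published
```

For example, with `log = frozenset({digest(release)})`, `is_published(release, log)` holds, while any modified copy of `release` fails the check because a single changed byte produces a different digest.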
What this means for your own private-cloud AI inference
Most organisations cannot replicate Apple's scale, but the design choices translate directly. If you run sensitive inference, the questions Apple answered are now yours. Can you carry your security model onto rented GPUs unchanged, or does cloud capacity force you to lower it? Is your data protected in use through confidential computing and attestation, or only at rest and in transit? Do you trust the platform, or do you verify it and assume the host is hostile? Is your inference path stateless and short-lived, so a breach finds little to take? And can an outsider verify your privacy claims, or are they just words in a policy?
Confidential computing on Nvidia GPUs is available to enterprises, not only to Apple, which means the pattern is buildable at a smaller scale. The hard part, as Apple's own work shows, is not turning on a confidential VM; it is designing the whole stack (attestation, statelessness, and transparency) so the guarantee holds when something goes wrong. For a longer view on building AI systems that survive production, our generative AI enterprise strategy guide covers where these decisions fit a roadmap.
India-specific considerations
For Indian enterprises, this architecture lands on a live compliance question. Under the Digital Personal Data Protection Act, 2023, the Data Fiduciary stays responsible for personal data even when a processor or cloud provider handles it, with penalties up to ₹250 crore per breach, per EY India. Running AI inference on third-party cloud GPUs is exactly the scenario the law scrutinises, because the personal data in a prompt is being processed outside your own walls.
The PCC pattern is a useful template here. Confidential computing keeps prompt data protected in use, statelessness means the provider retains nothing after the request, and attestation gives you a technical, checkable basis for the privacy claim you make to regulators and customers, rather than a contractual one. For an Indian business choosing where to run sensitive inference, the practical reading is to prefer confidential-computing-backed services, to design for no data retention, and to keep verifiable control of the software, so that "the cloud processes it" does not become "we lost control of it."
How eCorpIT can help
eCorpIT is a senior-led, CMMI Level 5 technology organisation in Gurugram that designs and operates cloud and AI infrastructure for global and Indian businesses. We help architecture teams evaluate private-cloud AI inference: choosing confidential-computing-backed services, designing stateless inference paths, and keeping data handling aligned with DPDP requirements when models run on third-party GPUs. We work across AWS, Microsoft, Google, and Nvidia-accelerated platforms. To review your private-AI architecture, contact our team.
References
- Apple Security Research, "Expanding Private Cloud Compute," June 8, 2026.
- Apple Machine Learning Research, "Introducing the Third Generation of Apple's Foundation Models," June 2026.
- Nvidia, "Nvidia Confidential Computing to Help Expand Apple's Private Cloud Compute," June 9, 2026.
- Data Center Dynamics, "Apple's Private Cloud Compute to run on Google Cloud," June 2026.
- 9to5Mac, "Apple's third-generation Foundation Models explained," June 11, 2026.
- AppleInsider, "Apple's new foundation models don't contain a drop of Gemini," June 8, 2026.
- Wccftech, "Apple removes the fog around its new cloud-based and 20-billion-parameter on-device AI models," June 2026.
- CryptoBriefing, "Nvidia expands Confidential Computing for Apple's Private Cloud Compute on Google Cloud at WWDC26," June 2026.
- Business Standard, "Apple AI now runs on Google, Nvidia tech: What happens to the privacy promise," June 12, 2026.
- MacStories, "The Third Generation of Apple's Foundation Models and AFM Core Advanced," June 2026.
- EY India, "Decoding the Digital Personal Data Protection Act, 2023," 2026.
- CNBC, "Apple picks Google's Gemini to run AI-powered Siri," January 12, 2026.
_Last updated: June 24, 2026._