5 architecture decisions behind AFM 3 Cloud Pro on Nvidia GPUs (2026)

Apple runs AFM 3 Cloud Pro on Nvidia Blackwell GPUs in Google Cloud under Private Cloud Compute. Five architecture decisions for private-cloud AI inference.

[Figure: Private Cloud Compute extends confidential AI inference to third-party GPUs.]

Summary. On June 8, 2026, Apple did something it had never done: it ran its private AI inference in someone else's data center. AFM 3 Cloud Pro, the most capable server model in the third generation of Apple Foundation Models at roughly 1.2 trillion parameters, runs on Nvidia Blackwell GPUs inside Google Cloud, wrapped in Apple's Private Cloud Compute (PCC). Apple introduced PCC in 2024 on its own silicon; this is the first time those privacy commitments extend to third-party data centers, per Apple Security Research. The model family ranges from on-device models, including one of roughly 20 billion parameters, up to AFM 3 Cloud Pro for agentic tool use and complex reasoning. Apple built these models with Google, using the technology behind Gemini, under a reported $1-billion-a-year deal, yet kept its own privacy guarantees intact. The implementation combines Nvidia Confidential Computing, Intel CPUs with TDX, and Google's Titan chip. For cloud architects and CTOs evaluating private-cloud AI inference, the five decisions below are a reference design for running sensitive inference on hardware you do not own.

You will not rebuild Apple's scale. But the patterns (confidential computing, statelessness, attestation, model tiering, and verifiable transparency) are exactly the ones any private-AI stack now has to weigh.

What Apple actually shipped

The short version: Apple needed more compute than its own data centers could supply for the hardest Apple Intelligence tasks, and it refused to lower its privacy bar to get it. So it extended PCC to Google Cloud on Nvidia GPUs, per the Apple Security Research blog and Nvidia. Apple keeps complete control of the PCC software, and Apple devices will only trust PCC software that Apple has cryptographically approved, regardless of whose data center it runs in. The rollout ramps through a summer 2026 preview.

The table maps the five architecture decisions this article covers.

| Decision | What Apple did | Why it matters for your stack |
| --- | --- | --- |
| 1. Extend the trust boundary | Ran PCC on Google Cloud, same guarantees | Borrow capacity without lowering security |
| 2. Root trust in confidential computing | Nvidia CC, Intel TDX, Google Titan | Encrypt data in use, not just at rest |
| 3. Distrust the platform by default | Whole stack in the trusted computing base | Assume confidential computing alone is not enough |
| 4. Make inference stateless and ephemeral | No retention, short time-to-live, isolation | Reduce the blast radius of any breach |
| 5. Tier models and stay verifiable | On-device plus cloud, public binaries | Route by task and prove your claims |

Decision 1: extend the trust boundary without lowering it

The first decision is the headline one. Apple needed Nvidia GPU capacity at a scale its own data centers could not meet for AFM 3 Cloud Pro, and the obvious path, just renting cloud GPUs, would have meant trusting a third party with user data. Apple's answer was to extend Private Cloud Compute itself to Google Cloud, carrying its guarantees with it, rather than accepting a weaker posture in exchange for capacity. As Apple put it, this extends "our industry-leading PCC privacy commitments to third-party data centers for the first time."

The architectural lesson for a CTO is the framing, not the scale. When you outgrow your own infrastructure for sensitive AI inference, the choice is not "build versus rent." It is whether you can carry your security model onto rented hardware unchanged. Apple's bet is that you can, if the trust is rooted in hardware and verified cryptographically rather than assumed from a vendor contract. Apple also retains complete control of the PCC software stack, so location changed while ownership of the security boundary did not.

Decision 2: root trust in confidential computing, not the contract

The second decision is what makes the first one possible. Instead of trusting Google's operational controls, PCC on Google Cloud roots its trust in hardware confidential computing. The implementation uses Nvidia Confidential Computing on Nvidia Blackwell GPUs, Intel CPUs with TDX, and Google's Titan chip, per Apple Security Research. Nvidia Confidential Computing isolates the workload in a trusted execution environment and protects data while it is being processed, and it uses remote attestation so the system can cryptographically verify the hardware has not been tampered with before any sensitive data is sent, as Nvidia describes.

| Layer | Component | Role |
| --- | --- | --- |
| GPU | Nvidia Blackwell with Confidential Computing | Isolates inference in a trusted execution environment |
| CPU | Intel CPUs with TDX | Extends confidential computing to the host |
| Security chip | Google Titan | Hardware root of trust in the data center |
| Attestation | Two independent roots of trust | Verifies the platform before keys are released |
| Hardware ledger | Append-only, cryptographic | Tracks every machine in the PCC fleet |

The takeaway is that "encrypted at rest and in transit" is no longer the bar for sensitive inference. Data has to be protected while it is in use, inside the GPU and CPU, and the platform has to prove its integrity by attestation before it ever sees the data. Confidential computing moved from a niche feature to the foundation of the design.
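To make "prove integrity before it sees the data" concrete, here is a minimal sketch of attestation-gated key release. Every name in it (`Evidence`, `VENDOR_KEYS`, `EXPECTED_MEASUREMENTS`, `verify_evidence`, `release_key`) is hypothetical, and it is not how Apple, Nvidia, or Intel implement attestation. Real attestation relies on vendor-signed quotes and certificate chains; an HMAC with a demo key stands in for the vendor signature here, purely for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for vendor signing keys. Real attestation uses
# asymmetric signatures and certificate chains, not shared secrets.
VENDOR_KEYS = {"gpu": b"demo-gpu-vendor-key", "cpu": b"demo-cpu-vendor-key"}

# Measurements (hashes of the software image) the client is willing to trust.
EXPECTED_MEASUREMENTS = {hashlib.sha256(b"approved-inference-image-v1").hexdigest()}


@dataclass(frozen=True)
class Evidence:
    vendor: str        # which root of trust produced this evidence
    measurement: str   # hex digest of the software the node claims to run
    nonce: bytes       # client-chosen value that prevents replay
    signature: bytes   # vendor signature over measurement and nonce


def sign(vendor: str, measurement: str, nonce: bytes) -> bytes:
    """Simulate vendor hardware signing a measurement (demo only)."""
    message = measurement.encode() + nonce
    return hmac.new(VENDOR_KEYS[vendor], message, hashlib.sha256).digest()


def verify_evidence(ev: Evidence, nonce: bytes) -> bool:
    """Accept evidence only if it is fresh, correctly signed, and allowlisted."""
    if ev.vendor not in VENDOR_KEYS or ev.nonce != nonce:
        return False
    expected_sig = sign(ev.vendor, ev.measurement, nonce)
    if not hmac.compare_digest(expected_sig, ev.signature):
        return False
    return ev.measurement in EXPECTED_MEASUREMENTS


def release_key(evidence: list, nonce: bytes) -> Optional[bytes]:
    """Release a session key only when every required root of trust verifies."""
    verified = {ev.vendor for ev in evidence if verify_evidence(ev, nonce)}
    if verified != set(VENDOR_KEYS):
        return None  # a missing or failed root of trust blocks the release
    return secrets.token_bytes(32)
```

The design point is the ordering: the client generates a nonce, collects evidence from each root of trust, and only then releases a key. A node running an unapproved image, or one that can satisfy only one of the two roots, never receives anything worth stealing.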

Decision 3: distrust the platform by default

The third decision is the most instructive, because it assumes the second one is not enough. Apple states plainly that it does "not rely solely on confidential computing technologies" to stop attacks that use privileged access outside a confidential VM, including side-channel attacks. Instead, it treats every component, from firmware through the host and guest operating systems to the application code, as part of its trusted computing base, all subject to verifiable transparency and no-privileged-access guarantees.

Two mechanisms back this up. To blunt supply-chain attacks, Apple keeps a cryptographically verifiable, append-only ledger of all Google Cloud hardware in the PCC fleet, and roots its software attestation in at least two separate roots of trust from independent vendors. The engineering judgement here is one worth copying: confidential computing is a strong primitive, but a design that trusts it alone has simply moved the single point of failure. The resilience comes from layering, from assuming the host is hostile and verifying everything anyway. For teams building this discipline into their own AI systems, our AI delivery lessons for 2026 cover the same defense-in-depth mindset.
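The ledger idea can be sketched with a simple hash chain. This is an illustrative assumption about how an append-only, cryptographically verifiable record might work, not a description of Apple's ledger format; `HardwareLedger`, `entry_hash`, and the record fields are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


class HardwareLedger:
    """Hypothetical append-only ledger: entries are added, never edited."""

    def __init__(self) -> None:
        self.entries = []  # list of (record, hash) pairs in insertion order

    def head(self) -> str:
        """Return the hash of the latest entry, or GENESIS when empty."""
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, record: dict) -> str:
        """Chain a new record onto the current head and return its hash."""
        digest = entry_hash(self.head(), record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


ledger = HardwareLedger()
ledger.append({"serial": "node-001", "event": "enrolled"})
ledger.append({"serial": "node-002", "event": "enrolled"})
print(ledger.verify())
```

Because each hash covers the one before it, an auditor who has recorded the head hash can detect any later rewrite of history. That is what makes the record verifiable rather than merely logged.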

Decision 4: make inference stateless and ephemeral

The fourth decision is about limiting damage. PCC's core requirements did not change when it moved to Google Cloud: stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency. The new environment had to satisfy all five, not a subset.

| PCC requirement | What it means | How it shows up in the design |
| --- | --- | --- |
| Stateless computation | No user data persists after a request | Request data is not retained on the server |
| Enforceable guarantees | Promises are technical, not policy | Rooted in hardware and cryptography |
| No privileged runtime access | No operator can reach live data | Whole stack in the trusted computing base |
| Non-targetability | A specific user cannot be singled out | Requests are not routable to a chosen node |
| Verifiable transparency | Outsiders can check the claims | All binaries published for inspection |

In practice, Apple parses each request's initial network data in a dedicated process inside its own namespace, recycles shared inference software on a short time-to-live, and holds attested keys in a separate confidential VM isolated from external inputs. The pattern for any architect is the same: assume a breach will happen, and design so that when it does, there is almost nothing to steal, because nothing is retained, nothing is targetable, and every component is short-lived.
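A short time-to-live and a handler that retains nothing can be sketched as follows. This is a toy model under stated assumptions, not Apple's implementation: `EphemeralWorker`, `WorkerPool`, and the injectable `clock` parameter are hypothetical, and `infer` is a stand-in for a real model call.

```python
import time
from typing import Callable


class EphemeralWorker:
    """Hypothetical inference worker that is only valid for a short TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._started_at = clock()

    def expired(self) -> bool:
        """Report whether this worker has outlived its time-to-live."""
        return self._clock() - self._started_at >= self._ttl

    def infer(self, prompt: str) -> str:
        # Stand-in for a model call. Nothing is written to disk, logs,
        # or instance attributes, so no request data outlives the call.
        return prompt.upper()


class WorkerPool:
    """Hands requests to a worker, replacing it once its TTL has passed."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._worker = EphemeralWorker(ttl_seconds, clock)
        self.recycled = 0  # operational counter only; holds no user data

    def handle(self, prompt: str) -> str:
        """Serve one request, recycling the worker first if it has expired."""
        if self._worker.expired():
            self._worker = EphemeralWorker(self._ttl, self._clock)
            self.recycled += 1
        return self._worker.infer(prompt)
```

Injecting the clock keeps the recycling logic testable without sleeping. In a real system the replacement step would tear down a process or VM rather than swap a Python object, but the invariant is the same: no component lives long enough to accumulate state worth attacking.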

Decision 5: tier the models, and keep verifiable transparency

The fifth decision is the system design around the model. Apple did not build one giant cloud model and route everything to it. The third generation of Apple Foundation Models is a family that ranges from on-device models, including a roughly 20-billion-parameter model that runs on the iPhone, up to AFM 3 Cloud Pro in the cloud for the most demanding agentic and reasoning tasks, as 9to5Mac and Wccftech detail. Cheap, private, low-latency work stays on the device; only the hard tasks travel to the cloud, where the privacy machinery above protects them.
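Tiered routing of this kind reduces to a small decision function. The sketch below is an assumption about how such a router might look; the source does not describe Apple's routing logic, and `Task`, `Tier`, `route`, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Task:
    needs_tools: bool      # agentic tool use
    reasoning_steps: int   # rough estimate of multi-step reasoning depth
    prompt_tokens: int     # size of the input


# Hypothetical thresholds; tune them against your own models and latency budget.
MAX_ON_DEVICE_TOKENS = 4096
MAX_ON_DEVICE_STEPS = 3


def route(task: Task) -> Tier:
    """Keep work on the device unless the task exceeds the small model's limits."""
    if task.needs_tools:
        return Tier.CLOUD
    if task.reasoning_steps > MAX_ON_DEVICE_STEPS:
        return Tier.CLOUD
    if task.prompt_tokens > MAX_ON_DEVICE_TOKENS:
        return Tier.CLOUD
    return Tier.ON_DEVICE
```

The default matters more than the thresholds: the function returns the on-device tier unless a rule explicitly escalates, so data leaves the device only when there is a stated reason.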

Two more choices complete the design. Transparency is enforced, not promised: Apple publishes all PCC binaries for public inspection and gives security researchers access to live PCC nodes in research mode through its bounty program. And ownership is retained despite the collaboration. Apple built the models with Google's Gemini technology under a reported $1-billion-a-year arrangement, but kept the stack its own. "The amount of the Google Assistant we use is none," said Craig Federighi, Apple's senior vice president of Software Engineering, per AppleInsider. The lesson is that you can use a partner's technology and third-party hardware and still own your security boundary, if you keep control of the software and prove it publicly.
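One common way to make published binaries checkable is a Merkle tree over their digests, so anyone can confirm that a given binary is in the published set using only a short proof and the tree's root. The sketch below illustrates that general technique and is an assumption, not a description of Apple's transparency mechanism. The function names are hypothetical, the input list must be non-empty, and padding odd levels by duplicating the last node is a simplification that production logs (for example those following RFC 6962) avoid.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(leaf: bytes) -> bytes:
    return h(b"\x00" + leaf)  # prefix separates leaves from interior nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)


def next_level(level: list) -> list:
    """Pair up nodes, duplicating the last one when the count is odd."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list) -> bytes:
    """Compute the root over a non-empty list of leaf byte strings."""
    level = [leaf_hash(leaf) for leaf in leaves]
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def inclusion_proof(leaves: list, index: int) -> list:
    """Return (sibling_hash, sibling_is_left) pairs from leaf up to root."""
    level = [leaf_hash(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    """Rebuild the path to the root and compare it with the published root."""
    node = leaf_hash(leaf)
    for sibling, sibling_is_left in proof:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == root
```

A verifier needs only the binary, the proof, and the published root. It does not need to trust whoever served the proof, which is the property that turns a transparency promise into something an outsider can check.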

What this means for your own private-cloud AI inference

Most organisations cannot replicate Apple's scale, but the design choices translate directly. If you run sensitive inference, the questions Apple answered are now yours. Can you carry your security model onto rented GPUs unchanged, or does cloud capacity force you to lower it? Is your data protected in use through confidential computing and attestation, or only at rest and in transit? Do you trust the platform, or do you verify it and assume the host is hostile? Is your inference path stateless and short-lived, so a breach finds little to take? And can an outsider verify your privacy claims, or are they just words in a policy?

Confidential computing on Nvidia GPUs is available to enterprises, not only to Apple, which means the pattern is buildable at a smaller scale. The hard part, as Apple's own work shows, is not turning on a confidential VM; it is designing the whole stack (attestation, statelessness, and transparency) so the guarantee holds when something goes wrong. For a longer view on building AI systems that survive production, our generative AI enterprise strategy guide covers where these decisions fit a roadmap.

India-specific considerations

For Indian enterprises, this architecture lands on a live compliance question. Under the Digital Personal Data Protection Act, 2023, the Data Fiduciary stays responsible for personal data even when a processor or cloud provider handles it, with penalties up to ₹250 crore per breach, per EY India. Running AI inference on third-party cloud GPUs is exactly the scenario the law scrutinises, because the personal data in a prompt is being processed outside your own walls.

The PCC pattern is a useful template here. Confidential computing keeps prompt data protected in use, statelessness means the provider retains nothing after the request, and attestation gives you a technical, checkable basis for the privacy claim you make to regulators and customers, rather than a contractual one. For an Indian business choosing where to run sensitive inference, the practical reading is to prefer confidential-computing-backed services, to design for no data retention, and to keep verifiable control of the software, so that "the cloud processes it" does not become "we lost control of it."

How eCorpIT can help

eCorpIT is a senior-led, CMMI Level 5 technology organisation in Gurugram that designs and operates cloud and AI infrastructure for global and Indian businesses. We help architecture teams evaluate private-cloud AI inference: choosing confidential-computing-backed services, designing stateless inference paths, and keeping data handling aligned with DPDP requirements when models run on third-party GPUs. We work across AWS, Microsoft, Google, and Nvidia-accelerated platforms. To review your private-AI architecture, contact our team.

References

  1. Apple Security Research, "Expanding Private Cloud Compute," June 8, 2026.
  2. Apple Machine Learning Research, "Introducing the Third Generation of Apple's Foundation Models," June 2026.
  3. Nvidia, "Nvidia Confidential Computing to Help Expand Apple's Private Cloud Compute," June 9, 2026.
  4. Data Center Dynamics, "Apple's Private Cloud Compute to run on Google Cloud," June 2026.
  5. MLQ, "Apple Extends Private Cloud Compute to Google Cloud on Nvidia Blackwell GPUs," June 2026.
  6. 9to5Mac, "Apple's third-generation Foundation Models explained," June 11, 2026.
  7. AppleInsider, "Apple's new foundation models don't contain a drop of Gemini," June 8, 2026.
  8. Wccftech, "Apple removes the fog around its new cloud-based and 20-billion-parameter on-device AI models," June 2026.
  9. CryptoBriefing, "Nvidia expands Confidential Computing for Apple's Private Cloud Compute on Google Cloud at WWDC26," June 2026.
  10. Business Standard, "Apple AI now runs on Google, Nvidia tech: What happens to the privacy promise," June 12, 2026.
  11. MacStories, "The Third Generation of Apple's Foundation Models and AFM Core Advanced," June 2026.
  12. EY India, "Decoding the Digital Personal Data Protection Act, 2023," 2026.
  13. CNBC, "Apple picks Google's Gemini to run AI-powered Siri," January 12, 2026.

_Last updated: June 24, 2026._

Frequently asked

Quick answers.

01 What is AFM 3 Cloud Pro?
AFM 3 Cloud Pro is the most capable server-based model in the third generation of Apple Foundation Models, at roughly 1.2 trillion parameters. It powers the most demanding Apple Intelligence tasks, such as agentic tool use and complex reasoning, and runs on Nvidia Blackwell GPUs inside Google Cloud, wrapped in Apple's Private Cloud Compute infrastructure.
02 Why did Apple move Private Cloud Compute to Google Cloud?
Apple needed more GPU capacity than its own data centers could supply for its most demanding AI tasks. Rather than lower its privacy bar to rent cloud hardware, it extended Private Cloud Compute to Google Cloud on Nvidia GPUs, carrying the same guarantees. It is the first time Apple's PCC privacy commitments reach third-party data centers.
03 How does confidential computing protect data in AFM 3 Cloud Pro?
Nvidia Confidential Computing isolates the inference workload in a trusted execution environment, protecting data while it is processed, not only at rest or in transit. Remote attestation lets the system cryptographically verify the hardware has not been tampered with before any sensitive data is sent. Apple adds Intel TDX CPUs and Google's Titan chip.
04 What are Private Cloud Compute's core requirements?
Apple lists five, unchanged in the move to Google Cloud: stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency. Together they ensure user data does not persist, promises are enforced in hardware rather than policy, no operator can reach live data, no specific user can be targeted, and outsiders can verify the claims.
05 Does Apple's AI now run on Google's models?
Apple built its foundation models using the technology behind Google's Gemini under a reported billion-dollar-a-year deal, but says the shipping system is its own. Craig Federighi stated, "The amount of the Google Assistant we use is none." Apple keeps complete control of the Private Cloud Compute software, which Apple devices must cryptographically approve before trusting.
06 Can other companies use this architecture?
Yes, at a smaller scale. Nvidia Confidential Computing is available to enterprises, so the core pattern, confidential inference with attestation, statelessness, and verifiable transparency, is buildable beyond Apple. The hard part is not enabling a confidential VM; it is designing the whole stack so the privacy guarantee still holds when a component is attacked or compromised.
07 How does this relate to India's DPDP Act?
Under the DPDP Act, 2023, the Data Fiduciary remains responsible for personal data even when a cloud provider processes it, with penalties up to ₹250 crore per breach. Running inference on third-party GPUs is exactly this scenario, so confidential computing, no data retention, and attestation give a technical basis for the privacy claims the law expects.
08 What is the difference between on-device and cloud Apple Foundation Models?
The third generation spans tiers. An on-device model of roughly 20 billion parameters handles fast, private, low-latency tasks directly on the iPhone, with no data leaving the device. AFM 3 Cloud Pro, at around 1.2 trillion parameters, handles the hardest agentic and reasoning tasks in the cloud, where Private Cloud Compute protects the data being processed.

About the author

Manu Shukla

Founder & Director

Founder of eCorpIT. Hands-on engineer leading senior-only delivery for AI apps, custom software, and cloud systems for global clients.
