AFM Cloud Pro on Nvidia GPUs: what it means for enterprise in 2026

Apple's AFM 3 Cloud Pro runs on Nvidia GPUs in Google Cloud under Private Cloud Compute. What the architecture means for enterprise AI in 2026.

By Manu Shukla

Founder & Director · June 22, 2026 · Updated June 23, 2026

Frontier inference on confidential GPUs.

Summary. AFM 3 Cloud Pro, the most capable model in Apple's third-generation Foundation Models, runs on Nvidia Blackwell GPUs inside Google Cloud rather than on Apple silicon. Announced at WWDC 2026 on June 8, it handles agentic tool use and complex reasoning at a quality Apple compares to Gemini frontier models, and it is ramping to full capacity by the end of summer 2026. Apple held its privacy bar by extending Private Cloud Compute (PCC) to Google Cloud with Nvidia Confidential Computing, Intel CPUs with TDX, and Google's Titan chip, configured so the GPUs cannot read the contents of Apple's servers. For enterprises evaluating frontier inference on rented confidential GPUs, which run from $2.12 to $14.24 per GPU-hour as of June 2026, this is the clearest blueprint yet, and also a warning: PCC optimises for individual non-targetability, not the observability and data residency an enterprise needs.

The context is the confidential computing market. Gartner places confidential computing among its core 2026 architecture trends and expects it to secure more than 75% of processing in untrusted infrastructure by 2029, and the market is projected past $42.74 billion in 2026. Apple just gave that market its highest-profile production deployment. This article reads AFM Cloud Pro as an enterprise architect would: what it is, how the trust model works, and the specific places where Apple's individual-privacy design does not map onto enterprise requirements. It is the enterprise companion to our architecture breakdown of Apple Intelligence on Nvidia GPUs.

What AFM Cloud Pro is

AFM 3 Cloud Pro is the top tier of Apple's model stack. The on-device models handle routine work, AFM 3 Cloud carries the main server workload, and AFM 3 Cloud Pro is reserved for the hardest tasks: agentic tool use and complex reasoning. Apple runs it on Nvidia GPUs hosted in Google Cloud and describes its quality as being in the range of Gemini frontier models, while maintaining that the model is its own, refined using Gemini outputs rather than served by Gemini, per CIO Dive and CNBC. Apple AI vice president Amar Subramanya said the models are "custom builds for Apple Silicon, trained using proprietary data, and refined using outputs from Gemini frontier models."

The notable part for an enterprise is the substrate. A company famous for owning its whole stack chose to run its most demanding model on another company's GPUs in a third company's data center. It did so because the model was too demanding for its own hardware at the needed speed, and Nvidia capacity on Google Cloud was the path that shipped. That decision is the one most enterprises also face: own the metal, or rent confidential capacity that can prove it is trustworthy.

How the trust model works

Apple did not extend trust to Google or Nvidia as companies. It extended trust to attested hardware states. PCC on Google Cloud combines Nvidia Confidential Computing on the GPUs, Intel CPUs with TDX, and Google's Titan security chip, as Apple's Expanding Private Cloud Compute post sets out. The enabling detail, reported around the launch, is that Apple required Nvidia's chips to be configured so they could not read the contents of Apple's servers, which a recent Nvidia confidential-compute capability made possible.

Layer	Component	Enterprise question it raises
GPU compute	Nvidia Blackwell with Confidential Computing	Can you attest the GPU before sending data?
Cloud platform	Google Cloud	Who operates the host, and under what terms?
CPU	Intel CPUs with TDX	Is the VM isolated from the host operator?
Root of trust	Google Titan security chip	How many independent roots back attestation?
Verification	Append-only transparency log	Can an outside party check what ran?
Routing	On-device orchestrator	When does a request leave the device?

The pattern is reusable. Confidential GPUs, TDX-class CPU isolation, and hardware roots of trust are available to any enterprise on current cloud hardware. As Apple's senior vice president of software engineering, Craig Federighi, said, the orchestrator that decides what leaves the device is "key to the privacy architecture of our entire system." The lesson is that attestation, not a contract, is what should gate sensitive data leaving your boundary.
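To make "attestation gates the data" concrete, here is a minimal sketch of a release gate. It assumes a simplified, hypothetical `AttestationReport` whose signature has already been verified elsewhere; real Nvidia and Intel TDX evidence is signed and vendor-specific, and the measurement values, field names, and function names below are placeholders, not any vendor's API.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationReport:
    """Hypothetical, simplified report. Signature checks are assumed done upstream."""
    measurement: str         # hex digest of the software/firmware the node booted
    confidential_mode: bool  # whether the accelerator reports confidential mode on


# Hypothetical allow-list of measurements your security team has reviewed.
APPROVED_MEASUREMENTS = {"a3f1c2", "9b7e44"}


def may_release_data(report: AttestationReport) -> bool:
    """Return True only if the node proves an approved state."""
    if not report.confidential_mode:
        return False
    # Constant-time comparison against each approved measurement.
    return any(
        hmac.compare_digest(report.measurement, approved)
        for approved in APPROVED_MEASUREMENTS
    )


def send_prompt(report: AttestationReport, prompt: str) -> str:
    """Release the prompt only after the gate passes; otherwise refuse."""
    if not may_release_data(report):
        raise PermissionError("attestation failed; data stays inside the boundary")
    return f"released {len(prompt)} characters to attested node"
```

The design point is that the check runs on your side of the boundary, before any payload is sent, so a contract breach and a failed attestation are handled by different mechanisms.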

The enterprise gap: individual privacy is not enterprise observability

Here is where an enterprise architect should slow down. PCC is the most advanced privacy-preserving cloud AI architecture deployed at scale, but it was built for individual user privacy, not for enterprise observability, and the two pull in opposite directions. Apple deliberately does not disclose the physical location of PCC nodes, because non-targetability, the property that stops an attacker steering a chosen user to a compromised machine, depends on that opacity. The New Stack and enterprise-focused coverage flag the consequence plainly: Apple will not confirm what country a given request touches, which is a direct problem for any business with data-residency obligations, as SimpleMDM's enterprise analysis notes.

Routing is the second gap. Simpler requests stay on-device and complex agentic tasks route to AFM 3 Cloud Pro, but Apple has not publicly specified exactly when a request offloads or whether that decision is visible to the developer. For a consumer that is fine. For an enterprise that must document where inference ran, an opaque routing boundary is a compliance gap, not a convenience.

Requirement	Apple PCC provides	An enterprise also needs
Data confidentiality in use	Yes, via confidential computing	Yes
Non-targetability	Yes, by hiding node location	Often conflicts with residency
Data residency proof	No, location is undisclosed	Yes, for regulated data
Routing visibility	Not specified	Yes, to document inference location
Per-tenant audit trail	Not designed for it	Yes, for enterprise compliance

The takeaway is not that PCC is weak. It is that PCC solves a different problem than enterprise AI governance. Borrow its trust model, and add the observability and residency controls it intentionally leaves out.

PCC is not the same as confidential computing

A common confusion is worth clearing up, because it changes how you plan. Apple's Private Cloud Compute and generic confidential computing address different threat models. Confidential computing protects data in use from the infrastructure operator, using a trusted execution environment and attestation. PCC builds on that but adds Apple-specific properties: statelessness, no privileged runtime access, non-targetability, and verifiable transparency across the whole stack, as The New Stack's comparison explains. For an enterprise, that means adopting confidential computing gives you the in-use protection, but the verifiability and statelessness are extra engineering you have to add yourself.

Build versus buy for your stack

The decision AFM Cloud Pro crystallises is build versus rent for confidential frontier inference. Three options exist. You can run models on your own confidential hardware, which gives maximum control and residency proof at the highest cost and effort. You can rent confidential GPU capacity on a public cloud, the path Apple took, which gets you frontier compute with attestation while leaving residency and observability to you. Or you can call a hosted model API, which is cheapest to start and weakest on data control. Most regulated enterprises in 2026 land on the middle option, then add the controls Apple omits.

If you rent confidential capacity, copy four things from Apple's design and add two it leaves out. Copy attestation before data leaves your boundary, stateless inference with no retention, an append-only record of what ran, and a swappable model layer so you are not locked to one provider. Add explicit data-residency control, so you can prove what region processed a request, and per-request routing visibility, so you can document where inference ran. Those two additions are exactly the enterprise gaps in Apple's consumer design.

A reference architecture for confidential inference

Strip Apple's deployment to its parts and a reference architecture appears that any enterprise can adapt. At the edge sits a router, the equivalent of Apple's orchestrator, that classifies each request by sensitivity and decides whether it can be answered locally or in your own data center, or whether it may go to rented confidential capacity. Behind the router, a policy layer records the decision and the data classification, so every routed request has a logged rationale. The inference tier itself runs in a trusted execution environment, with the workload attested before any data is released to it. A key-management service holds attested keys in a separate, isolated environment, so the inference nodes never hold long-lived secrets. Finally, an audit layer writes an append-only record of what ran, against which software measurement, and for which tenant.
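The router and policy layer can be sketched in a few lines. This is an illustration under stated assumptions, not Apple's orchestrator: the sensitivity classes, the tier names, and the rule that regulated data never leaves the boundary are all hypothetical policy choices you would replace with your own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    REGULATED = "regulated"


class Tier(Enum):
    LOCAL = "local"                            # on-device or your own data center
    CONFIDENTIAL_CLOUD = "confidential-cloud"  # rented, attested capacity


@dataclass(frozen=True)
class Decision:
    request_id: str
    sensitivity: Sensitivity
    tier: Tier
    rationale: str


# Stand-in for the policy layer's store; a real one would be durable.
policy_log: List[Decision] = []


def route(request_id: str, sensitivity: Sensitivity, needs_frontier: bool) -> Decision:
    """Pick a tier and record why, so every routed request has a logged rationale."""
    if sensitivity is Sensitivity.REGULATED:
        decision = Decision(request_id, sensitivity, Tier.LOCAL,
                            "regulated data stays in-boundary")
    elif needs_frontier:
        decision = Decision(request_id, sensitivity, Tier.CONFIDENTIAL_CLOUD,
                            "frontier model needed and data class may leave")
    else:
        decision = Decision(request_id, sensitivity, Tier.LOCAL,
                            "local tier is sufficient")
    policy_log.append(decision)
    return decision
```

Because the decision and its rationale are written in the same call that makes the choice, there is no path where a request is routed without a record.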

The difference from Apple's consumer design is in the policy and audit layers. Apple can keep those thin because its tenant is a single individual and its promise is non-targetability. An enterprise has many tenants, many data classes, and regulators who want evidence, so the policy and audit layers are where most of your engineering goes. Treat them as first-class services, not log files, because they are what turns a confidential deployment into a defensible one.
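One way to treat the audit layer as a service rather than a log file is a hash-chained record, where each entry commits to the one before it. The sketch below is a generic tamper-evident log, not Apple's transparency log; the class name and the fields (`tenant`, `measurement`, `request_id`) are assumptions chosen to match the three things the audit layer is meant to capture.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """In-memory, hash-chained log: editing any entry breaks verification."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, tenant: str, measurement: str, request_id: str) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"tenant": tenant, "measurement": measurement,
                "request_id": request_id, "prev": prev}
        entry_hash = _digest(body)
        self._entries.append({**body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A production version would persist entries and publish the latest hash somewhere an outside party can read it; the chain alone only proves internal consistency.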

One more component deserves emphasis: the model layer should be swappable. Apple routes on-device, Private Cloud Compute, and third-party clouds through a single call site, and the same discipline protects an enterprise from lock-in and from a single provider's outage or price change. Put the model behind an interface, so swapping a provider is a configuration change rather than a re-architecture. In a market where capability, price, and availability shift quarterly, the ability to move a workload to a different confidential provider without rewriting your governance is itself a control.
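A minimal sketch of "the model behind an interface" follows. The provider classes and the `provider` config key are hypothetical stand-ins; the point is only that callers depend on the interface, so switching providers touches configuration and nothing else.

```python
from typing import Dict, Protocol


class InferenceProvider(Protocol):
    """Anything that can answer a prompt; callers depend only on this shape."""

    def complete(self, prompt: str) -> str: ...


class LocalProvider:
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class ConfidentialCloudProvider:
    def complete(self, prompt: str) -> str:
        return f"[confidential-cloud] {prompt}"


# Registry keyed by a config value; adding a provider means adding one entry.
PROVIDERS: Dict[str, InferenceProvider] = {
    "local": LocalProvider(),
    "confidential-cloud": ConfidentialCloudProvider(),
}


def complete(prompt: str, config: Dict[str, str]) -> str:
    """Single call site: the provider is chosen by configuration, not by code."""
    provider = PROVIDERS[config["provider"]]
    return provider.complete(prompt)
```

Governance code such as routing, attestation, and audit sits in front of `complete`, which is why it survives a provider swap unchanged.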

Questions to ask a confidential cloud vendor

Before you rent confidential capacity, the vendor conversation should be specific. Ask how attestation works and whether you, not just the cloud operator, can verify the hardware and firmware state before a workload runs. Ask which roots of trust back that attestation and whether they are independent, because a single vendor root is weaker than two. Ask whether you can pin a workload to a named region and prove afterwards which region processed a request, since that is the residency control Apple's design omits. Ask what the operator can and cannot see, in writing, and how that is enforced in hardware rather than policy. And ask how the confidential mode affects throughput, because on current hardware the overhead is small, so a large quoted penalty is a sign the configuration is wrong.

The reason to ask in this order is that the answers compound. Strong attestation with weak residency proof still fails a DPDP audit. Good residency with no audit trail still fails an incident review. You are buying a chain, and the chain is only as strong as its weakest documented link.
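The weakest-link idea can be written down as a scoring rule: a vendor's overall rating is the minimum across the questions, not the average. The 0 to 3 scale and the question keys below are hypothetical; use whatever rubric your security team already applies.

```python
from typing import Dict, Tuple


def weakest_link(scores: Dict[str, int]) -> Tuple[str, int]:
    """Return the lowest-scoring question and its score (hypothetical 0-3 scale)."""
    if not scores:
        raise ValueError("no answers recorded")
    name = min(scores, key=lambda key: scores[key])
    return name, scores[name]


# Illustrative answers for one vendor; the values are made up for the example.
vendor_scores = {
    "customer_verifiable_attestation": 3,
    "independent_roots_of_trust": 2,
    "region_pinning_and_proof": 1,
    "operator_visibility_in_hardware": 3,
    "confidential_mode_throughput": 3,
}
```

Calling `weakest_link(vendor_scores)` on these made-up answers points at region pinning, which is the link you would fix or document before signing.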

Migrating a workload to confidential inference

A sensible migration runs in phases rather than a single cutover. Start by classifying the data a workload touches, because the classification, not the model, decides where inference may run. Move the least sensitive, highest-volume work first, often to an on-device or in-region tier, which cuts cost and proves the routing layer before any regulated data is involved. Next, stand up the confidential cloud tier for the workloads that genuinely need a frontier model, with attestation and audit wired in from the first request rather than added later. Only then route regulated data through it, once residency pinning and the audit trail are demonstrably working.
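The ordering above can be enforced in code rather than in a project plan. The sketch below is one way to do it; the phase names and the two boolean checks are hypothetical labels for the steps described in the paragraph.

```python
from typing import Optional, Set

# Phases in the order described above; names are illustrative.
PHASES = [
    "classify_data",
    "move_low_sensitivity_work",
    "stand_up_confidential_tier",
    "route_regulated_data",
]


def next_phase(completed: Set[str]) -> Optional[str]:
    """Return the first phase not yet completed, or None when all are done."""
    for phase in PHASES:
        if phase not in completed:
            return phase
    return None


def may_route_regulated(completed: Set[str],
                        residency_pinning_ok: bool,
                        audit_trail_ok: bool) -> bool:
    """Regulated data moves only after the first three phases and both controls pass."""
    prerequisites = set(PHASES[:3])
    return prerequisites <= completed and residency_pinning_ok and audit_trail_ok
```

Wiring `may_route_regulated` into the router as a hard check prevents the reverse-order rollout that the paragraph warns against.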

The failure mode to avoid is the reverse order: shipping the model first and bolting governance on afterward. That is how organisations end up with a capable feature they cannot actually deploy in a regulated market, because they cannot prove where inference ran. Apple staged its own rollout across a summer preview for related reasons, ramping protections gradually rather than enabling everything at once. An enterprise migration deserves the same patience.

The cost of getting it wrong

The downside is not abstract. Under DPDP, a breach traced to inadequate safeguards can draw a penalty of up to ₹250 crore, and a confidential deployment you cannot audit offers little defence when a regulator asks where data went. The confidential computing market is growing past $42.74 billion in 2026 precisely because enterprises have concluded that in-use protection plus verifiable evidence is cheaper than the alternative. The architecture is available; the gap is usually in the policy, residency, and audit layers that a consumer design like Apple's never needed.

India-specific considerations

For Indian enterprises, the residency gap is the headline. Under the Digital Personal Data Protection Act 2023 (DPDP), which carries penalties up to ₹250 crore for inadequate safeguards, you may need to show where personal data was processed. Apple's non-targetability model, which hides node location by design, is the opposite of what a DPDP audit wants, so an enterprise borrowing the pattern must add the region pinning Apple omits. The practical route is to keep sensitive Indian data on-device or in an approved in-country region, and reserve rented confidential cloud capacity for data that is allowed to leave.
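Region pinning is easiest to audit when every request produces a small evidence record. The sketch below assumes hypothetical data-class labels and region names; it does not reflect any provider's region identifiers or any specific DPDP rule, only the pattern of checking the processing region against an allow-list and keeping the result.

```python
from dataclasses import dataclass

# Hypothetical data classes and region names; substitute your own identifiers.
ALLOWED_REGIONS = {
    "personal-data-india": {"in-country-region"},
    "non-personal": {"in-country-region", "overseas-region"},
}


@dataclass(frozen=True)
class ResidencyEvidence:
    request_id: str
    data_class: str
    region: str
    compliant: bool


def check_residency(request_id: str, data_class: str, region: str) -> ResidencyEvidence:
    """Record where a request was processed and whether that region was allowed."""
    allowed = ALLOWED_REGIONS.get(data_class, set())  # unknown classes are denied
    return ResidencyEvidence(request_id, data_class, region, region in allowed)
```

Denying unknown data classes by default keeps a misclassified request from leaving the country silently.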

The cost math also lands in rupees. Confidential GPU time at a few dollars per GPU-hour adds up quickly at enterprise volume, so the on-device-first routing that Apple uses to cut traffic is a budget control as much as a privacy one. We design and build confidential and hybrid AI architectures aligned with DPDP requirements across AWS, Microsoft, and Google. For the strategic frame, see our guide to generative AI enterprise strategy.
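The arithmetic is simple enough to sketch. The two rates are the June 2026 figures quoted in this article; the monthly volume, the on-device share, and the exchange rate are inputs you supply, and the model assumes, as a simplification, that work kept on-device reduces cloud GPU-hours proportionally.

```python
SPOT_RATE_USD = 2.12      # per GPU-hour, low end quoted in this article
BUNDLED_RATE_USD = 14.24  # per GPU-hour, high end quoted in this article


def monthly_cost(gpu_hours: float, rate_usd: float,
                 on_device_share: float, usd_to_inr: float) -> dict:
    """Estimate monthly spend after routing a share of work on-device.

    Assumes (as a simplification) that on-device work removes cloud
    GPU-hours in direct proportion to its share.
    """
    if not 0.0 <= on_device_share <= 1.0:
        raise ValueError("on_device_share must be between 0 and 1")
    cloud_hours = gpu_hours * (1.0 - on_device_share)
    cost_usd = cloud_hours * rate_usd
    return {
        "cloud_gpu_hours": cloud_hours,
        "usd": cost_usd,
        "inr": cost_usd * usd_to_inr,  # pass your current exchange rate
    }
```

Running it at both rates for the same volume gives a range rather than a point estimate, which is usually the more honest number to put in a budget.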

What to take from it

AFM Cloud Pro proves that a frontier model can run on rented, attested GPUs without surrendering a strong privacy claim, which is the reusable win. It also shows the limits of a consumer privacy design when you drop it into an enterprise: no residency proof, no routing visibility, no per-tenant audit trail, because those were never the goal. The enterprises that benefit will treat Apple's architecture as a foundation, take its hardware-rooted trust and statelessness, and build the observability and residency controls on top that their regulators, not their users, require.

How eCorpIT can help

eCorpIT (eCorp Information Technologies Private Limited) is a Gurugram-based, CMMI Level 5 technology organisation whose senior engineering teams design confidential and hybrid AI infrastructure. We help CTOs and enterprise architects adopt attested, stateless inference, add the data-residency and routing observability that consumer designs omit, and build applications aligned with DPDP requirements across AWS, Microsoft, and Google. Read more about us, or contact our team to plan your confidential AI architecture.

References

Apple Security Research, Expanding Private Cloud Compute, June 8, 2026.

Nvidia Blog, NVIDIA Confidential Computing to Help Expand Apple's Private Cloud Compute, June 9, 2026.

CNBC, Apple partnering with Google and Nvidia for most advanced AI model, June 8, 2026.

CIO Dive, Apple teams up with Google, Nvidia to expand private cloud capabilities, 2026.

The New Stack, Apple's Private Cloud Compute vs. Confidential Computing, 2026.

SimpleMDM, Apple Intelligence: How secure is Private Cloud Compute for enterprise?, 2026.

Crypto Briefing, Nvidia expands Confidential Computing for Apple's Private Cloud Compute on Google Cloud at WWDC26, 2026.

Apple Machine Learning Research, Introducing the Third Generation of Apple's Foundation Models, 2026.

Nvidia, Blackwell architecture, 2026.

Gartner, Top Strategic Technology Trends for 2026, 2026.

Fortune Business Insights, Confidential Computing Market, Forecast to 2034, 2026.

DPDPA.com, DPDPA Penalties Explained: Rs 50 Crore to Rs 250 Crore Fines, 2026.

Spheron, GPU Cloud Pricing 2026, 2026.

_Last updated: June 22, 2026._

Frequently asked

Quick answers.

01 What is AFM 3 Cloud Pro?

AFM 3 Cloud Pro is the most capable model in Apple's third-generation Foundation Models, reserved for agentic tool use and complex reasoning. It runs on Nvidia Blackwell GPUs hosted in Google Cloud under Private Cloud Compute, at a quality Apple compares to Gemini frontier models, while Apple maintains the model is its own, refined using Gemini outputs.

02 Why does Apple run it on Nvidia GPUs in Google Cloud?

The model was too demanding to run on Apple silicon at the speed Apple wanted, and Nvidia capacity on Google Cloud was the path that shipped. Apple extended Private Cloud Compute to that infrastructure using Nvidia Confidential Computing, Intel TDX, and Google's Titan chip, configured so the GPUs cannot read the contents of Apple's servers.

03 Is AFM Cloud Pro suitable for enterprise use?

Its trust model is strong, but it is built for individual privacy, not enterprise governance. Apple hides node location for non-targetability, does not confirm what country data touches, and has not specified when requests offload. Enterprises with residency or audit requirements must add controls that Apple's consumer design intentionally omits.

04 How is Private Cloud Compute different from confidential computing?

Confidential computing protects data in use from the infrastructure operator with a trusted execution environment and attestation. Private Cloud Compute builds on that and adds statelessness, no privileged runtime access, non-targetability, and verifiable transparency across the whole stack. Adopting confidential computing gives you in-use protection; the rest is extra engineering.

05 What is the data-residency problem?

Apple deliberately does not disclose where Private Cloud Compute nodes are located, because non-targetability depends on that opacity. For a regulated enterprise that must prove which region processed personal data, undisclosed location is a compliance gap. If you borrow the architecture, add explicit region pinning so you can document residency.

06 Should an enterprise build or rent confidential inference?

Most regulated enterprises in 2026 rent confidential GPU capacity on a public cloud, the path Apple took, then add residency and observability controls. Running your own confidential hardware gives maximum control at the highest cost, while a hosted model API is cheapest but weakest on data control. The middle path balances capability and governance.

07 How much does confidential GPU inference cost?

As of June 2026, Nvidia GPU capacity ran from roughly $2.12 per GPU-hour on spot to about $14.24 on a fully bundled instance. Confidential mode adds little overhead on current hardware, so the main cost lever is volume. Keeping routine work on-device, as Apple does, is the most effective way to control the bill.

08 What should I copy from Apple's design?

Copy attestation before data leaves your boundary, stateless inference with no retention, an append-only record of what ran, and a swappable model layer. Then add the two things Apple omits for consumers: explicit data-residency control and per-request routing visibility, so you can prove to a regulator where inference happened.

About the author

Manu Shukla

Founder & Director

Founder of eCorpIT. Hands-on engineer leading senior-only delivery for AI apps, custom software, and cloud systems for global clients.
