iOS 27 Foundation Models: plug in Claude or Gemini with a 1-line swap (2026)

iOS 27 ships a ~3B on-device model; a Swift provider protocol swaps in Claude or Gemini in one line.


Founder & Director June 22, 2026



Summary. Apple's Foundation Models framework in iOS 27 runs a roughly 3-billion-parameter model on-device, and Apple's third-generation server model uses a 20-billion-parameter sparse architecture that activates 1 to 4 billion parameters per request, per Apple Machine Learning Research. The framework is in the Xcode 26.3 beta now, with iOS 27's public release expected in September 2026 and betas running through summer 2026. Apple does not ship a native Claude or Gemini provider. The "1-line swap" is an app-side Swift pattern: you put one protocol in front of the on-device model and your cloud calls, so changing providers means changing one initializer line. This guide shows the pattern, the trade-offs in cost and privacy, and what changed at WWDC 2026, which ran June 8–12, 2026.

Two numbers frame the decision: the on-device model is about 3 billion parameters, and it costs $0 per token. A frontier cloud model from Anthropic or Google can be far larger, but it bills per token and sends prompt data off-device. Apple introduced the framework at WWDC 2025 and expanded it at WWDC 2026 with an open-source utilities package, per Apple's WWDC26 session 241. Your job as an iOS developer is to use the free, private model by default and reach for a cloud model only when the task needs it, without rewriting call sites each time.

What Apple actually ships in iOS 27

The Foundation Models framework is a high-level Swift API for prompting language models, generating structured output, and building agentic flows, per Lushbinary's Swift guide. You obtain the system model through SystemLanguageModel, open a LanguageModelSession, and call respond or streamResponse, as documented at Create with Swift.

Two facts matter before you design anything. First, the model runs on-device, so prompts stay on the iPhone or iPad and there is no per-token charge. Second, the model is small by frontier standards. Apple's research puts the on-device model at approximately 3 billion parameters, while the third-generation server model is a 20-billion-parameter sparse design that activates 1 to 4 billion parameters at a time, per Apple Machine Learning Research. The on-device model is the one the framework hands to your app directly.

Apple ships a separate, lower-level path too. Core AI, introduced at WWDC 2026, runs arbitrary on-device models such as Qwen, per Lushbinary. That is a different tool from the high-level Foundation Models API and a different discussion. This guide stays on the Foundation Models framework, because that is where most app teams start.

What Apple does not ship

Apple does not ship a built-in Claude or Gemini backend. The framework runs the system model and custom LoRA adapters you train yourself, per the Apple Developer documentation on custom adapters. If you want a response from Anthropic's Claude or Google's Gemini, your app makes a network call to that vendor's API. Nothing in Apple's framework does that for you. So the "1-line swap" is not an Apple feature. It is a small piece of architecture you write, and the rest of this guide is about writing it well.

The 1-line swap, explained

The swap is a protocol. You define one Swift protocol that every model backend conforms to, then write a thin provider for each: the Apple on-device model, Claude, and Gemini. Because all three share the same method signatures, switching providers is one initializer line. Every call site that prompts a model stays untouched.

Here is the protocol. Keep it small.


protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

The Apple on-device provider wraps a LanguageModelSession:


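import FoundationModels

struct AppleOnDeviceProvider: TextModelProvider {
    // A session keeps a running transcript across calls. Create one session
    // per conversation or task if prompts should not share context.
    private let session = LanguageModelSession()

    func reply(to prompt: String) async throws -> String {
        let response = try await session.respond(to: prompt)
        return response.content
    }
}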

A Claude provider hits Anthropic's HTTP API:


struct ClaudeProvider: TextModelProvider {
    let apiKey: String

    func reply(to prompt: String) async throws -> String {
        // AnthropicClient is a placeholder for a client type you write yourself.
        // It should POST to Anthropic's Messages API, decode the text, and return it.
        // In production, send the call from your backend so the key never ships in the app.
        return try await AnthropicClient(apiKey: apiKey).complete(prompt)
    }
}

A Gemini provider does the same against Google's API:


struct GeminiProvider: TextModelProvider {
    let apiKey: String

    func reply(to prompt: String) async throws -> String {
        // GeminiClient is likewise a placeholder for your own client type.
        return try await GeminiClient(apiKey: apiKey).complete(prompt)
    }
}
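
The two cloud providers above lean on client types the article leaves undefined. One way to fill that gap while keeping keys off the device is a small client that posts to a proxy you run yourself. The sketch below is only an illustration: the `ProxyClient`, `ProxyRequest`, and `ProxyResponse` names, the endpoint, and the JSON shape are all assumptions, not a vendor API. Request building is split out from sending so it can be checked without a network.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

// Hypothetical wire format for a proxy you control; not a vendor API.
struct ProxyRequest: Codable, Equatable {
    let provider: String   // e.g. "claude" or "gemini", interpreted by your backend
    let prompt: String
}

struct ProxyResponse: Codable {
    let text: String       // the backend normalises each vendor's reply to plain text
}

enum ProxyError: Error {
    case badStatus(Int)
}

struct ProxyClient {
    let endpoint: URL      // assumption: the URL of your own backend
    let provider: String

    // Builds the request without sending it, so it can be unit-tested offline.
    func makeRequest(prompt: String) throws -> URLRequest {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(
            ProxyRequest(provider: provider, prompt: prompt))
        return request
    }

    // Sends the prompt to the proxy and returns the normalised text.
    func complete(_ prompt: String) async throws -> String {
        let request = try makeRequest(prompt: prompt)
        let data: Data = try await withCheckedThrowingContinuation { continuation in
            URLSession.shared.dataTask(with: request) { data, response, error in
                if let error = error {
                    continuation.resume(throwing: error)
                    return
                }
                let status = (response as? HTTPURLResponse)?.statusCode ?? -1
                guard (200..<300).contains(status), let data = data else {
                    continuation.resume(throwing: ProxyError.badStatus(status))
                    return
                }
                continuation.resume(returning: data)
            }.resume()
        }
        return try JSONDecoder().decode(ProxyResponse.self, from: data).text
    }
}
```

With something like this in place, a cloud provider's `reply(to:)` can call `ProxyClient(endpoint: backendURL, provider: "claude").complete(prompt)` and the app never holds a vendor key. Authentication between the app and the proxy is left out of the sketch.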

Now the swap is one line at the composition root:


let provider: TextModelProvider = AppleOnDeviceProvider()
// let provider: TextModelProvider = ClaudeProvider(apiKey: key)
// let provider: TextModelProvider = GeminiProvider(apiKey: key)

Every feature in the app calls provider.reply(to:). The summariser does not know or care which model answered. That is the whole trick: the protocol absorbs the difference, so the cost of changing your mind is one line, not a refactor.

The real cost is rarely the swap line. It is the per-provider quirks underneath: streaming, tool-calling shapes, token limits, error handling, and where the API key lives. Put those inside each provider. Keep the protocol blunt.

Cost and capability: device vs cloud

The on-device model is free and private but small. Cloud models cost money and send data off-device but reach further on hard tasks. Pick per feature, not per app.

Factor	Apple on-device model	Claude (cloud)	Gemini (cloud)
Where it runs	On the iPhone/iPad	Anthropic servers	Google servers
Model size	~3 billion parameters	Frontier-scale (vendor-set)	Frontier-scale (vendor-set)
Per-token cost	None	Per-token, vendor pricing	Per-token, vendor pricing
Prompt data leaves device	No	Yes	Yes
Network required	No	Yes	Yes
Best fit	Short, private, offline tasks	Long-context reasoning	Long-context, multimodal

Pricing for Claude and Gemini changes often and is set by each vendor, so confirm current rates on Anthropic's and Google's official pricing pages before you commit a budget. The on-device figure is the durable one: zero marginal cost per call. For Indian teams, that zero matters. A feature that fires thousands of times a day at ₹0 on-device versus a metered cloud call is a real line item, and the on-device default keeps your unit economics predictable.
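
To make that line item concrete, a rough estimate is calls per day times tokens per call times the token rate. The sketch below assumes a single blended price per million tokens, which is a simplification because vendors usually price input and output tokens separately. Every number in it is a placeholder, not a real vendor rate, and `estimatedDailyCost` is a hypothetical helper.

```swift
// Rough daily cost for one feature. All inputs are placeholders, not vendor prices.
func estimatedDailyCost(callsPerDay: Int,
                        tokensPerCall: Int,
                        pricePerMillionTokens: Double) -> Double {
    let tokensPerDay = Double(callsPerDay * tokensPerCall)
    return tokensPerDay / 1_000_000 * pricePerMillionTokens
}

// On-device: no per-token charge, so the estimate is zero at any volume.
let onDeviceCost = estimatedDailyCost(callsPerDay: 5_000,
                                      tokensPerCall: 800,
                                      pricePerMillionTokens: 0)

// Cloud: 1.0 is a made-up rate in your billing currency.
// Substitute the vendor's current published price before budgeting.
let cloudCost = estimatedDailyCost(callsPerDay: 5_000,
                                   tokensPerCall: 800,
                                   pricePerMillionTokens: 1.0)
```

The useful part is the shape of the calculation: cloud cost scales linearly with call volume, while the on-device figure stays at zero.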

A routing layer worth building

A single default provider is fine to start. Most apps then add a router: try the on-device model first, escalate to a cloud provider when the task is too big or the device model's answer fails a check.


struct RoutingProvider: TextModelProvider {
    let onDevice: TextModelProvider
    let cloud: TextModelProvider
    let shouldEscalate: (String) -> Bool

    func reply(to prompt: String) async throws -> String {
        if shouldEscalate(prompt) {
            return try await cloud.reply(to: prompt)
        }
        return try await onDevice.reply(to: prompt)
    }
}

RoutingProvider conforms to the same protocol, so it slots into the same one-line composition root. Escalation rules stay simple at first: prompt length over a threshold, a task type that needs long context, or a low-confidence on-device result. Keep the rule readable. A router nobody understands is a router nobody can debug at 2am.
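
A readable escalation rule might look like the sketch below. The `EscalationRule` type, the 2,000-character threshold, and the marker words are all illustrative assumptions; the right values come from measuring your own prompts.

```swift
import Foundation

// Hypothetical rule: escalate when the prompt is long or hints at a long-context task.
struct EscalationRule {
    var maxOnDeviceCharacters = 2_000                     // assumed threshold; tune from measurements
    var longContextMarkers = ["contract", "transcript"]   // illustrative task hints

    func shouldEscalate(_ prompt: String) -> Bool {
        // Rule 1: long prompts go to the cloud.
        if prompt.count > maxOnDeviceCharacters { return true }
        // Rule 2: prompts that mention a long-context task type go to the cloud.
        let lowered = prompt.lowercased()
        return longContextMarkers.contains { lowered.contains($0) }
    }
}
```

Because `shouldEscalate` has the `(String) -> Bool` shape, it can be passed straight to the router as `shouldEscalate: EscalationRule().shouldEscalate`.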

There is a cheaper pattern than deciding from the prompt alone: try on-device, validate the answer, and fall back to the cloud only on failure. The on-device model is about 3 billion parameters, so it handles short summaries, classification, and structured extraction well, and you avoid a paid cloud call most of the time. Reserve the cloud path for the cases the device model genuinely cannot serve. Measure the escalation rate in production. If it climbs above a few percent for a feature, that feature probably wants a cloud default, and you change one line to give it one.
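
The try, validate, fall back flow can be sketched as another provider. This is a minimal illustration, not a finished design: `FallbackProvider`, `EscalationStats`, and the `isAcceptable` validator are hypothetical names, and the protocol is repeated only so the block compiles on its own.

```swift
protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

// Counts calls so the escalation rate can be reported.
// An actor keeps the counters safe under concurrent requests.
actor EscalationStats {
    private(set) var total = 0
    private(set) var escalated = 0

    func record(escalated didEscalate: Bool) {
        total += 1
        if didEscalate { escalated += 1 }
    }

    var rate: Double { total == 0 ? 0 : Double(escalated) / Double(total) }
}

struct FallbackProvider: TextModelProvider {
    let onDevice: TextModelProvider
    let cloud: TextModelProvider
    let isAcceptable: (String) -> Bool   // hypothetical validator supplied by the feature
    let stats: EscalationStats

    func reply(to prompt: String) async throws -> String {
        // try? turns an on-device error into nil, so errors also trigger the fallback.
        if let draft = try? await onDevice.reply(to: prompt), isAcceptable(draft) {
            await stats.record(escalated: false)
            return draft
        }
        await stats.record(escalated: true)
        return try await cloud.reply(to: prompt)
    }
}
```

The validator can be as simple as a non-empty check or a JSON parse for structured output. Reading `await stats.rate` per feature gives the escalation rate the paragraph above suggests watching.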

Streaming through the same protocol

Apple's framework exposes streamResponse alongside respond, per Create with Swift. If your UI streams tokens, add a streaming method to the protocol so every provider can drive the same UI:


protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
    func stream(_ prompt: String) -> AsyncThrowingStream<String, Error>
}

Claude and Gemini both stream over HTTP, and the on-device model streams locally. The provider hides that difference; your view model consumes one AsyncThrowingStream regardless of who is answering. Keep the stream type identical across providers, or the abstraction leaks and the one-line swap stops being one line.
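
Adding a requirement to the protocol means every conformer must implement it, including small ones such as test stubs. One way to soften that, sketched below as an assumption rather than anything the framework provides, is a default implementation in a protocol extension that emits the full reply as a single chunk. Providers with real token streaming would override it.

```swift
protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
    func stream(_ prompt: String) -> AsyncThrowingStream<String, Error>
}

extension TextModelProvider {
    // Default: wrap reply(to:) and yield the whole answer as one chunk.
    func stream(_ prompt: String) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            let task = Task {
                do {
                    continuation.yield(try await self.reply(to: prompt))
                    continuation.finish()
                } catch {
                    continuation.finish(throwing: error)
                }
            }
            // Cancel the underlying work if the consumer stops listening.
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}
```

A view model can then iterate with `for try await chunk in provider.stream(prompt)` against any provider, whether it streams token by token or falls back to this single-chunk default.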

What changed at WWDC 2026

Apple expanded the framework rather than replacing it. At WWDC 2026, which ran June 8–12, 2026 with the keynote on June 8, Apple introduced a new open-source utilities package for the Foundation Models framework, plus model updates, system tools, dynamic profiles, evaluations, and tooling, per Apple's WWDC26 session 241. Apple's newsroom framed it as new options to integrate AI into apps, building on the framework introduced the year before, per Apple Newsroom.

The system model itself stays closed. The open-source piece is the utilities package, which helps with prompting and structured output. Your provider layer can reuse those utilities on the Apple path while your cloud providers handle their own request shaping. Tool-calling control also tightened in iOS 27, including a clearer trust boundary around the on-device Tool protocol, per Blake Crosley's analysis.

Custom adapters still fit the pattern

You can specialise the on-device model with a custom LoRA adapter trained on your data using Apple's Python toolkit, per the Apple Developer documentation and a DEV Community walkthrough. Adapter parameters are small: Apple represents adapter values in 16 bits, and a rank-16 adapter for the ~3 billion parameter model typically needs tens of megabytes, per Apple Machine Learning Research.

The adapter is an implementation detail of your Apple provider. Load it inside AppleOnDeviceProvider; the protocol above does not change, and your cloud providers never see it. Train an adapter when the on-device model needs to become a subject-matter expert in your domain, not as a default first step.

Testing without a network

A protocol gives you one more thing for free: a fake provider for tests. Conform a stub to TextModelProvider, return canned text, and your feature tests run with no model, no network, and no API key. This matters because the on-device model needs a real device or supported simulator, and cloud calls cost money and need keys you should not put in CI.


struct StubProvider: TextModelProvider {
    let canned: String
    func reply(to prompt: String) async throws -> String { canned }
}

Inject StubProvider in unit tests, AppleOnDeviceProvider on device, and a cloud provider only in the narrow integration tests that need it. The same protocol that enables the one-line swap in production also keeps your test suite fast and offline. Apple's WWDC 2026 release added evaluation tooling for the framework, per session 241, which pairs well with this: evaluate the on-device model's output quality, and let your stub cover the plumbing.
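
As an illustration of the plumbing a stub can cover, the sketch below tests a hypothetical `Summariser` feature with no model and no network. The feature, its prompt wording, and the check function are assumptions made for the example. In a real test target the check would sit inside an async test method of your test framework.

```swift
import Foundation

protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

struct StubProvider: TextModelProvider {
    let canned: String
    func reply(to prompt: String) async throws -> String { canned }
}

// Hypothetical feature under test: builds a prompt and tidies the answer.
struct Summariser {
    let provider: TextModelProvider

    func summarise(_ text: String) async throws -> String {
        let answer = try await provider.reply(to: "Summarise in one sentence:\n\(text)")
        return answer.trimmingCharacters(in: .whitespacesAndNewlines)
    }
}

// Offline check: the feature should trim whitespace around the model's answer.
func checkSummariserTrimsOutput() async throws -> Bool {
    let summariser = Summariser(provider: StubProvider(canned: "  A short summary.\n"))
    let result = try await summariser.summarise("Some long input text.")
    return result == "A short summary."
}
```

The stub pins the model's output, so the test exercises only the feature's own logic and fails for reasons you control.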

Common mistakes to avoid

Three errors show up repeatedly when teams add this pattern. First, baking a cloud API key into the app binary; ship it to one user and you have shipped it to all of them, so keys belong on a backend you control. Second, letting each provider return a different shape, which forces if-checks at call sites and defeats the abstraction; normalise the return type inside each provider. Third, escalating to a cloud model by default when the on-device model would do, which spends money and sends data off-device for no gain. Default on-device, escalate by exception.

A fourth, quieter mistake: treating the framework as a drop-in replacement for a frontier model. The on-device model is about 3 billion parameters. It is genuinely useful for summaries, extraction, and short reasoning, but it will not match a 20-billion-parameter sparse server model on the hardest prompts, per Apple Machine Learning Research. Size your expectations to the model you are calling.

India-specific considerations

For teams building in India or shipping to Indian users, two things shape the design. First, the on-device default keeps cost in rupees predictable, because on-device inference has no per-token charge. Second, privacy law applies the moment a prompt leaves the device. The Digital Personal Data Protection Act 2023 (DPDP) governs processing of personal data, and a cloud call to Claude or Gemini is processing on a third-party server. Build the provider layer so cloud calls are explicit and easy to disable per feature, and disclose that processing. If a feature handles personal data, keep it on-device unless there is a clear, consented reason to escalate.
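
One way to keep cloud calls explicit and easy to disable per feature is a small policy object that picks the provider. The sketch below is an assumption-laden illustration: the `Feature` cases, `ProviderPolicy`, and the `userConsented` flag are hypothetical names, and how consent is collected and stored is outside its scope.

```swift
protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

// Illustrative feature list; use your app's real features.
enum Feature: Hashable {
    case summary
    case contractReview
}

struct ProviderPolicy {
    let onDevice: TextModelProvider
    let cloud: TextModelProvider
    // Cloud is off for every feature unless it is listed here.
    var cloudEnabledFeatures: Set<Feature> = []

    // Returns the cloud provider only when the feature allows it AND the user consented.
    func provider(for feature: Feature, userConsented: Bool) -> TextModelProvider {
        if userConsented && cloudEnabledFeatures.contains(feature) {
            return cloud
        }
        return onDevice
    }
}
```

Removing a feature from `cloudEnabledFeatures` switches it back to on-device inference without touching any call site, which also gives a single place to audit which features can send data off-device.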

For more on planning enterprise AI features across providers, see our generative AI enterprise strategy guide for 2026. If you also care about how AI features surface in search and answer engines, our AEO vs GEO vs SEO guide covers the discovery side, and our ultimate SEO guide for 2026 covers the search foundations those features still rely on.

One more practical point for Indian and global teams alike: the backend that holds your Claude and Gemini keys also becomes the place you log, rate-limit, and bill cloud usage. Keep that service thin and auditable. When a regulator or an internal review asks what data left the device and where it went, a single proxy with clear logs answers the question in minutes. A device that calls vendors directly cannot.

A short build checklist

Start on-device, abstract early, escalate late.

Step	Action	Why it matters
1	Define one `TextModelProvider` protocol	One swap point for the whole app
2	Ship the Apple on-device provider first	Free, private, no network
3	Add Claude and Gemini providers behind it	Capability when the task needs it
4	Add a routing provider	Escalate only on hard prompts
5	Move API keys off-device	Keys on the device leak
6	Disclose cloud processing	Required under DPDP and store rules

Keys belong on your backend, not in the app binary. A provider that calls Claude or Gemini directly from the device ships your API key to every user, so route cloud calls through a server you control.

How eCorpIT can help

eCorpIT is a CMMI Level 5, MSME-certified, senior-led engineering organisation based in Gurugram. Our iOS teams design Foundation Models features that default to on-device inference, then add a provider protocol so Claude or Gemini slot in with a one-line change when a task needs more capability. We build applications aligned with DPDP Act 2023 requirements and keep cloud API keys off the device. To scope an iOS AI feature or an audit of your current provider design, contact us.

References

Introducing the Third Generation of Apple's Foundation Models — Apple Machine Learning Research

Introducing Apple's On-Device and Server Foundation Models — Apple Machine Learning Research

What's new in the Foundation Models framework — WWDC26 session 241, Apple Developer

Apple aids app development with new intelligence frameworks and advanced tools — Apple Newsroom

Loading and using a custom adapter with Foundation Models — Apple Developer Documentation

Exploring the Foundation Models framework — Create with Swift

Apple Foundation Models Framework: Swift Guide — Lushbinary

Apple's Foundation Models Framework: Run AI On-Device With Just a Few Lines of Swift — DEV Community

Apple Foundation Models in iOS 27: The Complete Builder Guide — ChatForest

Foundation Models in iOS 27: Tool-Calling Control — Blake Crosley

What to Build Before WWDC 2026: A Developer's Guide to iOS 27 — Stork.AI

_Last updated: 22 June 2026._

Frequently asked

Quick answers.

01 Does Apple let me run Claude or Gemini natively in iOS 27?

No. Apple's Foundation Models framework runs the on-device system model and custom LoRA adapters, not Claude or Gemini. To call those, you write your own provider behind a Swift protocol that hits each vendor's API over the network. The one-line swap happens in your app code, not inside Apple's framework.

02 How big is the on-device model in iOS 27?

Apple's on-device model is roughly 3 billion parameters and runs on Apple silicon. The third-generation server model uses a 20-billion-parameter sparse architecture that activates 1 to 4 billion parameters per request. The on-device model is the one the Foundation Models framework exposes directly to your Swift app.

03 What is the 1-line swap actually changing?

It changes which concrete type your code instantiates behind a shared Swift protocol. One initializer line picks the Apple on-device provider, a Claude provider, or a Gemini provider. Every call site that prompts the model stays identical because all providers conform to the same protocol with the same method signatures.

04 When does iOS 27 ship and when can I test this?

The Foundation Models framework is available in the Xcode 26.3 beta now, with developer and public betas running through summer 2026. Apple's public iOS 27 release is expected in September 2026. You can build and test the provider pattern against the on-device model during the beta period.

05 Why route to a cloud model at all if the device model is free?

The on-device model is private and has no per-token cost, but it is roughly 3 billion parameters. For long-context reasoning, large documents, or harder tasks, a cloud model such as Claude or Gemini can be more capable. A provider protocol lets you keep the device model as default and escalate only when needed.

06 Does this approach affect App Store privacy disclosures?

Yes. On-device inference keeps prompts on the device, but any cloud provider sends prompt data to that vendor's servers, which you must disclose. For Indian users, the DPDP Act 2023 applies to that processing. Design your provider layer so cloud calls are explicit, logged, and easy to disable per feature.

07 Can I still use custom adapters with this pattern?

Yes. Apple's framework supports custom LoRA adapters trained with its Python toolkit to specialize the on-device system model. Your Apple provider can load an adapter while cloud providers handle other requests. The protocol abstraction sits above both, so adapter use stays an implementation detail of the on-device path.

08 Is the Foundation Models framework open source in 2026?

At WWDC 2026, Apple introduced a new open-source utilities package for the Foundation Models framework alongside model updates, system tools, dynamic profiles, and evaluation tooling. The system model itself is not open source. The utilities package helps with prompting and structured output, which your provider layer can reuse across vendors.

About the author

Manu Shukla

Founder & Director

Founder of eCorpIT. Hands-on engineer leading senior-only delivery for AI apps, custom software, and cloud systems for global clients.
