Summary. Apple's Foundation Models framework in iOS 27 runs a roughly 3-billion-parameter model on-device, and Apple's third-generation server model uses a 20-billion-parameter sparse architecture that activates 1 to 4 billion parameters per request, per Apple Machine Learning Research. The framework is in the Xcode 26.3 beta now, with iOS 27's public release expected in September 2026 and betas running through summer 2026. Apple does not ship a native Claude or Gemini provider. The "1-line swap" is an app-side Swift pattern: you put one protocol in front of the on-device model and your cloud calls, so changing providers means changing one initializer line. This guide shows the pattern, the trade-offs in cost and privacy, and what changed at WWDC 2026, which ran June 8–12, 2026.
A few facts frame the decision. The on-device model is about 3 billion parameters and costs $0 per token. A frontier cloud model from Anthropic or Google can be far larger, but it bills per token and sends prompt data off-device. Apple introduced the framework at WWDC 2025 and expanded it at WWDC 2026 with an open-source utilities package, per Apple's WWDC26 session 241. Your job as an iOS developer is to use the free, private model by default and reach for a cloud model only when the task needs it, without rewriting call sites each time.
What Apple actually ships in iOS 27
The Foundation Models framework is a high-level Swift API for prompting language models, generating structured output, and building agentic flows, per Lushbinary's Swift guide. You obtain the system model through SystemLanguageModel, open a LanguageModelSession, and call respond or streamResponse, as documented at Create with Swift.
Two facts matter before you design anything. First, the model runs on-device, so prompts stay on the iPhone or iPad and there is no per-token charge. Second, the model is small by frontier standards. Apple's research puts the on-device model at approximately 3 billion parameters, while the third-generation server model is a 20-billion-parameter sparse design that activates 1 to 4 billion parameters at a time, per Apple Machine Learning Research. The on-device model is the one the framework hands to your app directly.
Apple ships a separate, lower-level path too. Core AI, introduced at WWDC 2026, runs arbitrary on-device models such as Qwen, per Lushbinary. That is a different tool from the high-level Foundation Models API and a different discussion. This guide stays on the Foundation Models framework, because that is where most app teams start.
What Apple does not ship
Apple does not ship a built-in Claude or Gemini backend. The framework runs the system model and custom LoRA adapters you train yourself, per the Apple Developer documentation on custom adapters. If you want a response from Anthropic's Claude or Google's Gemini, your app makes a network call to that vendor's API. Nothing in Apple's framework does that for you. So the "1-line swap" is not an Apple feature. It is a small piece of architecture you write, and the rest of this guide is about writing it well.
The 1-line swap, explained
The swap is a protocol. You define one Swift protocol that every model backend conforms to, then write a thin provider for each: the Apple on-device model, Claude, and Gemini. Because all three share the same method signatures, switching providers is one initializer line. Every call site that prompts a model stays untouched.
Here is the protocol. Keep it small.
```swift
protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}
```
The Apple on-device provider wraps a LanguageModelSession:
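```swift
import FoundationModels

struct AppleOnDeviceProvider: TextModelProvider {
    func reply(to prompt: String) async throws -> String {
        // A fresh session per call keeps reply(to:) stateless, like the cloud
        // providers. A shared session would carry every earlier prompt in its
        // transcript and eventually exhaust the model's context window.
        let session = LanguageModelSession()
        let response = try await session.respond(to: prompt)
        return response.content
    }
}
```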
A Claude provider hits Anthropic's HTTP API:
```swift
struct ClaudeProvider: TextModelProvider {
    let apiKey: String

    func reply(to prompt: String) async throws -> String {
        // AnthropicClient is a hypothetical wrapper around Anthropic's Messages API:
        // it POSTs the prompt, decodes the text blocks, and returns them.
        // In production, send this call from your backend, not the device.
        return try await AnthropicClient(apiKey: apiKey).complete(prompt)
    }
}
```
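The `AnthropicClient` in the provider above is a placeholder. A minimal sketch of what it might do follows, assuming you call Anthropic's Messages API directly (`POST https://api.anthropic.com/v1/messages` with `x-api-key` and `anthropic-version` headers). The `AnthropicMessages` helper and its function names are hypothetical, the model identifier is left to the caller, and in production the same request would be sent from your backend rather than the device.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

// Hypothetical helper: builds an Anthropic Messages API request and extracts
// the text from a response. Only the fields this sketch needs are modelled.
enum AnthropicMessages {
    struct Message: Encodable {
        let role: String
        let content: String
    }

    struct Request: Encodable {
        let model: String
        let maxTokens: Int
        let messages: [Message]

        enum CodingKeys: String, CodingKey {
            case model, messages
            case maxTokens = "max_tokens"
        }
    }

    struct ContentBlock: Decodable {
        let type: String
        let text: String? // absent on non-text blocks such as tool use
    }

    struct Response: Decodable {
        let content: [ContentBlock]
    }

    static func makeRequest(prompt: String, apiKey: String,
                            model: String, maxTokens: Int = 1024) throws -> URLRequest {
        var request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
        request.httpMethod = "POST"
        request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
        request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
        request.setValue("application/json", forHTTPHeaderField: "content-type")
        let body = Request(model: model, maxTokens: maxTokens,
                           messages: [Message(role: "user", content: prompt)])
        request.httpBody = try JSONEncoder().encode(body)
        return request
    }

    // Concatenates the text blocks of a Messages API response.
    static func text(from data: Data) throws -> String {
        let response = try JSONDecoder().decode(Response.self, from: data)
        return response.content
            .filter { $0.type == "text" }
            .compactMap { $0.text }
            .joined()
    }
}
```

Inside `AnthropicClient.complete`, you would pass the request to `URLSession.shared.data(for:)` and hand the returned data to `text(from:)`. Take the model identifier from Anthropic's current model list rather than hard-coding one.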
A Gemini provider does the same against Google's API:
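```swift
struct GeminiProvider: TextModelProvider {
    let apiKey: String

    func reply(to prompt: String) async throws -> String {
        // GeminiClient is a hypothetical wrapper around Google's Gemini API.
        // As with Claude, route this call through your backend in production.
        return try await GeminiClient(apiKey: apiKey).complete(prompt)
    }
}
```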
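`GeminiClient` is likewise a placeholder. A compact sketch of the request and response handling it might wrap is below, assuming Gemini's REST `generateContent` endpoint with the key in an `x-goog-api-key` header. The `GeminiREST` name is hypothetical, the model identifier is the caller's choice, and only the response fields the sketch reads are modelled.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

// Hypothetical helper for Gemini's REST generateContent endpoint.
enum GeminiREST {
    struct Part: Codable { let text: String? }
    struct Content: Codable { let parts: [Part] }
    struct Request: Encodable { let contents: [Content] }
    struct Candidate: Decodable { let content: Content? }
    struct Response: Decodable { let candidates: [Candidate]? }

    static func makeRequest(prompt: String, apiKey: String, model: String) throws -> URLRequest {
        let url = URL(string:
            "https://generativelanguage.googleapis.com/v1beta/models/\(model):generateContent")!
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.setValue(apiKey, forHTTPHeaderField: "x-goog-api-key")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(
            Request(contents: [Content(parts: [Part(text: prompt)])]))
        return request
    }

    // Joins the text parts of the first candidate. An empty string means the
    // response carried no candidate text (for example, a blocked prompt).
    static func text(from data: Data) throws -> String {
        let response = try JSONDecoder().decode(Response.self, from: data)
        let parts = response.candidates?.first?.content?.parts ?? []
        return parts.compactMap(\.text).joined()
    }
}
```

Both helpers return a plain `String`, which is what lets the Claude and Gemini providers share one protocol with the on-device provider.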
Now the swap is one line at the composition root:
```swift
let provider: TextModelProvider = AppleOnDeviceProvider()
// let provider: TextModelProvider = ClaudeProvider(apiKey: key)
// let provider: TextModelProvider = GeminiProvider(apiKey: key)
```
Every feature in the app calls provider.reply(to:). The summariser does not know or care which model answered. That is the whole trick: the protocol absorbs the difference, so the cost of changing your mind is one line, not a refactor.
The real cost is rarely the swap line. It is the per-provider quirks underneath: streaming, tool-calling shapes, token limits, error handling, and where the API key lives. Put those inside each provider. Keep the protocol blunt.
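One way to keep error handling inside each provider is a single error type that every provider maps into, so call sites catch one shape no matter who answered. The `ProviderError` cases and the status-code groupings below are illustrative assumptions, not any vendor's documented error taxonomy.

```swift
// Hypothetical shared error type; every provider throws only these cases.
enum ProviderError: Error, Equatable {
    case unavailable        // model or service not reachable right now
    case rateLimited        // back off and retry later
    case rejected(String)   // request refused: bad input, auth, or policy
    case unknown(Int)       // anything else, keeping the raw status for logs
}

// Maps an HTTP status from a cloud API into ProviderError.
// Returns nil for success. The groupings are a judgment call.
func providerError(forHTTPStatus status: Int) -> ProviderError? {
    switch status {
    case 200..<300: return nil
    case 401, 403: return .rejected("authentication")
    case 400, 404, 413, 422: return .rejected("bad request")
    case 429: return .rateLimited
    case 500..<600: return .unavailable
    default: return .unknown(status)
    }
}
```

Each cloud provider calls this after its HTTP request and throws the result; the on-device provider maps framework errors into the same enum.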
Cost and capability: device vs cloud
The on-device model is free and private but small. Cloud models cost money and send data off-device but reach further on hard tasks. Pick per feature, not per app.
| Factor | Apple on-device model | Claude (cloud) | Gemini (cloud) |
|---|---|---|---|
| Where it runs | On the iPhone/iPad | Anthropic servers | Google servers |
| Model size | ~3 billion parameters | Frontier-scale (vendor-set) | Frontier-scale (vendor-set) |
| Per-token cost | None | Per-token, vendor pricing | Per-token, vendor pricing |
| Prompt data leaves device | No | Yes | Yes |
| Network required | No | Yes | Yes |
| Best fit | Short, private, offline tasks | Long-context reasoning | Long-context, multimodal |
Pricing for Claude and Gemini changes often and is set by each vendor, so confirm current rates on Anthropic's and Google's official pricing pages before you commit a budget. The on-device figure is the durable one: zero marginal cost per call. For Indian teams, that zero matters. A feature that fires thousands of times a day costs ₹0 on-device, while the same volume on a metered cloud API becomes a real line item. An on-device default keeps your unit economics predictable.
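The arithmetic behind that line item is simple enough to keep next to the feature. The sketch below estimates a month of cloud spend; the call volume, token counts and per-million-token prices are placeholders, not vendor pricing, and must be replaced with your own traffic and each vendor's current published rates.

```swift
// Inputs for one feature. Every value used below is a placeholder.
struct CloudUsage {
    var callsPerDay: Double
    var inputTokensPerCall: Double
    var outputTokensPerCall: Double
    var inputPricePerMillion: Double  // in the currency the vendor quotes
    var outputPricePerMillion: Double
}

// Estimated spend over `days` days. The on-device path has no per-token
// charge, so its equivalent figure is zero.
func monthlyCloudCost(_ u: CloudUsage, days: Double = 30) -> Double {
    let perCall = u.inputTokensPerCall / 1_000_000 * u.inputPricePerMillion
                + u.outputTokensPerCall / 1_000_000 * u.outputPricePerMillion
    return perCall * u.callsPerDay * days
}

// Placeholder scenario: 5,000 calls a day, 500 input and 200 output tokens.
let example = CloudUsage(callsPerDay: 5_000,
                         inputTokensPerCall: 500, outputTokensPerCall: 200,
                         inputPricePerMillion: 1.0, outputPricePerMillion: 5.0)
let estimate = monthlyCloudCost(example) // about 225 in the prices' currency
```

Running the same numbers for each feature makes it easy to see which ones justify a cloud default and which should stay on-device.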
A routing layer worth building
A single default provider is fine to start. Most apps then add a router: try the on-device model first, escalate to a cloud provider when the task is too big or the device model's answer fails a check.
```swift
struct RoutingProvider: TextModelProvider {
    let onDevice: TextModelProvider
    let cloud: TextModelProvider
    let shouldEscalate: (String) -> Bool

    func reply(to prompt: String) async throws -> String {
        if shouldEscalate(prompt) {
            return try await cloud.reply(to: prompt)
        }
        return try await onDevice.reply(to: prompt)
    }
}
```
RoutingProvider conforms to the same protocol, so it slots into the same one-line composition root. Escalation rules stay simple at first: prompt length over a threshold, a task type that needs long context, or a low-confidence on-device result. Keep the rule readable. A router nobody understands is a router nobody can debug at 2am.
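A sketch of a readable escalation rule, assuming prompt length is your first signal. The four-characters-per-token estimate and the 2,000-token threshold are rough, hypothetical placeholders, not figures from Apple; tune both against your own prompts and the on-device model's context limit. The `Labelled` stand-ins exist only so the routing can run without a model.

```swift
protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

struct RoutingProvider: TextModelProvider {
    let onDevice: TextModelProvider
    let cloud: TextModelProvider
    let shouldEscalate: (String) -> Bool

    func reply(to prompt: String) async throws -> String {
        if shouldEscalate(prompt) {
            return try await cloud.reply(to: prompt)
        }
        return try await onDevice.reply(to: prompt)
    }
}

// Hypothetical rule: escalate when the estimated token count crosses a
// threshold. Roughly 4 characters per token is a crude heuristic.
func escalateWhenLong(maxEstimatedTokens: Int) -> (String) -> Bool {
    return { prompt in (prompt.count + 3) / 4 > maxEstimatedTokens }
}

// Stand-in providers that just report which path answered.
struct Labelled: TextModelProvider {
    let label: String
    func reply(to prompt: String) async throws -> String { label }
}

let router = RoutingProvider(
    onDevice: Labelled(label: "on-device"),
    cloud: Labelled(label: "cloud"),
    shouldEscalate: escalateWhenLong(maxEstimatedTokens: 2_000)
)
```

Because the rule is a plain function, adding a second signal later (a task type, say) means composing closures rather than editing the router.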
There is a cheaper pattern than escalating up front: try on-device, validate the answer, and fall back to the cloud only on failure. Although the on-device model is only about 3 billion parameters, it handles short summaries, classification, and structured extraction well, so most calls never reach a paid cloud API. Reserve the cloud path for the cases the device model genuinely cannot serve. Measure the escalation rate in production. If it climbs above a few percent for a feature, that feature probably wants a cloud default, and you change one line to give it one.
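The try-validate-fall-back pattern can sit behind the same protocol. This is a sketch under a few assumptions: `FallbackProvider` and `EscalationCounter` are hypothetical names, the validator is whatever check your feature trusts, and any error from the on-device path (including the model being unavailable on that device) counts as a reason to fall back.

```swift
protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

// Counts calls and fallbacks so the escalation rate can be reported.
actor EscalationCounter {
    private(set) var total = 0
    private(set) var escalated = 0

    func record(escalated didEscalate: Bool) {
        total += 1
        if didEscalate { escalated += 1 }
    }

    var rate: Double { total == 0 ? 0 : Double(escalated) / Double(total) }
}

struct FallbackProvider: TextModelProvider {
    let onDevice: TextModelProvider
    let cloud: TextModelProvider
    let isAcceptable: (String) -> Bool
    let counter: EscalationCounter

    func reply(to prompt: String) async throws -> String {
        // First attempt: the free, private model. Errors are deliberately
        // turned into nil here so they trigger the cloud fallback below.
        if let local = try? await onDevice.reply(to: prompt), isAcceptable(local) {
            await counter.record(escalated: false)
            return local
        }
        await counter.record(escalated: true)
        return try await cloud.reply(to: prompt)
    }
}
```

Reading `await counter.rate` per feature gives you the escalation figure to watch in production.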
Streaming through the same protocol
Apple's framework exposes streamResponse alongside respond, per Create with Swift. If your UI streams tokens, add a streaming method to the protocol so every provider can drive the same UI:
```swift
protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
    func stream(_ prompt: String) -> AsyncThrowingStream<String, Error>
}
```
Claude and Gemini both stream over HTTP, and the on-device model streams locally. The provider hides that difference; your view model consumes one AsyncThrowingStream regardless of who is answering. Keep the stream type identical across providers, or the abstraction leaks and the one-line swap stops being one line.
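One concrete way the abstraction can leak: Foundation Models' `streamResponse` yields snapshots, where each element is the full text generated so far, while HTTP streaming APIs such as Anthropic's and Gemini's deliver incremental chunks. A sketch of normalising snapshots into deltas inside the Apple provider is below. It is written against a plain `AsyncThrowingStream<String, Error>` of snapshots so it runs without the framework, and it assumes each snapshot extends the previous one; the `deltas(from:)` name is hypothetical.

```swift
// Converts cumulative snapshots ("He", "Hello", "Hello!") into deltas
// ("He", "llo", "!") so the Apple provider emits the same shape as
// delta-based cloud streams. If a snapshot does not extend the previous
// one, the whole snapshot is forwarded.
func deltas(
    from snapshots: AsyncThrowingStream<String, Error>
) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream<String, Error> { continuation in
        let task = Task {
            var previous = String.UnicodeScalarView()
            do {
                for try await snapshot in snapshots {
                    let scalars = snapshot.unicodeScalars
                    var delta = String.UnicodeScalarView()
                    if scalars.starts(with: previous) {
                        // Compare Unicode scalars, not Characters, so a combining
                        // mark added to the last character still counts as new text.
                        delta.append(contentsOf: scalars.dropFirst(previous.count))
                    } else {
                        delta.append(contentsOf: scalars)
                    }
                    previous = scalars
                    if !delta.isEmpty { continuation.yield(String(delta)) }
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        // Stop consuming snapshots if the UI stops listening.
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

Inside the Apple provider's `stream(_:)`, you would feed this the text of each snapshot from `streamResponse`; that wiring is omitted so the sketch runs without the framework.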
What changed at WWDC 2026
Apple expanded the framework rather than replacing it. At WWDC 2026, which ran June 8–12, 2026 with the keynote on June 8, Apple introduced a new open-source utilities package for the Foundation Models framework, plus model updates, system tools, dynamic profiles, evaluations, and tooling, per Apple's WWDC26 session 241. Apple's newsroom framed it as new options to integrate AI into apps, building on the framework introduced the year before, per Apple Newsroom.
The system model itself stays closed. The open-source piece is the utilities package, which helps with prompting and structured output. Your provider layer can reuse those utilities on the Apple path while your cloud providers handle their own request shaping. Tool-calling control also tightened in iOS 27, including a clearer trust boundary around the on-device Tool protocol, per Blake Crosley's analysis.
Custom adapters still fit the pattern
You can specialise the on-device model with a custom LoRA adapter trained on your data using Apple's Python toolkit, per the Apple Developer documentation and a DEV Community walkthrough. Adapter parameters are small: Apple represents adapter values in 16 bits, and a rank-16 adapter for the ~3 billion parameter model typically needs tens of megabytes, per Apple Machine Learning Research.
The adapter is an implementation detail of your Apple provider. Load it inside AppleOnDeviceProvider; the protocol above does not change, and your cloud providers never see it. Train an adapter when the on-device model needs to become a subject-matter expert in your domain, not as a default first step.
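A sketch of loading an adapter inside the Apple provider, assuming the adapter file ships in your app bundle. `SystemLanguageModel.Adapter(fileURL:)` and `SystemLanguageModel(adapter:)` are the framework's adapter entry points in the iOS 26 SDK; confirm them against the iOS 27 SDK before relying on this. The `adapterURL` property is hypothetical, and the provider is wrapped in `#if canImport(FoundationModels)` so the file still compiles on platforms without the framework.

```swift
import Foundation
#if canImport(FoundationModels)
import FoundationModels
#endif

protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

#if canImport(FoundationModels)
struct AppleOnDeviceProvider: TextModelProvider {
    // Hypothetical: a bundled adapter file, or nil for the base system model.
    let adapterURL: URL?

    func reply(to prompt: String) async throws -> String {
        let model: SystemLanguageModel
        if let adapterURL {
            // Apple ties each adapter to a specific system model version, so
            // this can throw after an OS update; let a fallback provider catch it.
            // Loading per call keeps the sketch short; cache the model in real code.
            let adapter = try SystemLanguageModel.Adapter(fileURL: adapterURL)
            model = SystemLanguageModel(adapter: adapter)
        } else {
            model = SystemLanguageModel.default
        }
        let session = LanguageModelSession(model: model)
        return try await session.respond(to: prompt).content
    }
}
#endif
```

Call sites still see only `TextModelProvider`, which is the point: the adapter never crosses the protocol boundary.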
Testing without a network
A protocol gives you one more thing for free: a fake provider for tests. Conform a stub to TextModelProvider, return canned text, and your feature tests run with no model, no network, and no API key. This matters because the on-device model needs a real device or supported simulator, and cloud calls cost money and need keys you should not put in CI.
```swift
struct StubProvider: TextModelProvider {
    let canned: String
    func reply(to prompt: String) async throws -> String { canned }
}
```
Inject StubProvider in unit tests, AppleOnDeviceProvider on device, and a cloud provider only in the narrow integration tests that need it. The same protocol that enables the one-line swap in production also keeps your test suite fast and offline. Apple's WWDC 2026 release added evaluation tooling for the framework, per session 241, which pairs well with this: evaluate the on-device model's output quality, and let your stub cover the plumbing.
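Once the protocol gains the streaming requirement shown earlier, every conformer has to implement it, including the stub. A protocol extension can supply a default that wraps `reply(to:)` in a one-element stream, so simple providers and test stubs keep compiling unchanged. This is a sketch; providers that genuinely stream would override the default.

```swift
protocol TextModelProvider: Sendable {
    func reply(to prompt: String) async throws -> String
    func stream(_ prompt: String) -> AsyncThrowingStream<String, Error>
}

extension TextModelProvider {
    // Default: deliver the whole reply as a single stream element.
    func stream(_ prompt: String) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream<String, Error> { continuation in
            let task = Task {
                do {
                    let text = try await self.reply(to: prompt)
                    continuation.yield(text)
                    continuation.finish()
                } catch {
                    continuation.finish(throwing: error)
                }
            }
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}

// The stub from above, unchanged: it gets stream(_:) from the extension.
struct StubProvider: TextModelProvider {
    let canned: String
    func reply(to prompt: String) async throws -> String { canned }
}
```

View-model tests can then consume `stub.stream(...)` exactly as they would a real provider's stream.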
Common mistakes to avoid
Three errors show up repeatedly when teams add this pattern. First, baking a cloud API key into the app binary; ship it to one user and you have shipped it to all of them, so keys belong on a backend you control. Second, letting each provider return a different shape, which forces if-checks at call sites and defeats the abstraction; normalise the return type inside each provider. Third, escalating to a cloud model by default when the on-device model would do, which spends money and sends data off-device for no gain. Default on-device, escalate by exception.
A fourth, quieter mistake: treating the framework as a drop-in replacement for a frontier model. The on-device model is about 3 billion parameters. It is genuinely useful for summaries, extraction, and short reasoning, but it will not match a 20-billion-parameter sparse server model on the hardest prompts, per Apple Machine Learning Research. Size your expectations to the model you are calling.
India-specific considerations
For teams building in India or shipping to Indian users, two things shape the design. First, the on-device default keeps cost in rupees predictable, because on-device inference has no per-token charge. Second, privacy law applies the moment a prompt leaves the device. The Digital Personal Data Protection Act 2023 (DPDP) governs processing of personal data, and a cloud call to Claude or Gemini is processing on a third-party server. Build the provider layer so cloud calls are explicit and easy to disable per feature, and disclose that processing. If a feature handles personal data, keep it on-device unless there is a clear, consented reason to escalate.
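One way to make cloud calls explicit and easy to disable per feature is a small policy check that runs before any escalation. The `Feature` cases, the consent flag and the `CloudPolicy` type are hypothetical illustrations, not a compliance mechanism; deciding what counts as personal data or valid consent under DPDP sits outside this sketch.

```swift
// Hypothetical feature list; replace with your own.
enum Feature: Hashable {
    case noteSummary
    case supportChat
    case healthJournal
}

struct CloudPolicy {
    // Features allowed to leave the device at all (a per-feature kill switch).
    var cloudEnabled: Set<Feature>
    // Features whose prompts may contain personal data.
    var handlesPersonalData: Set<Feature>

    // Cloud is allowed only if the feature is enabled and, when it handles
    // personal data, the user has consented to cloud processing.
    func mayUseCloud(for feature: Feature, userConsented: Bool) -> Bool {
        guard cloudEnabled.contains(feature) else { return false }
        if handlesPersonalData.contains(feature) { return userConsented }
        return true
    }
}

let policy = CloudPolicy(
    cloudEnabled: [.noteSummary, .supportChat],
    handlesPersonalData: [.supportChat, .healthJournal]
)
```

Feeding `mayUseCloud` into the router's escalation rule means a disabled feature never reaches a cloud provider, whatever its prompt looks like.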
For more on planning enterprise AI features across providers, see our generative AI enterprise strategy guide for 2026. If you also care about how AI features surface in search and answer engines, our AEO vs GEO vs SEO guide covers the discovery side, and our ultimate SEO guide for 2026 covers the search foundations those features still rely on.
One more practical point for Indian and global teams alike: the backend that holds your Claude and Gemini keys also becomes the place you log, rate-limit, and bill cloud usage. Keep that service thin and auditable. When a regulator or an internal review asks what data left the device and where it went, a single proxy with clear logs answers the question in minutes. A device that calls vendors directly cannot.
A short build checklist
Start on-device, abstract early, escalate late.
| Step | Action | Why it matters |
|---|---|---|
| 1 | Define one TextModelProvider protocol | One swap point for the whole app |
| 2 | Ship the Apple on-device provider first | Free, private, no network |
| 3 | Add Claude and Gemini providers behind it | Capability when the task needs it |
| 4 | Add a routing provider | Escalate only on hard prompts |
| 5 | Move API keys off-device | Keys on the device leak |
| 6 | Disclose cloud processing | Required under DPDP and store rules |
Keys belong on your backend, not in the app binary. A provider that calls Claude or Gemini directly from the device ships your API key to every user, so route cloud calls through a server you control.
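A sketch of the device side of that arrangement: the app authenticates to your own backend with a per-user session token, and only the backend holds the Claude or Gemini key. The endpoint, the JSON shape and the `ProxyProvider` name are hypothetical placeholders for whatever your server exposes. The transport is injected so the provider can be tested offline.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

protocol TextModelProvider {
    func reply(to prompt: String) async throws -> String
}

// Hypothetical wire format for your own proxy endpoint.
struct ProxyRequest: Encodable {
    let feature: String
    let prompt: String
}

struct ProxyResponse: Decodable {
    let text: String
}

enum ProxyError: Error, Equatable {
    case status(Int)
}

struct ProxyProvider: TextModelProvider {
    let endpoint: URL        // your backend, never a vendor URL
    let sessionToken: String // short-lived user credential, not a vendor API key
    let feature: String      // lets the backend log, rate-limit and bill per feature
    // In the app this would wrap URLSession.shared.data(for:) and return the
    // body with the HTTP status code; tests pass a fake.
    let send: (URLRequest) async throws -> (Data, Int)

    func reply(to prompt: String) async throws -> String {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.setValue("Bearer \(sessionToken)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(
            ProxyRequest(feature: feature, prompt: prompt))

        let (data, status) = try await send(request)
        guard (200..<300).contains(status) else { throw ProxyError.status(status) }
        return try JSONDecoder().decode(ProxyResponse.self, from: data).text
    }
}
```

Because `ProxyProvider` conforms to the same protocol, swapping a direct vendor provider for it is still a one-line change at the composition root.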
How eCorpIT can help
eCorpIT is a CMMI Level 5, MSME-certified, senior-led engineering organisation based in Gurugram. Our iOS teams design Foundation Models features that default to on-device inference, then add a provider protocol so Claude or Gemini slot in with a one-line change when a task needs more capability. We build applications aligned with DPDP Act 2023 requirements and keep cloud API keys off the device. To scope an iOS AI feature or an audit of your current provider design, contact us.
References
- Apple's Foundation Models Framework: Run AI On-Device With Just a Few Lines of Swift — DEV Community
_Last updated: 22 June 2026._