Summary. Most enterprise AI agents never ship. MIT found in August 2025 that 95% of generative AI pilots drive no measurable profit-and-loss result, and Gartner expects more than 40% of agentic AI projects to be canceled by the end of 2027. Yet the winners are concrete and measurable: JPMorgan runs 450+ AI use cases across 200,000 employees, IBM's AskHR contains 94% of routine HR queries, and Duolingo cut median code review time from 3 hours to 1 with GitHub Copilot. McKinsey's November 2025 survey found 23% of organisations already scaling an agentic system and another 39% experimenting. If you are a founder or CTO scoping a first production agent, the pattern is clear: narrow, well-instrumented workflows ship; broad autonomy stalls. This piece breaks down 7 use cases that reached production and 3 that stalled.
The gap between the 95% that fail and the 5% that work is not model quality. It is scope. The agents in production solve one bounded job with a clean before-and-after metric. The ones that stall try to automate a whole department at once. Below, each use case names the deployment, the reported result, and the date, so you can judge what is real against your own roadmap.
The adoption baseline for 2026
Before the use cases, the honest numbers on where enterprises actually are. McKinsey's state of AI survey, fielded from 25 June to 29 July 2025 across 1,993 respondents in 105 nations, found 23% of organisations scaling an agentic AI system somewhere in the business and 39% experimenting, so 62% are at least trying. The catch: in any single business function, no more than 10% report scaling agents, and nearly two-thirds have not begun scaling AI enterprise-wide. Agents are real, but they are deployed in pockets, not everywhere.
Anushree Verma, senior director analyst at Gartner, put the hype in context plainly: "Most agentic AI projects right now are early stage experiments or proof of concepts that are mostly driven by hype and are often misapplied." That is the backdrop for every deployment below.
What the ROI data says
Where the returns land, they are large. Compiled 2026 survey data reports an average 171% return on agentic deployments, rising to about 192% at United States enterprises, with 74% of executives reaching positive ROI inside the first year and cost reductions of 25% to 40% in targeted processes within the first 90 days. Adoption is concentrated rather than universal: roughly 52% of enterprises report at least one agent in production, with banking and insurance leading. Sector-specific agents outperform general-purpose ones by a wide margin, because a narrow agent inherits the rules and data of one domain. The number that matters for a first project, though, is not the industry average. It is whether you can measure your own before-and-after.
The 7 use cases that reached production
| Use case | Named deployment | Reported result (2024-2026) |
|---|---|---|
| Customer service deflection | Klarna AI assistant (OpenAI) | 2.3M chats/month; resolution 11 min to under 2 min; ~$40M annual benefit |
| Internal HR service desk | IBM AskHR (watsonx Orchestrate) | 94% query containment; 75% fewer tickets since 2016; 11.5M interactions in 2024 |
| Enterprise knowledge assistant | JPMorgan LLM Suite | 200,000 employees daily; 450+ use cases in production |
| Software engineering copilot | Duolingo with GitHub Copilot | 25% faster in new repos; code review 3 hours to 1 hour |
| Accounts-payable automation | Documented manufacturing case studies | 350+ complex invoices/month; reconciliation 3 hours to 2 minutes |
| Sales development (SDR) | Compiled enterprise SDR agents | Payback in 3.4 months; median agent time-to-value 5.1 months |
| Fraud and dispute triage | Financial-services workflow agents | High-volume tier-1 resolution with measurable before/after metrics |
1. Customer service deflection
The flagship case is Klarna. Its AI assistant, built with OpenAI, handled 2.3 million conversations in its first month, cut resolution time from 11 minutes to under 2, and was credited with about $40 million in annual benefit while doing the work of roughly 700 agents, per reporting on the rollout. It is the clearest proof that bounded, high-volume support is agent-ready. It is also, as the stalled section shows, a cautionary tale about going too far.
2. Internal HR service desk
IBM's AskHR, rebuilt on watsonx Orchestrate, reached a 94% containment rate on common questions, drove a 75% reduction in support tickets since 2016, and logged more than 11.5 million employee interactions in 2024, according to IBM's own case study. It also contributed to a 40% cut in HR operational cost over four years, with 99% adoption among managers. Internal helpdesks are a strong first agent because the data is owned and the failure cost is low.
3. Enterprise knowledge assistant
JPMorgan's LLM Suite reaches 200,000 employees daily and supports 450+ AI use cases in production, on the back of an $18 billion annual technology budget, per coverage of the platform. Investment bankers reportedly build five-page decks in about 30 seconds. The lesson from JPMorgan's 450-use-case programme is that scale comes from many small, governed applications, not one giant agent.
4. Software engineering copilot
Duolingo gave GitHub Copilot to its 300-plus engineers and reported a 25% increase in developer speed for those working in unfamiliar repositories and a 10% lift for experienced staff. A Slack integration for review notifications cut median code review turnaround from three hours to one, a 67% drop, per the GitHub customer story. Coding agents are the most reliable productivity win because the output is immediately testable.
5. Accounts-payable automation
In finance operations, documented enterprise case studies describe agents processing 350-plus complex invoices a month at a 90%-plus automation rate, with reconciliation time falling from three hours to two minutes. Invoice matching is a textbook agent task: structured inputs, deterministic rules, and an audit trail the agent can populate.
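A minimal Python sketch of that profile, under stated assumptions: the `Invoice` and `MatchResult` types, the `match_invoice` function, the purchase-order lookup and the 0.01 tolerance are all hypothetical and are not taken from the case studies. It shows deterministic rules applied in a fixed order, with each decision written to an audit trail and anything that fails a rule handed to a person.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List


@dataclass
class Invoice:
    invoice_id: str
    po_number: str
    amount: Decimal


@dataclass
class MatchResult:
    invoice_id: str
    auto_approved: bool = False
    audit_trail: List[str] = field(default_factory=list)


def match_invoice(
    invoice: Invoice,
    purchase_orders: Dict[str, Decimal],
    tolerance: Decimal = Decimal("0.01"),  # hypothetical tolerance, not from the source
) -> MatchResult:
    """Apply deterministic matching rules and log every decision."""
    result = MatchResult(invoice.invoice_id)

    # Rule 1: the invoice must reference a known purchase order.
    po_amount = purchase_orders.get(invoice.po_number)
    if po_amount is None:
        result.audit_trail.append(f"PO {invoice.po_number} not found; routed to human")
        return result
    result.audit_trail.append(f"PO {invoice.po_number} found with amount {po_amount}")

    # Rule 2: the invoiced amount must match the PO within the tolerance.
    difference = abs(invoice.amount - po_amount)
    if difference > tolerance:
        result.audit_trail.append(f"amount differs by {difference}; routed to human")
        return result

    result.audit_trail.append("amount within tolerance; auto-approved")
    result.auto_approved = True
    return result
```

The design choice worth noting is that the function never approves silently: every path, including the rejections, leaves a line in `audit_trail` that a reviewer can read later.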
6. Sales development
Outbound sales development agents qualify leads, draft outreach and book meetings. Compiled deployment data puts SDR-agent payback at 3.4 months, against a median agent time-to-value of 5.1 months and 8.9 months for finance and operations agents, per the same case-study compilation. The short payback comes from a clear revenue metric: meetings booked.
7. Fraud and dispute triage
Banking and insurance lead production adoption, and fraud and dispute workflows are why. These are high-volume, well-defined tasks with measurable resolution rates, which is exactly the profile that survey data on high-ROI use cases identifies as agent-ready. The agent triages and routes; a human owns the high-stakes call.
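The triage-and-route split can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a documented deployment: the `Dispute` type, the `route` function, the `fraud_score` field (assumed to come from some upstream model) and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dispute:
    dispute_id: str
    amount: float
    fraud_score: float  # assumed range 0.0 (benign) to 1.0 (likely fraud)


def route(dispute: Dispute, amount_limit: float = 500.0, score_limit: float = 0.3) -> str:
    """Send only low-value, low-risk disputes to the agent; everything else to a human.

    Both limits are hypothetical defaults chosen for illustration.
    """
    if dispute.amount <= amount_limit and dispute.fraud_score <= score_limit:
        return "agent_resolve"
    return "human_review"
```

The rule defaults to `human_review`, so a dispute reaches the agent only when it passes both checks. That keeps the high-stakes call with a person even when a new or unexpected case shows up.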
Time to value, by function
A first deployment lives or dies on payback. The spread is wide, and choosing a fast-payback function for your first agent is the single highest-leverage decision.
| Function | Typical time to value | Note |
|---|---|---|
| Customer service | About 2 weeks | Fastest; bounded intents, owned data |
| Sales development | 3.4 months | Clear revenue metric: meetings booked |
| First agent (median) | 5.1 months | Across functions, per compiled data |
| Finance and operations | 8.9 months | More integration and controls needed |
| Supply-chain orchestration | 12+ months | Multi-system; slowest, highest risk |
Source: compiled enterprise agent deployment data. The takeaway for a first project is to start where payback is measured in weeks, prove the pattern, then move up the difficulty curve. The same sequencing logic appears in our notes on engineering lessons from shipping enterprise AI agents.
The 3 that stalled
The failures rhyme. Each either tried to remove the human from a job where the cost of error was high, or never made it out of the pilot stage.
| Stall pattern | Evidence | The fix |
|---|---|---|
| Full replacement of staff | Klarna walked back its AI-only support and rehired humans | Hybrid: agent handles routine, human owns high-stakes cases |
| Broad autonomous agents | Gartner: 40%+ of agentic projects canceled by 2027 | Narrow scope; real tools, not "agent washing" |
| Pilots with no P&L link | MIT: 95% of GenAI pilots show no measurable result | Pick one workflow with a before/after metric |
Stall 1: replacing staff outright
Klarna is the cautionary half of its own story. By mid-2025 it began rehiring human agents after customer satisfaction slipped. CEO Sebastian Siemiatkowski told Bloomberg: "We went too far," adding, "We focused too much on cost. The result was lower quality." The company moved to a hybrid model, keeping the agent for routine chats while humans handle disputes, complex refunds and hardship cases. Its most recent public update put the agent at the work of about 853 employees and roughly $60 million in annual benefit under the hybrid setup. The agent stayed; the all-or-nothing framing did not.
Stall 2: broad autonomous agents
Gartner's prediction that more than 40% of agentic AI projects will be canceled by the end of 2027 rests on a poll of more than 3,400 organisations. The named causes are escalating costs, unclear business value and inadequate risk controls. Gartner also flags "agent washing," estimating that only about 130 of the thousands of self-described agentic AI vendors offer genuinely agentic products. Agents that try to own a whole multi-step process across legacy systems are where budgets quietly die. Strong enterprise AI agent governance is what keeps these projects from drifting.
Stall 3: pilots that never reach P&L
MIT's research, reported by Fortune in August 2025, found 95% of generative AI pilots deliver no measurable P&L impact and only 5% reach real scale. The cause is a learning gap and weak enterprise integration, not the underlying models. A pilot with no owner and no before/after number is a science project, and it will be canceled in the next budget cycle.
How to pick your first agent
Three rules separate the 5% from the 95%. First, choose a workflow with a metric you already track, so the before-and-after is undeniable. Second, keep the human in the loop where the cost of error is high; Klarna's reversal is the lesson here, and its roughly $60 million in annual benefit was reported under the hybrid setup, not the AI-only one. Third, instrument everything from day one, because the agents that survive are the ones whose value shows up in a dashboard. For founders scoping a first deployment, the customer-service and internal-helpdesk patterns offer the fastest, safest path to a real result, the same conclusion we reach in our AI agent customer-experience use cases.
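The first rule comes down to simple arithmetic, sketched below in Python. The `percent_reduction` helper is a hypothetical name. The inputs are the figures quoted earlier in this piece: Duolingo's median code review time (3 hours to 1) and Klarna's resolution time (11 minutes to under 2, using 2 as the upper bound).

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from a baseline value to a new value."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100


# Duolingo: median code review time, 3 hours down to 1 hour.
print(round(percent_reduction(3, 1), 1))   # 66.7

# Klarna: resolution time, 11 minutes down to under 2 minutes.
# Using 2 as the upper bound gives the minimum reduction.
print(round(percent_reduction(11, 2), 1))  # 81.8
```

If you cannot fill in the `before` argument from data you already collect, the workflow probably fails the first rule.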
India-specific considerations
For Indian founders and CTOs, the agent calculus has two local twists. First, any customer-facing agent that reads or routes personal data now sits under the DPDP regime. The consent, purpose-limitation and breach-notice duties in India's DPDP consent manager framework apply to the agent as much as to any other system, so design consent capture and data minimisation into the workflow from the first sprint, not as a retrofit. Second, the labour-cost arbitrage behind Klarna's headline savings is smaller where support salaries are lower. That shifts the first-agent business case in India toward quality, speed and capacity gains rather than pure headcount reduction. Internal helpdesks, coding copilots and document automation tend to clear the bar fastest in the Indian cost structure, because their value lands as engineer hours and cycle time rather than rupees saved on staffing. The sequencing rule still holds: pick the function with a metric you already report, prove it, then expand.
How eCorpIT can help
eCorpIT is a Gurugram-based technology organisation with senior-led engineering teams that help founders and CTOs ship their first production AI agent without joining the 95% that stall. We scope a bounded, metric-backed workflow, build the integration and human-in-the-loop handoffs, and instrument value from day one. Founded in 2021 and assessed at CMMI Level 5, we pair agent delivery with the governance that keeps it in production. To scope a first agent for your stack, contact our team.
_Last updated: 26 June 2026._