On this page · 13 sections
- Why LLM cost tracking became a 2026 priority
- 1. LiteLLM: one gateway with spend tracking and budgets
- 2. Langfuse: tracing with token and cost breakdowns
- 3. tokencost: estimate a prompt's cost before you send it
- 4. Helicone: one-line logging with caching to cut cost
- 5. PostHog: connect LLM cost to product analytics
- 6. OpenCost: GPU and infrastructure cost for self-hosted models
- How to combine them into a cost stack
- India-specific considerations
- What this means for engineering teams
- FAQ
- How eCorpIT can help
- References
Summary. LLM spend stopped being a rounding error in 2026. The FinOps Foundation's sixth annual State of FinOps survey, published on February 19, 2026 across 1,192 respondents, found 98% now manage AI spend, up from 31% two years earlier, and named FinOps for AI the top forward-looking priority. This guide covers six free tools that help engineering teams measure and control that spend: LiteLLM, Langfuse, tokencost, Helicone, PostHog, and OpenCost. Five are open-source and self-hostable; all have a usable free tier. LiteLLM tracks spend and enforces budgets across 100+ providers. Langfuse, MIT-licensed and now owned by ClickHouse, gives 50,000 free observations a month. tokencost estimates the dollar cost of a prompt for 400+ models before you send it. Helicone's free tier covers 10,000 requests a month, with paid plans from $79. "As companies pursue transformation via AI, with the resulting increases in AI costs, FinOps practices will be critical," said J.R. Storment, Executive Director of the FinOps Foundation. The pattern across all six is the same: make cost visible early, then route, cache, and cap it.
This is written for ML engineers and platform teams who own an LLM bill and need to explain it. Each tool below lists what it does, its free tier, and where it fits.
Why LLM cost tracking became a 2026 priority
The State of FinOps 2026 data, published by the FinOps Foundation, shows how fast this shifted. Managing AI spend went from a minority practice to near-universal, at 98% of 1,192 teams, in two years. Alongside it, 90% of practitioners now manage SaaS, 64% manage licensing, and 78% of FinOps teams report to a CTO or CIO, a sign the discipline now sits close to engineering leadership.
One finding matters most for engineers: the survey names "shift left" as a top priority, meaning teams want cost context embedded earlier in the build, before the bill arrives rather than after. Pre-deployment architecture guidance was a top requested tooling capability. That is exactly what the tools below provide, from estimating a prompt's cost in code to capping a virtual key's monthly budget. Storment framed the stakes plainly: AI cost growth is now a board-level question, not a cleanup task. For the wider context on running AI economically, our AI delivery lessons for 2026 cover model routing as an architecture decision.
| Tool | Free tier | Best for |
|---|---|---|
| LiteLLM | Open-source, free to self-host | One gateway with budgets across 100+ providers |
| Langfuse | 50,000 observations/month free | Tracing with token and cost breakdowns |
| tokencost | Free, open-source library | Estimating prompt cost before you send it |
| Helicone | 10,000 requests/month free | One-line logging with caching to cut cost |
| PostHog | 100,000 LLM events/month free | Linking LLM cost to product analytics |
| OpenCost | Free, CNCF open-source | GPU and infra cost for self-hosted models |
1. LiteLLM: one gateway with spend tracking and budgets
LiteLLM is an open-source AI gateway that gives you a single OpenAI-format interface to more than 100 model providers, including OpenAI, Anthropic, Gemini, Bedrock, and Azure. For cost work, its value is built-in spend tracking: it maps each model's token pricing and exposes cost at the key, user, and team level, as the LiteLLM docs describe. You can set per-key and per-team budgets and rate limits, so a runaway script hits a cap instead of a five-figure invoice.
The open-source proxy is free to self-host, with paid enterprise tiers starting around $250 a month for teams that need support and SSO. It needs a database, typically PostgreSQL, for virtual keys and budget state. For most platform teams, LiteLLM is the natural first install, because it puts measurement and enforcement in the same place every request already passes through. The project is on GitHub under active development.
2. Langfuse: tracing with token and cost breakdowns
Langfuse is an open-source LLM engineering platform, MIT-licensed, that handles tracing, prompt management, evaluation, and cost tracking in one place. Its cost feature tracks usage and spend on each generation and breaks it down by model and usage type, separating token-based API cost from its own billable units so numbers do not double-count, as the Langfuse docs explain.
You can self-host the MIT core for free, or use Langfuse Cloud's free tier of 50,000 observations a month with no credit card. Two 2026 changes are worth knowing: ClickHouse acquired Langfuse in January 2026, and the SDKs were rewritten for v4 in March 2026, so new integrations should target v4. Langfuse integrates with OpenTelemetry, LangChain, the OpenAI SDK, and LiteLLM, which means it slots in behind a gateway cleanly. The code is on GitHub.
3. tokencost: estimate a prompt's cost before you send it
tokencost is the smallest tool here and the purest "shift left" fit. It is an open-source Python library from AgentOps that estimates the dollar cost of a prompt and completion for more than 400 models, counting tokens with Tiktoken and applying current per-model pricing, per its GitHub project. One function call returns the estimated USD cost of a string or a chat message list, before or after the request.
The point is to put a price in front of a decision. A retrieval step that stuffs 30,000 tokens into context has a cost you can compute in development, not discover in a monthly report. For teams building agents that loop, estimating per-step cost up front is the cheapest way to catch an expensive design before it ships. tokencost is free under an open-source license, and its pricing table is updated as providers change rates.
4. Helicone: one-line logging with caching to cut cost
Helicone is an open-source LLM gateway and observability platform, with 5,800+ GitHub stars, that you can adopt by changing a base URL or adding one header. Once requests flow through it, you get logging, cost tracking across 100+ models, and gateway features that reduce spend directly: response caching for repeated queries, plus rate limiting and provider failover. Its pricing gives a free Hobby tier of 10,000 requests a month, with Pro at $79 a month and Team at $799.
One honest caveat for 2026: after Mintlify acquired Helicone in March 2026, the platform moved to maintenance mode, with security updates, new model support, and bug fixes continuing rather than major new features, according to 2026 reviews from Braintrust. It remains a fast way to add cost visibility and caching, but weigh that status if you need a tool under active feature development. The source is on GitHub.
5. PostHog: connect LLM cost to product analytics
PostHog is an all-in-one developer platform that pairs LLM observability with product analytics, session replay, and feature flags. For cost teams, that pairing is the selling point: you can see not just what a feature costs in tokens but whether users engage with it, which is the number that decides if the spend is worth it. PostHog integrates with the major LLM providers and tracks token usage, latency, and cost per request, as the PostHog guide to open-source observability tools sets out.
The free tier includes 100,000 LLM observability events a month with 30-day retention, after which pricing is usage-based. If your team already runs PostHog for product analytics, adding LLM cost data avoids standing up a separate tool, and it puts engineering and product cost conversations on one timeline.
6. OpenCost: GPU and infrastructure cost for self-hosted models
The five tools above track API spend. If you run your own models on Kubernetes, the bill is infrastructure, and OpenCost covers that gap. OpenCost is a Cloud Native Computing Foundation project that allocates in-cluster cost for CPU, GPU, memory, load balancers, and storage, and pulls in cloud-provider charges for managed services, per the OpenCost project. It answers the question the API tools cannot: what does a self-hosted inference workload actually cost per team or per namespace?
GPU allocation is the part that matters for LLM teams, since a single idle GPU node is a quiet, recurring cost. OpenCost is free and open-source, and it complements rather than replaces the API trackers. A team running both hosted APIs and self-hosted models will want LiteLLM or Langfuse for the API side and OpenCost for the cluster side. The full picture needs both.
How to combine them into a cost stack
These six tools are layers, not competitors. The fastest way to read them is by the question each answers, and most teams end up running two or three together.
| Cost layer | Tool to use | Question it answers |
|---|---|---|
| Estimate before calling | tokencost | What will this prompt cost? |
| Gateway and budgets | LiteLLM, Helicone | Who is spending, and what is the cap? |
| Tracing and breakdowns | Langfuse | Where is the cost going, by model and step? |
| Cost versus product value | PostHog | Is the spend tied to usage that matters? |
| Self-hosted infrastructure | OpenCost | What do our own GPUs cost per team? |
A practical starting stack is LiteLLM as the gateway, Langfuse behind it for tracing and cost breakdowns, and tokencost in the codebase for pre-flight estimates. Add PostHog if product correlation matters, and OpenCost once you self-host. None of these requires a purchase to begin, which is the point: cost visibility should not itself be a budget line.
India-specific considerations
For teams in India, the strongest reason to prefer these tools is that five of the six can be self-hosted, which keeps prompts, completions, and spend logs inside your own infrastructure. That matters under the Digital Personal Data Protection Act, 2023, where LLM logs often contain personal data and the Data Fiduciary stays responsible for it. Penalties under the Act reach ₹250 crore per breach, so where LLM traffic logs live is a compliance decision, not only a cost one.
Self-hosting LiteLLM, Langfuse, or OpenCost on Indian infrastructure lets a team measure spend without shipping prompt content to a third-party SaaS in another region. For organisations weighing data residency against convenience, the open-source, self-hostable option removes the trade-off. The same engineering discipline that controls cost, owning the gateway and the logs, also supports DPDP-aligned data handling.
What this means for engineering teams
The 2026 signal from FinOps is that AI cost is now an engineering responsibility, measured in the build rather than reconciled afterward. The good news is that the tooling to do it is free. A team can estimate prompt cost in development with tokencost, route and cap spend through LiteLLM, trace it in Langfuse, tie it to product value in PostHog, and account for self-hosted GPUs with OpenCost, without a single license purchase. The work is integration and discipline, not budget. As Storment put it, FinOps practices are becoming critical to multi-year technology decisions, and the cheapest time to build that visibility is now.
FAQ
How eCorpIT can help
eCorpIT is a senior-led, CMMI Level 5 technology organisation in Gurugram that builds and operates AI systems for global and Indian businesses. We help engineering teams stand up an LLM cost stack, from a LiteLLM gateway with budgets to Langfuse tracing and OpenCost for self-hosted GPUs, and design it to keep logs and spend data within DPDP-aligned boundaries. We work across the AWS, Microsoft, and Google platforms our clients use. To plan a cost-control setup for your LLM workloads, contact our team.
References
- FinOps Foundation, "State of FinOps Survey: AI Value and Skills Top Priorities," February 19, 2026.
- FinOps Foundation, "State of FinOps 2026 Report data," 2026.
- PostHog, "7 best free and open source LLM observability tools," 2026.
- Langfuse, "Token & Cost Tracking," 2026.
- Langfuse, "Open source AI engineering platform (GitHub)," 2026.
- LiteLLM, "Spend Tracking documentation," 2026.
- BerriAI, "LiteLLM Proxy Server (GitHub)," 2026.
- Helicone, "Helicone pricing," 2026.
- Helicone, "Open source LLM observability platform (GitHub)," 2026.
- AgentOps, "tokencost: token price estimates for 400+ LLMs (GitHub)," 2026.
- OpenCost, "Open source cost monitoring for cloud native environments," 2026.
- Braintrust, "Best LLM gateways for developers in 2026," 2026.
- EY India, "Decoding the Digital Personal Data Protection Act, 2023," 2026.
_Last updated: June 24, 2026._