Summary. As of June 2026, published LLM API prices run from about $0.14 per million input tokens for DeepSeek V4 Flash to $30 for GPT-5.4 Pro, a spread of more than 200x, with Claude Sonnet 4.6 at $3/$15 and Gemini 3.1 Pro at $2/$12 for input and output (CloudZero pricing comparison). Industry prices fell roughly 80% from 2025 to 2026, yet most engineering teams cannot say which feature, team, or customer spent that money. That is a measurement problem, and the first pillar of FinOps for AI is visibility: you cannot improve what you do not measure. This guide covers seven free tools that turn raw token usage into a per-call, per-team dollar figure: the LiteLLM gateway, Langfuse (acquired by ClickHouse on 16 January 2026), Cloudflare AI Gateway, Arize Phoenix, PostHog LLM analytics, Traceloop OpenLLMetry, and Opik by Comet. Every one has a free tier or a self-hostable open-source build, and the comparison and pricing below are dated to June 2026.
The reason this matters more in 2026 than in 2024 is agentic workloads. A single agent run can fan out into dozens of model calls, each with its own retries, tool calls, and long context windows. Cheaper per-token prices do not cap the bill when call volume grows faster than prices drop. Cost observability and performance observability are the same problem: you cannot explain a cost spike without knowing which prompts ran, which models were called, and how many retry loops fired. This article is written for engineering leads, platform teams, and founders running LLM workloads who want that visibility without buying an enterprise contract first.
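To make that arithmetic concrete, here is a small sketch. The task volume, token counts, and prices are hypothetical, chosen only to mirror an 80% price drop combined with an agentic fan-out; they are not measurements from any real workload.

```python
def monthly_bill(tasks, calls_per_task, tokens_per_call, price_per_million):
    """Monthly spend in dollars for a workload made of uniform calls."""
    total_tokens = tasks * calls_per_task * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million

# Hypothetical 2025 workload: one model call per task at $10 per 1M tokens.
before = monthly_bill(tasks=100_000, calls_per_task=1,
                      tokens_per_call=2_000, price_per_million=10.00)

# Hypothetical 2026 workload: the price is 80% lower, but each task is now
# an agent run that fans out into 12 calls (retries and tool calls included).
after = monthly_bill(tasks=100_000, calls_per_task=12,
                     tokens_per_call=2_000, price_per_million=2.00)

print(f"before: ${before:,.2f}  after: ${after:,.2f}  ratio: {after / before:.1f}x")
```

Under these assumptions the bill rises from $2,000 to $4,800 a month even though the per-token price fell by 80%, because call volume grew 12x while the price fell only 5x.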
A quick definition. An LLM cost tool reads the input and output token counts the provider returns for each call, multiplies them by that model's published per-token price, and attaches the result to a trace, a user, a team, or an API key. Some tools sit inline as a gateway or proxy; others ingest telemetry your application emits. The seven below split into those two camps, and most teams end up running one of each.
| Tool | What it measures | Free option |
|---|---|---|
| LiteLLM | Per-key, per-team, per-user spend at the gateway | Open source, self-hosted |
| Langfuse | Cost and latency per trace, prompt, and model | Free cloud tier and MIT self-host |
| Cloudflare AI Gateway | Tokens, cost, caching, and errors per request | Free core, no infrastructure |
| Arize Phoenix | Traces and token costs across agent steps | Open source, self-hosted |
| PostHog LLM analytics | Cost and usage events alongside product data | 100k events per month free |
| Traceloop OpenLLMetry | Token and latency telemetry via OpenTelemetry | Open source SDK, Apache 2.0 |
| Opik by Comet | Token usage and model cost per trace | Open source and free cloud tier |
1. LiteLLM: per-key cost attribution at the gateway
LiteLLM is an open-source proxy that sits between your application and 100-plus model providers behind one OpenAI-compatible API. Because every call passes through it, LiteLLM can track every token and every dollar and attribute spend per virtual key, per team, and per user. You create virtual keys with budgets and rate limits, attach tags to requests, and let each tag carry its own budget. Among the free options on this list it offers the deepest cost attribution, because the gateway sees every request rather than only the calls an application chooses to report. The open-source build is free to self-host and needs a database such as PostgreSQL; a paid Enterprise tier adds SSO and audit logs for teams that need them.
There is a serious caveat that engineering teams must weigh. LiteLLM had a hard security year in 2026. In March, attackers published backdoored versions of the litellm PyPI package (1.82.7 and 1.82.8) after stealing publishing credentials through a compromised CI step; the packages were live for about 40 minutes before PyPI quarantined them. Separately, CVE-2026-42208, a pre-auth SQL injection rated CVSS 9.3, was exploited within roughly 36 hours of disclosure and added to the CISA Known Exploited Vulnerabilities catalog on 8 May 2026. Later advisories covered a command-injection flaw and an authentication bypass affecting versions before 1.84.0. The lesson is not to avoid LiteLLM, which remains the strongest open-source cost gateway, but to treat the proxy as a security-sensitive component: run version 1.84.0 or later, keep it patched, put it behind authentication, and isolate it on the network.
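One way to enforce the version floor is a small check in CI or at service start-up. The sketch below uses only the Python standard library. The helper names (`parse_version`, `is_acceptable`, `check_installed`) are hypothetical, and the version parsing is deliberately simple, so treat it as a starting point rather than a replacement for a proper dependency scanner.

```python
from importlib import metadata

MIN_SAFE = (1, 84, 0)                    # version floor named in the text above
BACKDOORED = {(1, 82, 7), (1, 82, 8)}    # compromised PyPI releases named above

def parse_version(text):
    """Turn '1.84.2' into (1, 84, 2), dropping any pre-release suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def is_acceptable(version_text):
    """True when the version meets the floor and is not a backdoored release."""
    version = parse_version(version_text)
    return version >= MIN_SAFE and version not in BACKDOORED

def check_installed(package="litellm"):
    """Return (version, ok) for the installed package, or (None, False)."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None, False
    return installed, is_acceptable(installed)
```

The explicit `BACKDOORED` set is redundant while the floor sits at 1.84.0, but it documents the known-bad releases and keeps them rejected if someone later lowers the floor.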
2. Langfuse: cost and latency on every trace
Langfuse is the most widely adopted open-source LLM observability platform, with more than 21,000 GitHub stars as of February 2026. It records a trace for each LLM interaction and tracks cost and latency alongside prompt management, evaluation, and datasets. The free Hobby tier gives 50,000 units per month, 30-day retention, and two users. The core platform is MIT-licensed, so you can self-host the full tracing, evaluation, and cost-tracking stack with no event limit and no licence fee. Paid cloud plans run $29 per month for Core and $199 for Pro, with extra units billed at $8 per 100,000.
In January 2026, ClickHouse acquired Langfuse alongside a $400 million Series D round, at a reported $15 billion valuation. The acquisition formalised a dependency that already existed, since Langfuse had moved its data layer to ClickHouse for high-throughput ingestion. As Langfuse chief executive Marc Klingen put it, "We built Langfuse on ClickHouse because LLM observability and evaluation is fundamentally a data problem." For teams, the practical news is that the MIT licence and the free tier did not change. Langfuse is the default pick when you want self-hostable cost tracking tied to real traces.
3. Cloudflare AI Gateway: a free cost dashboard with no infrastructure
If running a database and a proxy yourself is more than you want, Cloudflare AI Gateway gives you a cost view with zero infrastructure to manage. You point your model calls through the gateway, and the dashboard shows requests, tokens, cost, caching, and errors, with per-request logs that include token usage, cost, and duration. The core features, including analytics, caching, rate limiting, and spend limits, are free. The main cost to watch is log streaming: Logpush to an external service adds $0.05 per million records beyond 10 million per month on paid plans. For a team that wants a cost dashboard live in an afternoon, this is the lowest-effort option on the list.
4. Arize Phoenix: open-source tracing across agent steps
Arize Phoenix is an open-source observability and evaluation tool that runs locally, in a notebook, in Docker, or in the cloud. It is fully free and self-hostable with no feature gates, under the Elastic License 2.0, and it instruments popular frameworks such as LangChain and LlamaIndex through OpenInference. For cost work, Phoenix shines on agentic traces: it shows the full tree of spans for an agent run, so you can see which step, tool call, or retry loop is burning tokens. Arize also offers a managed cloud, AX, with a free tier and a paid AX Pro plan. Phoenix is the pick when your costs hide inside multi-step agents rather than single calls.
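The idea of finding the expensive step in a span tree can be sketched in a few lines. The `Span` class, the span names, and the dollar figures below are hypothetical illustrations of the concept; this is not Phoenix's data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    cost_usd: float = 0.0    # cost of this span's own model call, if any
    children: list = field(default_factory=list)

def total_cost(span):
    """Cost of a span plus everything beneath it in the tree."""
    return span.cost_usd + sum(total_cost(child) for child in span.children)

def costliest_path(span):
    """Follow the most expensive child at each level, from root to leaf."""
    path = [span.name]
    while span.children:
        span = max(span.children, key=total_cost)
        path.append(span.name)
    return path

# Hypothetical agent run: a search tool that retried twice.
run = Span("agent_run", children=[
    Span("plan", cost_usd=0.004),
    Span("search_tool", children=[
        Span("attempt_1", cost_usd=0.010),
        Span("retry_1", cost_usd=0.010),
        Span("retry_2", cost_usd=0.012),
    ]),
    Span("summarise", cost_usd=0.006),
])

print(f"total: ${total_cost(run):.3f}")
print(" > ".join(costliest_path(run)))
```

In this made-up run the retries inside `search_tool` account for most of the spend, which is the kind of pattern a flat per-call log tends to hide.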
5. PostHog LLM analytics: cost next to product data
PostHog folds LLM observability into the product-analytics platform many teams already run. Its AI observability features are free for 100,000 LLM events per month with 30-day retention. It captures the details of each generation, builds an aggregated metrics dashboard, and lets you query the data with SQL, so you can put LLM cost next to conversion, retention, and the rest of your product metrics. If you already use PostHog, you add cost visibility with no new contract and no separate tool to learn. The value here is correlation: tying a cost number to the product behaviour that caused it.
6. Traceloop OpenLLMetry: vendor-neutral cost telemetry
OpenLLMetry, maintained by Traceloop, is an open-source SDK under the Apache 2.0 licence that extends OpenTelemetry for LLM observability. It captures model name and version, prompt and completion token counts, latency, and errors, then exports that telemetry to a wide range of destinations, including Traceloop, Dynatrace, SigNoz, and a plain OpenTelemetry Collector. The advantage is portability: you instrument once with an open standard and avoid lock-in to any single vendor's backend. For platform teams that already run OpenTelemetry for the rest of their stack, OpenLLMetry keeps LLM cost data in the same pipeline as everything else.
7. Opik by Comet: cost intelligence per trace
Opik, from Comet, is an open-source platform under the Apache 2.0 licence for tracing, evaluating, and monitoring LLM and RAG applications. Its cost tracking is included free in both the open-source build and the free cloud tier, giving visibility into token usage and model cost per trace, with cost intelligence to track agent usage and spend across engineering teams. You can run it locally from the GitHub source or use the hosted version. Opik is a good fit when you want evaluation and cost in one open tool, so quality regressions and cost regressions show up on the same dashboard.
A note on Helicone
Helicone deserves a mention because it appears on most older lists, and its status changed in 2026. Mintlify acquired Helicone in March 2026, and the project moved into maintenance mode. The open-source proxy, under the Apache 2.0 licence, still routes traffic to 100-plus models and ships security fixes, bug fixes, and new model support, but active feature development has stopped and hosted customers are being migrated. By the time of the acquisition, Helicone had served more than 16,000 organisations and processed over 14.2 trillion tokens. Self-hosting the open-source code remains supported, so Helicone is still usable, but new teams should weigh the frozen roadmap before adopting it.
How the seven compare
The split that matters is gateway versus telemetry. A gateway (LiteLLM, Cloudflare) sits inline and enforces budgets, so it can both measure and block. A telemetry tool (Langfuse, Phoenix, PostHog, OpenLLMetry, Opik) records what happened and explains it, but does not stop a runaway agent on its own. Most production setups pair one of each: a gateway for control, an observability tool for depth. The second axis is hosting. A managed option like Cloudflare AI Gateway gets you a cost view in an afternoon, while a self-hosted open-source build keeps prompt logs and spend data inside your own environment at the cost of running the infrastructure yourself.
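The "measure and block" behaviour of a gateway can be illustrated with a toy in-memory gate. The `BudgetGate` class, the key name, and the per-call cost are hypothetical; this is a sketch of the idea, not how LiteLLM or Cloudflare AI Gateway implement budgets.

```python
class BudgetExceeded(Exception):
    """Raised when a key has no budget left for another call."""

class BudgetGate:
    def __init__(self):
        self.budgets = {}   # key -> dollar limit
        self.spend = {}     # key -> dollars spent so far

    def add_key(self, key, max_budget_usd):
        self.budgets[key] = max_budget_usd
        self.spend[key] = 0.0

    def check(self, key):
        """Block: refuse the call before it is sent if the key is spent out."""
        if self.spend[key] >= self.budgets[key]:
            raise BudgetExceeded(f"{key} has used its ${self.budgets[key]:.2f} budget")

    def record(self, key, cost_usd):
        """Measure: attribute the finished call's cost to the key."""
        self.spend[key] += cost_usd

gate = BudgetGate()
gate.add_key("team-search", max_budget_usd=1.00)

blocked_at = None
for call_number in range(1, 11):
    try:
        gate.check("team-search")
    except BudgetExceeded:
        blocked_at = call_number
        break
    gate.record("team-search", cost_usd=0.30)   # hypothetical per-call cost

print(f"blocked at call {blocked_at}, spend ${gate.spend['team-search']:.2f}")
```

Because the cost of a call is only known after it returns, this gate lets the fourth call through at $0.90 spent and blocks the fifth, so spend ends at $1.20 against a $1.00 limit. Any budget enforced this way can overshoot by up to one call, which is worth remembering when setting limits for expensive models.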
| Tool | Free tier (June 2026) | Licence or hosting |
|---|---|---|
| LiteLLM | Full proxy free to self-host | MIT, self-host (use 1.84.0+) |
| Langfuse | 50k units/month cloud; unlimited self-host | MIT core, cloud or self-host |
| Cloudflare AI Gateway | Free core dashboard and logs | Managed, no self-host needed |
| Arize Phoenix | Fully free, no feature gates | Elastic License 2.0, self-host |
| PostHog | 100k LLM events/month | Open source, cloud or self-host |
| Traceloop OpenLLMetry | Free SDK, unlimited | Apache 2.0, self-host backend |
| Opik by Comet | Free cloud tier and self-host | Apache 2.0, cloud or self-host |
For context, the prices these tools attach to your tokens are moving targets. The table below shows published list prices as of June 2026; treat them as a snapshot, not a contract.
| Model | Input per 1M tokens | Output per 1M tokens |
|---|---|---|
| GPT-5.4 | $2.50 | $15.00 |
| GPT-5.4 Pro | $30.00 | $180.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Gemini 3.1 Pro | $2.00 | $12.00 |
| DeepSeek V4 Flash | $0.14 | $0.28 |
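As a worked example of the tokens-times-price calculation these tools perform, the sketch below applies the list prices from the table to a single hypothetical call. The dictionary keys are the display names used in the table, not provider API model identifiers, and the calculation ignores cached-token discounts and other pricing tiers that real tools may account for.

```python
# List prices in dollars per 1M tokens, copied from the table above (June 2026).
PRICES = {
    "GPT-5.4":           (2.50, 15.00),
    "GPT-5.4 Pro":       (30.00, 180.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro":    (2.00, 12.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

def call_cost(model, input_tokens, output_tokens):
    """Dollar cost of one call from its token counts and the list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical call: 12,000 input tokens and 800 output tokens.
for model in PRICES:
    print(f"{model:<18} ${call_cost(model, 12_000, 800):.6f}")
```

For this call shape, Claude Sonnet 4.6 comes to about $0.048 and GPT-5.4 Pro to about $0.504, so the same request costs roughly ten times more on the premium model. That difference only becomes visible once it is attributed per call.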
India-specific considerations
For engineering teams in India, two factors push toward the self-hosted, open-source options on this list. The first is data residency and privacy. Prompt and completion logs routinely contain personal data, and under the Digital Personal Data Protection Act 2023 and the DPDP Rules 2025, that data carries consent and handling duties. Self-hosting Langfuse, Phoenix, Opik, or an OpenLLMetry backend keeps those logs inside your own environment rather than a third-party cloud, which simplifies the compliance story. The second is cost control on the cost tool itself: paid cloud observability plans at $29 to $199 per month, plus usage-based overage, add up for a growing team, whereas the self-hosted builds cost only the infrastructure they run on. A practical pattern for an India-based product team is LiteLLM as a self-hosted gateway for budgets and attribution, paired with a self-hosted Langfuse or Phoenix for trace-level cost, so both spend data and prompt logs stay in country. For the wider picture on running AI in production, see our generative AI enterprise strategy for 2026.
How to choose
Start with the question you most need answered. If it is "which team or customer spent this," begin with a gateway: LiteLLM if you can run it safely, Cloudflare AI Gateway if you want zero infrastructure. If it is "why did this agent run cost so much," begin with trace-level observability: Langfuse or Arize Phoenix. If you already run PostHog or OpenTelemetry, extend what you have with PostHog LLM analytics or OpenLLMetry rather than adding a tool. None of these requires a paid plan to start, so the real cost is the integration time, not the licence. Pick one gateway and one observability tool, instrument a single high-volume workload first, and expand once the numbers line up with your provider invoice.
How eCorpIT can help
eCorpIT is a CMMI Level 5, senior-led engineering organisation in Gurugram that builds and instruments production LLM systems. We set up the gateway-plus-observability pattern described above, wire per-team and per-feature cost attribution into your stack, and self-host the open-source tools so prompt logs and spend data stay inside your environment and aligned with DPDP requirements. If your LLM bill is growing faster than you can explain it, talk to our team or read more about how we work.
References
_Last updated: 25 June 2026._