    I used over 16,000,000,000 tokens!

    The Big Lie Behind the Current Token Economy

    I analyzed my Codex logs and found that I had used more than 10 billion tokens over four months of work (and more than 16 billion over the last eight months).[1]

    I had the urge to put on my finance hat and compare what tokens cost inside the LLM provider’s own app with what they cost through its API. After all, I majored in finance and economics, and I was curious to run the numbers behind this token economy so I could build a stable and profitable pricing model for GentArk.

    I had some time between several long Codex runs to think and work on side projects, so I compared the cost of Codex tokens with the cost of OpenAI API tokens.

    I was not surprised by the finding, but I was still shocked by its severity. This article covers what I analyzed, what I found, and my thoughts on the token economy and the direction it should take. As it stands today, it cannot scale financially.

    The thesis

    The AI economy is being sold as if any software company can build on top of large language model APIs and compete fairly with the model providers’ own applications.

    That is the big lie.

    Using Codex (don’t ask me why I don’t use Claude; I’ll let others do the winner analysis), I asked it to build me a dashboard from its own logs, with every interesting data point it could find, including token usage as reported in the logs. It did a respectable job, fast (see the article image, and reach out if you want the code).

    While I use Codex to help me develop GentArk, I also use OpenAI APIs inside GentArk, as one of the models it supports. A quick analysis showed that I had used more than 240,000,000 API tokens since August 2025. A short calculation later, I had simple averages, total cost divided by total tokens (ignoring per-model price differences, which I treated as negligible):

    Cost basis | Average cost per token | Cost per 1M tokens | Cost per 1B tokens
    API token cost | $0.0000031 | $3.10 | $3,100
    Codex token cost | $0.000000035 | $0.035 | $35

    That is not a normal markup. It is an 87.57x pricing gap.

    Put differently:

    $$\frac{0.0000031 - 0.000000035}{0.000000035} \times 100 = 8{,}757.14\%$$

    In other words, the API token price is 8,757% higher than the app-side token price: a markup of 87.57 times the app-side price, or about 88.6 times as much per token.
    Remember, I used the average cost across multiple models, which narrows the gap. If we use the latest GPT-5.5 pricing instead, the gap is far larger. GPT-5.5 costs $30 per 1M output tokens (at the time of writing), which means:

    $$\frac{0.00003 - 0.000000035}{0.000000035} \times 100 \approx 85{,}614\%$$

    In this case, the API token price is 85,614% higher than the app-side token price: a markup of 856.14 times, or about 857 times as much per token.
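    Both gap calculations above are simple arithmetic, and a few lines make the two ways of stating the gap explicit. This is a minimal sketch: `price_gap` is a hypothetical helper, and the prices are the per-token figures quoted in this article. It reports the ratio (how many times as much the API token costs) and the markup (how much more it costs), which is where the 87.57x and 8,757% figures come from.

```python
def price_gap(api_per_token: float, app_per_token: float) -> dict[str, float]:
    """Compare an API token price with an app-side effective token price."""
    ratio = api_per_token / app_per_token                       # times as expensive
    markup = (api_per_token - app_per_token) / app_per_token    # times more expensive
    return {"ratio": ratio, "markup": markup, "percent_higher": markup * 100}


APP_SIDE = 0.000000035                               # Codex effective price per token
blended = price_gap(0.0000031, APP_SIDE)             # blended API average
gpt55_output = price_gap(30 / 1_000_000, APP_SIDE)   # $30 per 1M output tokens

for name, gap in [("blended", blended), ("GPT-5.5 output", gpt55_output)]:
    print(f"{name}: markup {gap['markup']:.2f}x, "
          f"ratio {gap['ratio']:.2f}x, {gap['percent_higher']:,.2f}% higher")
```

    It prints a markup of 87.57x (8,757.14% higher) for the blended price and 856.14x (85,614.29% higher) for the GPT-5.5 output price.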

    This means an independent AI application using the API must either:

    1. charge far more than the provider’s own app,
    2. use dramatically fewer tokens,
    3. accept much lower margins,
    4. degrade quality,
    5. or fail.

    At this scale, the API is not just infrastructure. It becomes a toll road owned by the same companies competing with the apps that must use the toll road.

    The rest of the market is being told to build skyscrapers on land it rents from its biggest future competitor.


    The token gap is the business model.

    The issue is not that APIs should be free. APIs have real costs: GPUs, inference serving, storage, safety systems, uptime, abuse prevention, billing, support, and enterprise security.

    The issue is the magnitude of the spread.

    A reasonable API premium over internal cost might be expected. A 2x, 5x, or even 10x difference can be explained by reliability, support, enterprise controls, and profit margin. A 20x gap may still be defensible for premium models, guaranteed capacity, compliance, and commercial indemnity.

    But 87.57x to 856.14x is not a support premium. It is a structural moat.

    At that level, the downstream app builder is not simply paying for compute. It is paying a strategic disadvantage tax.

    The model provider’s own app can offer broad usage, polished UX, native integration, direct billing, and brand trust while paying or internally accounting for tokens at a far lower effective rate. The independent app, by contrast, pays retail API prices before it has even paid for its own product development, hosting, customer support, compliance, sales, billing, and margin.

    That is why many AI SaaS companies look attractive in demo mode but collapse under production usage.

    The demo is cheap. The customer is expensive. The scale is brutal.


    The math destroys the “build on our API” story.

    Assume an AI workflow uses 10 million tokens per month for one active customer.

    At the API price:

    $$10{,}000{,}000 \times 0.00003 = 300$$

    That is $300 per customer per month just for tokens.

    At the vendor-owned app effective token price:

    $$10{,}000{,}000 \times 0.000000035 = 0.35$$

    That is $0.35 per customer per month.

    So, one side can deliver the same token volume for thirty-five cents. The API-based app pays three hundred dollars.

    That is not a competitive market. That is a margin trap.

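    To compete on cost at the blended gap of roughly 87x, the API-based app would need to reduce token usage by about:

    $$1 - \frac{1}{87} \approx 98.85\%$$

    At the GPT-5.5 output price used in the example above, the required cut is steeper still:

    $$1 - \frac{0.35}{300} \approx 99.88\%$$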

    In plain English: the API app must deliver at least the same value while using about 1% of the tokens the provider’s own app uses, or less.
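    The per-customer comparison and the break-even token cut can be checked the same way. A minimal sketch, assuming the 10 million tokens per customer per month from the example and the two per-token prices used above; the helper names are hypothetical.

```python
def monthly_token_bill(tokens: int, price_per_token: float) -> float:
    """Token spend for one customer in one month."""
    return tokens * price_per_token


def required_cut(api_bill: float, app_bill: float) -> float:
    """Share of tokens the API-based app must eliminate to match the app-side bill."""
    return 1 - app_bill / api_bill


TOKENS_PER_CUSTOMER = 10_000_000
api_bill = monthly_token_bill(TOKENS_PER_CUSTOMER, 0.00003)      # GPT-5.5 output price
app_bill = monthly_token_bill(TOKENS_PER_CUSTOMER, 0.000000035)  # app-side effective price

print(f"API-based app: ${api_bill:,.2f} per customer per month")
print(f"Provider's app: ${app_bill:,.2f} per customer per month")
print(f"Token cut needed to match: {required_cut(api_bill, app_bill):.2%}")
```

    Swapping in the blended $0.0000031 rate gives a $31 bill and a required cut of about 98.9%.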

    Over four months of Codex usage, I consumed 2.5 billion tokens a month (simple average), excluding API and ChatGPT usage. At the blended API rate of $0.0000031 per token, that would cost approximately $7,750 a month. Compare that with $100 a month for my Codex Pro account.

    That is not optimization. That is economic impossibility for many agentic workflows.
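    The monthly comparison is a single multiplication, and dividing the flat fee by the same volume gives the subscription’s effective per-token price. A minimal sketch using the article’s own figures (2.5 billion tokens a month, the blended API average, and a $100 monthly plan); the variable names are illustrative.

```python
MONTHLY_TOKENS = 2_500_000_000   # simple monthly average of Codex usage
API_PRICE = 0.0000031            # blended API average per token
PLAN_PRICE = 100.0               # flat monthly Codex plan

api_equivalent = MONTHLY_TOKENS * API_PRICE
plan_price_per_1m = PLAN_PRICE / MONTHLY_TOKENS * 1_000_000

print(f"Same usage at API prices: ${api_equivalent:,.0f} per month")
print(f"Plan's effective price: ${plan_price_per_1m:.3f} per 1M tokens")
print(f"API bill is {api_equivalent / PLAN_PRICE:.1f}x the plan price")
```

    The plan’s effective price of about $0.04 per 1M tokens lands in the same range as the $0.035 Codex figure in the table above.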


    Why is this especially dangerous for AI agents?

    The problem gets worse with AI agents because agents are token-hungry by design.

    A normal chatbot interaction may use a prompt, a response, and maybe a short memory window. An agentic workflow can use:

    • planning tokens,
    • tool-call tokens,
    • retrieved-document tokens,
    • codebase context tokens,
    • intermediate reasoning tokens,
    • test output tokens,
    • repair-loop tokens,
    • validation tokens,
    • retry tokens,
    • logging and audit tokens.

    A software-building agent, research agent, legal-review agent, finance-analysis agent, or customer-support agent can consume many times more tokens than a simple chat session.
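    To see where an agent’s tokens actually go, it helps to tally them by category on every run. A minimal sketch: `TokenLedger` is a hypothetical class, the category names follow the list above, the token counts are invented for illustration, and the price is the blended API average from the earlier table.

```python
from collections import Counter


class TokenLedger:
    """Tallies one agent run's tokens by category and prices the total."""

    def __init__(self, price_per_token: float) -> None:
        self.price_per_token = price_per_token
        self.tokens = Counter()

    def record(self, category: str, count: int) -> None:
        self.tokens[category] += count

    def total_tokens(self) -> int:
        return sum(self.tokens.values())

    def cost(self) -> float:
        return self.total_tokens() * self.price_per_token


# Hypothetical token counts for a single coding-agent run, for illustration only.
run = TokenLedger(price_per_token=0.0000031)
for category, count in [
    ("planning", 4_000),
    ("codebase context", 120_000),
    ("tool calls", 15_000),
    ("test output", 30_000),
    ("repair loop", 60_000),
    ("codebase context", 120_000),  # context sent again on the repair attempt
]:
    run.record(category, count)

print(f"{run.total_tokens():,} tokens, ${run.cost():.2f}")
print(run.tokens.most_common(2))
```

    Sorting by category shows where to cut first; in this made-up run, the repeated codebase context dominates.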

    This is why pricing pressure is already showing up in enterprise AI coding tools. GitHub announced that Copilot is moving to usage-based billing with AI Credits, where usage is calculated from input, output, and cached tokens at model-specific API rates. GitHub’s own explanation is that agentic workflows create much higher computing and inference demand, and that the old premium-request model is no longer sustainable. (The GitHub Blog)

    That matters because it shows the industry is moving away from the illusion of “unlimited AI” and toward metered token economics.

    The free buffet is ending.


    The Microsoft and Uber examples show the enterprise bill shock.

    The recent Microsoft example is important because it shows that even the largest technology companies are not immune to AI tool economics.

    The Verge reported that Microsoft planned to remove most internal Claude Code licenses and push many developers toward GitHub Copilot CLI. The report said Microsoft framed the move as convergence around its own agentic command-line tool, but sources also described the decision as financial, with the June 30 cutoff aligning with the end of Microsoft’s fiscal year and operating-expense management. (The Verge)

    That is not just a software-tool preference. It is a signal.

    If a company with enormous cloud capacity, deep AI partnerships, and ownership of major developer tooling still has to rationalize internal AI usage, then smaller API-dependent software companies should be very careful.

    Uber is another example of the same problem. The Verge reported that Uber had exhausted its annual AI budget four months into 2026 and that Uber president and COO Andrew Macdonald questioned whether rising token consumption was clearly translating into more useful consumer-facing features. He explicitly framed the problem as a tradeoff between token consumption, associated cost, and headcount. (The Verge)

    That is the enterprise CFO waking up.

    For the past two years, AI has often been treated as a productivity miracle. Now the accounting department is asking a more basic question:

    How much value are we getting per dollar of tokens?

    For many companies, that answer is still unclear.


    The “AI replaces workers” story is financially incomplete.

    The AI workforce-replacement story usually compares an employee salary to an AI subscription.

    That is the wrong comparison.

    The correct comparison is:

    Human cost

    versus:

    AI tokens + integration + QA + monitoring + human review + failure cost + rework + security + vendor lock-in
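    The comparison above can be written as a simple cost model in which tokens are only one line item. A minimal sketch: `AIMonthlyCost` and `ai_is_cheaper` are hypothetical names, every dollar figure is invented for illustration, and modeling vendor lock-in as a monthly dollar amount is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class AIMonthlyCost:
    """Monthly AI cost components from the comparison above, in dollars."""
    tokens: float
    integration: float
    qa: float
    monitoring: float
    human_review: float
    failure_cost: float
    rework: float
    security: float
    vendor_lock_in: float  # assumption: e.g. budget reserved for switching or migration

    def total(self) -> float:
        # Sum every component, not just the token bill.
        return sum(getattr(self, f.name) for f in fields(self))


def ai_is_cheaper(human_cost: float, ai: AIMonthlyCost) -> bool:
    return ai.total() < human_cost


# Hypothetical monthly figures for one workflow, for illustration only.
ai = AIMonthlyCost(tokens=300, integration=800, qa=1_200, monitoring=200,
                   human_review=2_500, failure_cost=600, rework=900,
                   security=300, vendor_lock_in=200)
print(f"AI total: ${ai.total():,.0f}; cheaper than $6,000 of human cost? "
      f"{ai_is_cheaper(6_000, ai)}")
```

    In this made-up case, the token bill is a small share of the $7,000 total, which is exactly the part a seat-price comparison leaves out.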

    AI can reduce labor in some workflows. But replacing people is not the same as replacing tasks.

    A human employee is expensive, but the cost is predictable. AI can look cheap at the seat level and become expensive at the usage level. The more the organization depends on AI, the more variable cost it creates.

    This is why the layoff story is unstable. Challenger, Gray & Christmas reported that AI led all stated reasons for U.S. job cuts in March 2026, with 15,341 announced cuts that month, or 25% of total cuts. But the same report also shows AI alongside broader economic, restructuring, closing, and contract-loss pressures, meaning AI is often part of a broader cost-cutting narrative rather than proof that automation has fully replaced human labor. (Challenger Gray & Christmas)

    Gartner’s customer-service research is even more direct: it predicted that by 2027, 50% of companies that attributed headcount reductions to AI will rehire staff to perform similar functions under different job titles. Gartner also said many reductions were influenced by broader economic conditions, not automation alone, and that AI is not mature enough to fully replace human expertise, empathy, and judgment in customer service. (Gartner)

    That is the rehire problem.

    A company may lay off support agents, analysts, junior developers, or operations staff, then discover that the AI system still needs:

    • people to supervise outputs,
    • people to handle edge cases,
    • people to correct hallucinations,
    • people to maintain workflows,
    • people to interpret business context,
    • people to deal with customers when automation fails.

    At that point, the company has not replaced labor. It has changed the labor mix, added token cost, and possibly lost institutional knowledge.


    The ROI problem is already visible.

    A 2025 MIT NANDA report found that despite $30-40 billion in enterprise GenAI investment, 95% of organizations studied were getting zero return, with only 5% of integrated AI pilots extracting millions in value. The report said the issue was not mainly model quality or regulation, but implementation approach, brittle workflows, lack of contextual learning, and poor alignment with day-to-day operations.

    That finding is critical.

    It means the market’s problem is not simply, “AI is too expensive.”

    The deeper problem is:

    AI is expensive before companies have proven that it produces durable, measurable P&L value.

    That is a dangerous combination.

    High variable cost plus unclear ROI is not a revolution. It is a budget crisis waiting to happen.


    Is this a monopoly problem?

    A huge pricing gap by itself does not automatically prove monopoly or illegal conduct.

    In U.S. antitrust terms, monopoly law generally asks whether a firm has durable market power in a relevant market and whether that power was acquired or maintained through improper conduct, not merely through a better product or superior execution. The FTC’s own guidance explains that antitrust law targets conduct that unreasonably restrains competition by creating or maintaining monopoly power; courts also evaluate procompetitive justifications. (Federal Trade Commission)

    So, the legal question is not simply:

    “Is the API expensive?”

    The legal question is closer to:

    “Is a dominant AI platform using control over essential inputs to disadvantage downstream competitors while favoring its own products?”

    That is where the issue becomes serious.

    Possible antitrust questions include:

    Question | Why it matters
    Is the provider selling APIs at prices that make downstream competition economically impossible? | This raises margin-squeeze concerns.
    Is the provider’s own app effectively subsidized below the price available to competitors? | This can raise predatory pricing or self-preferencing questions, depending on market power and facts.
    Are API customers receiving lower quality, worse latency, smaller context, stricter limits, or delayed model access compared with the provider’s own app? | This can indicate discriminatory access to an essential input.
    Are cloud credits, enterprise bundles, or preferred integrations locking customers into one AI stack? | This can raise tying, bundling, and switching-cost concerns.
    Can independent app providers realistically switch models, or are they trapped by model behavior, data formats, embeddings, tooling, and user expectations? | High switching costs strengthen platform power.

    Regulators are already watching the AI stack. The FTC’s 2025 staff report on cloud-service-provider and AI-developer partnerships flagged competition concerns involving access to compute and engineering talent, switching costs, exclusivity and control rights, and access to sensitive technical and business information. (Federal Trade Commission)

    The OECD has also identified competition issues in adjacent cloud and AI infrastructure markets, including switching barriers, restrictive licensing, cloud credits that can make prices too low for smaller rivals to compete, and bundling or tying concerns. (OECD)

    So, the correct conclusion is careful but direct:

    The pricing gap alone is not proof of an illegal monopoly. But an 87.57x to 856.14x gap, combined with vertical integration, proprietary apps, cloud partnerships, developer lock-in, and discriminatory economics, is exactly the kind of fact pattern regulators should examine.


    What would a reasonable token price gap look like?

    The market will likely tolerate a difference between API token cost and app-side effective token cost. But it will not tolerate unlimited spread forever.

    A reasonable market structure would look more like this:

    Gap | Market interpretation
    1x–3x | Healthy wholesale/API pricing. App builders can compete.
    3x–10x | Acceptable if the API includes reliability, security, support, and commercial guarantees.
    10x–20x | Tolerable only for premium models, high SLA, regulated workloads, or guaranteed capacity.
    20x–50x | Dangerous. App builders need strict token controls and strong pricing power.
    50x–100x+ | Structurally hostile to downstream SaaS competition.
    856x | Not a normal market gap. This implies subsidy, market control, non-comparable measurement, or strategic pricing pressure.
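    The bands in the table can be applied mechanically to any measured gap. A minimal sketch: `interpret_gap` is a hypothetical helper, the labels are shortened from the table, and because the table’s ranges touch at their edges, assigning a boundary value (such as exactly 3x) to the higher band is an assumption. It uses the ratio rather than the markup; with bands this wide, the choice does not change the result for the article’s numbers.

```python
# Upper bounds (exclusive) and short labels taken from the table above.
# Where the table's ranges touch (e.g. exactly 3x), this sketch picks the higher band.
BANDS = [
    (3.0, "Healthy wholesale/API pricing"),
    (10.0, "Acceptable with reliability, security, support, and guarantees"),
    (20.0, "Tolerable only for premium models, high SLA, or guaranteed capacity"),
    (50.0, "Dangerous: needs strict token controls and strong pricing power"),
]
HOSTILE = "Structurally hostile to downstream SaaS competition"


def interpret_gap(api_price: float, app_price: float) -> tuple[float, str]:
    """Return the price ratio and the matching band from the table."""
    ratio = api_price / app_price
    for upper, label in BANDS:
        if ratio < upper:
            return ratio, label
    return ratio, HOSTILE


for name, api_price in [("blended API", 0.0000031), ("GPT-5.5 output", 0.00003)]:
    ratio, label = interpret_gap(api_price, 0.000000035)
    print(f"{name}: {ratio:.2f}x -> {label}")
```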

    For the AI application ecosystem to scale, the API cost must become closer to a wholesale input price.

    A model provider can still make money. But if the API is the raw material for the entire AI software market, it cannot be priced like a luxury retail product while the provider’s own application receives internal economics that no external competitor can match.

    That is not a platform. That is a controlled dependency.


    Why are API-based AI apps at risk?

    The current API economy creates several structural risks for application providers.

    1. Gross margins collapse at scale

    Many SaaS companies are used to high gross margins. Traditional software has low marginal cost once built. AI software is different. Each customer’s action may generate real inference cost.

    If usage grows faster than revenue, success creates losses.

    2. Pricing becomes unpredictable

    Customers want predictable SaaS pricing. APIs create variable cost.

    This creates a mismatch:

    • customers want flat monthly pricing,
    • AI vendors charge by usage,
    • agents consume tokens unpredictably,
    • support and QA costs rise with complexity.

    The app provider is squeezed between customer expectations and API billing reality.

    3. The best customers can become the least profitable

    In traditional SaaS, power users are often valuable. In AI SaaS, power users can be dangerous.

    A customer who uses the product heavily can consume more inference cost than they pay in subscription revenue.

    This is why many AI products quietly add limits, credits, throttles, “fair use” policies, smaller context windows, or degraded models.
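    A quick way to see the power-user problem is to compute gross margin per subscriber under a flat price. A minimal sketch that counts token cost only (hosting, support, and other costs are ignored); `token_gross_margin` is a hypothetical helper, the $50 price and the usage levels are invented, and the rate is the blended API average from earlier in this article.

```python
def token_gross_margin(monthly_price: float, monthly_tokens: int,
                       price_per_token: float) -> float:
    """Gross margin on one subscriber, counting token cost only."""
    token_cost = monthly_tokens * price_per_token
    return (monthly_price - token_cost) / monthly_price


PRICE = 50.0           # hypothetical flat monthly subscription
API_RATE = 0.0000031   # blended API average per token

users = {"light": 1_000_000, "typical": 5_000_000, "power": 30_000_000}
for label, tokens in users.items():
    print(f"{label} user: {token_gross_margin(PRICE, tokens, API_RATE):.1%} gross margin")
```

    The power user pays the same $50 but costs $93 in tokens alone, so the most engaged customer is the one losing money.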

    4. Provider-owned apps can undercut the market

    The model provider can bundle AI into its own app, cloud platform, office suite, coding tool, or enterprise package.

    The independent app has to pay API rates.

    The provider’s app can treat tokens as internal transfer cost, strategic subsidy, customer acquisition, cloud pull-through, or ecosystem lock-in.

    That is not a level playing field.

    5. The vendor can change the rules

    The API provider can change:

    • prices,
    • rate limits,
    • model availability,
    • context window policy,
    • discounts,
    • latency tiers,
    • safety behavior,
    • allowed use cases,
    • data retention terms,
    • enterprise minimums.

    The downstream app provider owns the customer relationship only partially. The upstream provider owns the cost structure.


    The workforce-replacement economics are weaker than the hype.

    Replacing workers with AI is economically rational only when all of the following are true:

    1. the task is repeatable,
    2. the output can be verified cheaply,
    3. failure cost is low,
    4. context is bounded,
    5. hallucinations are manageable,
    6. integration is stable,
    7. token cost is predictable,
    8. customer experience does not degrade,
    9. human escalation remains available,
    10. AI produces measurable P&L improvement.

    Many business functions do not meet those conditions yet.

    In software development, for example, AI can accelerate coding, refactoring, testing, documentation, and debugging. But it can also create hidden review debt. If the AI produces code that requires senior engineers to inspect, repair, test, secure, and maintain, the productivity gain is real but not equivalent to replacing the engineer.

    In customer service, AI can handle repetitive questions. But when a customer is angry, confused, high-value, or dealing with an edge case, the company still needs human judgment.

    In finance, legal, healthcare, compliance, and enterprise operations, the cost of a wrong answer can exceed the cost of the labor supposedly replaced.

    This is why the smart economic model is not “replace people with AI.”

    The better model is:

    Use AI to increase the output of high-quality people, reduce low-value work, and automate bounded workflows where the cost of verification is lower than the cost of human execution.

    That is much less dramatic than the hype. It is also much more economically realistic.


    What should AI app builders do now?

    Any company building on LLM APIs should assume the current pricing structure is unstable.

    The correct response is not to avoid AI. The correct response is to build with brutal cost discipline.

    Required operating rules.

    Rule | Practical meaning
    Price by workflow, not by token | Customers buy outcomes; the company must internally map each outcome to token cost.
    Track cost per successful task | Failed runs, retries, and hallucinated outputs must be included.
    Use model routing | Do not send every task to the most expensive model.
    Use smaller models where possible | Classification, extraction, routing, summarization, and formatting often do not need frontier models.
    Cache aggressively | Repeated context should not be paid for repeatedly where caching is available.
    Limit context | Passing entire documents, repositories, or histories into every call destroys margins.
    Add budgets and quotas | Every tenant, user, workflow, and agent should have cost ceilings.
    Separate draft from final | Use cheaper models for drafts and stronger models for final reasoning or validation.
    Keep humans in the loop where failure is expensive | Human review is not a weakness; it is cost control and risk control.
    Build provider portability | Avoid becoming trapped in one model’s pricing, syntax, tools, and behavior.
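    Three of these rules (model routing, smaller models for bounded tasks, and per-tenant cost ceilings) can be combined in a small guard that runs before every model call. A minimal sketch under stated assumptions: the model names and the small-model price are invented placeholders, the frontier price reuses the $30 per 1M output figure quoted earlier, and `route`, `plan_call`, and `TenantBudget` are hypothetical names, not any vendor’s API.

```python
from dataclasses import dataclass

# Hypothetical tiers: the frontier price reuses the $30 per 1M output figure quoted
# earlier; the small-model price is an invented placeholder.
PRICES = {"small-model": 0.0000002, "frontier-model": 0.00003}
CHEAP_TASKS = {"classification", "extraction", "routing", "summarization", "formatting"}


class BudgetExceeded(Exception):
    """Raised when a call would push a tenant past its cost ceiling."""


@dataclass
class TenantBudget:
    ceiling: float      # monthly cost ceiling in dollars
    spent: float = 0.0

    def charge(self, amount: float) -> None:
        if self.spent + amount > self.ceiling:
            raise BudgetExceeded(
                f"would spend ${self.spent + amount:.2f} of ${self.ceiling:.2f}")
        self.spent += amount


def route(task_type: str, is_final: bool) -> str:
    """Bounded tasks and drafts go to the small model; final reasoning to the frontier model."""
    if task_type in CHEAP_TASKS or not is_final:
        return "small-model"
    return "frontier-model"


def plan_call(budget: TenantBudget, task_type: str, is_final: bool, est_tokens: int) -> str:
    """Pick a model and reserve its estimated cost before any call is made."""
    model = route(task_type, is_final)
    budget.charge(est_tokens * PRICES[model])
    return model  # a real system would now call the chosen model


budget = TenantBudget(ceiling=5.00)
print(plan_call(budget, "extraction", True, 50_000))  # small-model
print(plan_call(budget, "analysis", False, 50_000))   # small-model (draft)
print(plan_call(budget, "analysis", True, 100_000))   # frontier-model
try:
    plan_call(budget, "analysis", True, 100_000)
except BudgetExceeded as err:
    print("blocked:", err)
```

    Charging the estimate before the call, rather than reconciling afterwards, is a design choice: it stops a runaway agent loop at the ceiling instead of discovering the overrun on the invoice.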

    The long-term winners will not be the companies with the flashiest AI demos.

    They will be the companies with the best AI unit economics.


    The real test: cost per business outcome

    The market needs to stop asking:

    How much does a token cost?

    The better question is:

    How much does a successful business outcome cost?

    For example:

    Workflow | Bad metric | Better metric
    AI coding | Tokens used | Cost per accepted pull request
    AI support | Chats handled | Cost and speed per resolved ticket without escalation
    AI legal review | Pages processed | Cost per accurate issue found and resolved
    AI finance analysis | Reports generated | Cost per decision-grade analysis
    AI research | Sources summarized | Cost per validated insight
    AI sales | Emails generated | Cost per qualified meeting
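    The “better metric” column boils down to one division: everything spent, including failed runs and retries, over the outcomes that actually succeeded. A minimal sketch: `cost_per_outcome` is a hypothetical helper, and the five run costs and their results are invented for illustration.

```python
def cost_per_outcome(run_costs: list[float], succeeded: list[bool]) -> float:
    """Total spend, including failed runs and retries, divided by successful outcomes."""
    if len(run_costs) != len(succeeded):
        raise ValueError("run_costs and succeeded must have the same length")
    successes = sum(succeeded)
    if successes == 0:
        return float("inf")  # money spent, nothing delivered
    return sum(run_costs) / successes


# Hypothetical: five coding-agent runs, two of which produced an accepted pull request.
costs = [1.20, 0.80, 2.50, 0.60, 1.90]
accepted = [False, True, False, False, True]
print(f"naive cost per run: ${sum(costs) / len(costs):.2f}")
print(f"cost per accepted PR: ${cost_per_outcome(costs, accepted):.2f}")
```

    The naive per-run figure ($1.40) hides the failures; the per-outcome figure ($3.50) is the number a price has to cover.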

    The token is not the product. The outcome is the product.

    But if token cost is 87.57x to 856.14x higher for API builders than for provider-owned apps, then outcome cost becomes structurally distorted.

    That is the core economic problem.


    Conclusion: the token economy must reset

    The current AI API economy does not scale cleanly.

    Not because AI is useless. AI is useful.

    Not because APIs are bad. APIs are essential.

    The problem is that the economics being offered to independent builders are often incompatible with the economics being enjoyed by the platform owners’ own applications.

    An 87.57x to 856.14x token gap means the API-based app is not competing on product quality alone. It is competing against a cost structure it cannot access.

    That creates three consequences:

    1. Independent AI apps become fragile.
      They can demo well but fail at scale because token costs eat the margin.
    2. Enterprise buyers face budget shock.
      AI looks cheap during pilots and becomes expensive in production.
    3. Platform owners gain market control.
      The same firms selling the raw intelligence can undercut, bundle, prioritize, or outlast the companies building on top of them.

    The legal question is still open. A pricing gap is not automatically a monopoly violation. But when the gap is extreme, persistent, and combined with vertical integration, cloud lock-in, app bundling, and control over essential AI inputs, it becomes a serious competition-policy issue.

    The economic conclusion is already clear:

    AI will not scale on the current token economy unless API prices fall, app prices rise, usage becomes tightly metered, or independent app builders radically reduce token consumption.

    The “AI replaces everyone” story is too simplistic.

    The real story is harsher:

    AI replaces some tasks, increases the leverage of some workers, creates new costs, exposes weak processes, and punishes companies that confuse a demo with a business model.

    The winners will not be the companies that use the most AI.

    The winners will be the companies that can prove, line by line, that every token they buy produces more value than it costs.

    1. For this article, I used data from my new hard drive, which holds four months of work history. Combined with the Codex session data from my old hard drive, the data covers eight months of Codex sessions and over 16 billion tokens.