Why We’re Moving Workloads Off Frontier Models...And Why You Will Too

Author: Nathan Gunn
November 24, 2025

We are shifting meaningful workloads from closed frontier models to open-source generative AI. You should too.

A year ago, most startups consuming generative AI defaulted to a single, large frontier model exposed through a cloud API. If you believe what you read on X, that default has weakened quickly over the past several months. Founders and investors now highlight high-performing open models as serious options. The precise adoption rate is hard to pin down, but the directional pattern is consistent: many early-stage startups now run open models on their own cloud, either alongside or instead of the largest closed systems. As we can attest, the reasons are almost entirely practical.

The first reason is cost. At SecondLook Health, we see that a well-run open-source deployment can often deliver effective per-token prices roughly two to five times lower than comparable closed APIs, even after including GPU rental, orchestration, and the engineering needed to stand the system up. Why does this matter? For many vertical AI products, inference spend eventually becomes the dominant component of cost of goods sold. If model calls appear in several steps of a workflow, total token volume (and cost) grows eye-wateringly quickly. A two-to-five-times premium on that volume compresses gross margins, impairs pricing flexibility, and constrains experimentation. (Ironically, lower unit costs also make it feasible to fine-tune more often and maintain several model variants for different customer segments or tasks, further enhancing open-source value.)
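To make the margin arithmetic concrete, here is a minimal sketch of how a per-token price multiplier flows through to gross margin. Every number below is an illustrative assumption, not a SecondLook figure; the point is only the shape of the effect when inference dominates COGS.

```python
# Illustrative margin math. All prices, volumes, and revenue figures
# are made-up assumptions for the sketch, not real data.

def gross_margin(revenue: float, inference_cost: float, other_cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - inference_cost - other_cogs) / revenue

# Assume a workflow consuming 10 billion tokens per month.
tokens = 10_000_000_000
open_price_per_m = 0.50    # $ per million tokens, self-hosted open model (assumed)
closed_price_per_m = 2.00  # 4x premium for a closed API (assumed)

revenue = 40_000.0         # monthly revenue for this product line (assumed)
other_cogs = 5_000.0       # hosting, support, etc. (assumed)

open_cost = tokens / 1_000_000 * open_price_per_m      # $5,000
closed_cost = tokens / 1_000_000 * closed_price_per_m  # $20,000

print(f"open-model margin:   {gross_margin(revenue, open_cost, other_cogs):.1%}")
print(f"closed-model margin: {gross_margin(revenue, closed_cost, other_cogs):.1%}")
```

Under these assumed numbers the same product runs at a 75% gross margin on the open stack versus 37.5% on the closed one; a fixed price multiplier on a dominant cost line moves margin a lot.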

The second reason is performance. On many public benchmarks, the best open models now sit within a few percentage points of leading frontier systems. For narrow, domain-specific tasks, open models fine-tuned on proprietary data often match or exceed the performance of a general-purpose, closed model used out of the box. Frontier systems still lead on some difficult reasoning or broad world knowledge tasks, and there remain use cases where they make sense. But it has become straightforward to decompose an application and assign different models to different steps. In healthcare, a self-hosted, lightweight open model can competently handle many elements of extraction, normalization, and classification. Larger open or closed models are reserved for the few points where their additional capability genuinely shifts analytic (or other business) outcomes. This division of labor is also easier when open models are already under your control.
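The decomposition described above can be sketched as a trivial task router. The model names, task types, and routing rule here are all hypothetical illustrations, not a description of SecondLook's actual stack:

```python
# Hypothetical task router: simple, high-volume steps go to a small
# self-hosted open model; the few hard reasoning steps go to a frontier
# endpoint. Names and routing rules are illustrative assumptions.

ROUTES = {
    "extract":   "open-small",    # entity extraction from documents
    "normalize": "open-small",    # map values to canonical codes
    "classify":  "open-small",    # label records by category
    "reason":    "frontier-api",  # multi-step clinical reasoning
}

def route(task: str) -> str:
    """Return the model assigned to a workflow step, defaulting to the
    cheap open model when the task type is unrecognized."""
    return ROUTES.get(task, "open-small")

print(route("extract"))  # open-small
print(route("reason"))   # frontier-api
```

In practice the routing table is the artifact worth owning: it makes the "frontier only where it shifts outcomes" policy explicit, auditable, and cheap to change as open models improve.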

The third reason is strategic: proprietary data, compliance, and vendor risk. Operating a fine-tuned model on your own infrastructure makes it possible to meet the stricter reliability, compliance, and data-governance expectations of enterprise buyers in high-consequence sectors like healthcare, insurance, and legal services. You can promise that customer data is not reused for training because there is no upstream provider to reuse it. You can tune latency by placing models close to your data and by managing batching, routing, and redundancy yourself. You also reduce exposure to unilateral changes in pricing, rate limits, policies, or other behavior from a single frontier provider or cloud platform. In regulated domains with HIPAA and PHI concerns, clients want precise answers to questions such as where computation occurs, how long data is retained, and whether any third-party model benefits from their traffic. A self-hosted open stack makes those answers simpler.

I have a strong suspicion that our experience and strategy are consistent with those of most startups. In the earliest phase, prototype and pilot on closed frontier APIs, because this minimizes time to value. As usage grows, instrument token consumption and identify the highest-volume, most repetitive tasks, then migrate those tasks to open models fine-tuned on your own data and hosted in a private cloud. Keep a small number of frontier endpoints for the limited cases where they still provide distinct advantages.
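Instrumenting token consumption can start as simply as a per-task counter. A minimal sketch, with hypothetical task names and traffic:

```python
from collections import Counter

# Tally tokens per task type so the highest-volume, most repetitive
# steps -- the best migration candidates -- surface first.
token_usage = Counter()

def record_call(task: str, prompt_tokens: int, completion_tokens: int) -> None:
    """Log one model call's total token count against its task type."""
    token_usage[task] += prompt_tokens + completion_tokens

# Hypothetical traffic: many cheap extraction calls, one expensive
# reasoning call.
for _ in range(50):
    record_call("extract", 1200, 300)
record_call("reason", 4000, 1200)

# most_common() ranks tasks by total tokens, highest first.
print(token_usage.most_common())  # [('extract', 75000), ('reason', 5200)]
```

Even this crude tally tells you where the volume is; real deployments would add cost-per-model pricing and time windows, but the migration decision starts with exactly this ranking.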