2 Comments
Ian Kiku

Great post, and I agree that focusing on small, specialized language models (SLMs) is the way forward. Today's LLMs are impressive, but they are heavy and sometimes slow. Small, specialized models are the next frontier: faster responses and cost savings without sacrificing in-domain accuracy. Letting an AI agent pick the right "size" of model for each job is smart engineering. However, we should not overlook that training or fine-tuning these SLMs still carries a significant cost barrier.

Even though SLMs can be cheaper to run in production, the upfront expense of curating high-quality datasets, running distillation, and fine-tuning on domain data can easily reach tens or hundreds of thousands of dollars. Many smaller teams and startups cannot yet afford that infrastructure. Until accessible "SLM-as-a-service" platforms or open-weight domain models mature, the bar will remain high for most.

Thanks for sparking this discussion, Ryan. Love the Russian doll analogy. It is exciting to see the "nesting doll" strategy (small model for most tasks, big model as backup) validated by the latest research.

https://arxiv.org/pdf/2506.02153#:~:text=well,only%20to%20much%20larger%20models
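For anyone who wants to play with the cascade idea, here is a minimal sketch of the small-first, big-backup routing. Everything in it is a placeholder, not any particular library's API: the models are plain callables, and the confidence score is assumed to come from somewhere like the mean token log-probability of the small model's output.

```python
# Minimal sketch of the "nesting doll" cascade: route to the small model
# first, escalate to the large model only when the small one is unsure.
# Model interfaces and the confidence signal are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    text: str
    confidence: float  # e.g. exp(mean token log-prob), assumed in [0, 1]


def cascade(prompt: str,
            small: Callable[[str], Response],
            large: Callable[[str], str],
            threshold: float = 0.85) -> str:
    """Try the cheap SLM; fall back to the expensive LLM on low confidence."""
    draft = small(prompt)
    if draft.confidence >= threshold:
        return draft.text  # cheap path: the SLM handles most traffic
    return large(prompt)   # backup path: the LLM takes the hard cases
```

The threshold is the knob that trades cost for quality: lower it and more traffic stays on the cheap path; raise it and more requests escalate to the big model.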

Ryan Walden

Definitely agree that fine-tuning SLMs is "off the menu" for a lot of organizations. I wonder how generalization differs between context-engineered SLMs that use RAG to retrieve similar responses and SLMs fine-tuned with LoRA. It would make for an awesome paper if there isn't one already.
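If someone runs that study, the evaluation harness can stay small. A rough sketch, with every name hypothetical: treat each condition (plain SLM baseline, SLM + RAG context, LoRA-tuned SLM) as a black-box callable and score it on in-domain versus held-out splits to probe generalization.

```python
# Hypothetical harness for comparing generalization across conditions:
# each condition is a callable from prompt to answer; we score it on
# in-domain and out-of-domain splits. Exact match is a stand-in metric.

from typing import Callable

Model = Callable[[str], str]


def accuracy(model: Model, examples: list[tuple[str, str]]) -> float:
    """Exact-match accuracy; swap in a task-appropriate metric as needed."""
    hits = sum(model(prompt).strip() == gold for prompt, gold in examples)
    return hits / len(examples)


def compare(conditions: dict[str, Model],
            in_domain: list[tuple[str, str]],
            out_of_domain: list[tuple[str, str]]) -> None:
    """Print per-condition accuracy on both splits to expose the gap."""
    for name, model in conditions.items():
        print(f"{name}: in-domain={accuracy(model, in_domain):.2%}, "
              f"OOD={accuracy(model, out_of_domain):.2%}")
```

The interesting number is the gap between the two splits per condition: a LoRA tune that aces in-domain but collapses out-of-domain would tell a very different story from a RAG setup with flatter but lower scores.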
