Ensemble of Minds: Articulated Cognitive Architectures of Small Language Models as a Path to Frontier Performance
Bikramjeet, Singh Bedi
PAPER · v1.0 · 2025-08-24 · human
Abstract
Scaling Large Language Models (LLMs) to reach frontier performance has produced unprecedented capabilities, but it has also created significant barriers to entry through immense computational costs and the centralization of power in a few labs. This paper investigates a counter-paradigm: achieving high-level performance not by scaling a single model, but by structuring a collaborative ensemble of smaller, more accessible language models (SLMs, <8B parameters). We introduce the Ensemble of Minds (EoM), a novel framework that orchestrates four distinct SLMs into specialized, synergistic cognitive roles: a Proposer that generates initial solutions, a Verifier that critically analyzes them, a Refiner that incorporates the feedback to improve them, and a Synthesizer that produces the final, polished output. We evaluate the EoM framework against a leading proprietary model (GPT-4o), an average single SLM, and a naive ensemble baseline. Our experiments span a diverse set of challenging benchmarks covering mathematical reasoning (GSM8K, MATH), code generation (HumanEval, MBPP), and complex instruction following (BIG-Bench Hard). The results show that the articulated cognitive structure of EoM yields substantial performance gains, closing the gap to the frontier model by an average of 68.7% across all benchmarks. On the challenging MATH benchmark, EoM achieves an accuracy of 49.2%, compared with 28.5% for a single SLM. Furthermore, our qualitative analysis and human evaluations indicate that EoM's structured reasoning process produces more transparent, verifiable, and robust solutions. This work provides strong evidence that multi-agent architectures of SLMs are a powerful, efficient, and democratizing alternative to monolithic model scaling, paving the way for a new class of high-performance, interpretable AI systems.
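The four-role structure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Model` callable interface, the `EnsembleOfMinds` class, the prompt wording, and the `max_rounds` parameter are all hypothetical, and the sketch assumes a simple sequential flow in which the Verifier's critique is passed to the Refiner before the Synthesizer writes the final answer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a "model" is any callable mapping a prompt to a completion.
# In practice each role would wrap a separate SLM; that wiring is not shown here.
Model = Callable[[str], str]


@dataclass
class EnsembleOfMinds:
    """Sequential Proposer -> Verifier -> Refiner -> Synthesizer pipeline (illustrative)."""

    proposer: Model
    verifier: Model
    refiner: Model
    synthesizer: Model
    max_rounds: int = 1  # assumed knob: number of verify/refine passes

    def solve(self, task: str) -> str:
        # 1. Proposer drafts an initial solution.
        draft = self.proposer(f"Task:\n{task}\n\nPropose a solution.")

        for _ in range(self.max_rounds):
            # 2. Verifier critiques the current draft.
            critique = self.verifier(
                f"Task:\n{task}\n\nSolution:\n{draft}\n\nList any errors."
            )
            # 3. Refiner revises the draft using the critique.
            draft = self.refiner(
                f"Task:\n{task}\n\nSolution:\n{draft}\n\n"
                f"Critique:\n{critique}\n\nRevise the solution."
            )

        # 4. Synthesizer produces the final, polished output.
        return self.synthesizer(
            f"Task:\n{task}\n\nRefined solution:\n{draft}\n\nWrite the final answer."
        )
```

Because each role is just a callable, the same skeleton can be exercised with stub functions for testing or with four different SLM back ends, and the intermediate draft and critique can be logged to expose the reasoning trace.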