Lightify: Cost-Aware Adaptive LLM Routing via Retrieval Confidence and Conflict-Aware Escalation

Pavan Manikanta Maddula

PAPER · v1.0 · 2026-04-28 · human

Formal Sciences · Computer Science · Artificial Intelligence and Machine Learning

Abstract

Modern agent harnesses (Claude Agent SDK, LangGraph, AutoGen, CrewAI, OpenHands, and LlamaIndex Agents) delegate cost governance, local-inference arbitration, and memory conflict resolution entirely to the application developer. We present Lightify, an open-source middleware layer that supplies these primitives as runtime services beneath any agent loop. Its central mechanism is Retrieval-Confidence-Driven Routing (CDDR), which selects inference tiers based on the aggregate confidence of retrieved memory items rather than on query features alone. Lightify additionally provides a persistent SQLite memory store, Memory Conflict Detection (MCD), which flags contradictions and triggers tier escalation, and Confidence-Driven Prompt Shaping (CDPS), which adapts prompt verbosity to retrieval reliability. We evaluate the routing policy on 1,197,316 exact-string-unique queries from five public corpora (MS MARCO v2.1, WildChat-1M, Natural Questions, MMLU, GSM8K) and report Policy-Oracle Agreement (POA): the fraction of routing decisions that match a category-level oracle, not per-query output correctness. On the English-language subset (N=1,043,220; 87.1%), CDDR achieves POA of [0.957, 0.958] (95% Wilson CI) at a projected cost 98% below a naive Opus-only baseline ($0.001204 vs. $0.076 per query). On the non-trivial subset (WildChat and GSM8K only; N=125,390), POA is [0.646, 0.651]. A 200-pair MCD stress test yields precision 0.943 [0.814, 0.984] and recall 0.330 [0.246, 0.427]. Per-query cost is a tier-cost projection; live API spend is reported only for the 20-query calibration pilot.
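The core routing idea can be sketched in a few lines. The tier names, thresholds, and mean aggregation below are illustrative assumptions, not the paper's actual configuration:

```python
# Hypothetical sketch of retrieval-confidence-driven routing (CDDR).
# Tier names, thresholds, and the mean aggregator are assumptions
# chosen for illustration, not Lightify's published settings.

def aggregate_confidence(scores):
    """Aggregate per-item retrieval confidences; the mean is one simple choice."""
    return sum(scores) / len(scores) if scores else 0.0

def route(retrieval_scores, low=0.4, high=0.8):
    """Pick an inference tier from the aggregate retrieval confidence."""
    c = aggregate_confidence(retrieval_scores)
    if c >= high:
        return "local"      # high confidence: serve from a cheap local model
    if c >= low:
        return "mid-tier"   # moderate confidence: mid-size hosted model
    return "frontier"       # low or no confidence: escalate to the top tier

print(route([0.9, 0.85]))   # strong memory hits -> local
print(route([0.5, 0.6]))    # mixed hits -> mid-tier
print(route([]))            # no memory hits -> frontier
```

The key point, mirroring the abstract, is that the decision is a function of retrieved-memory confidence rather than of query features alone, so an empty or contradictory memory naturally escalates to a more capable tier.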

Keywords

cost-aware inference · LLM routing · retrieval-confidence routing · persistent memory · conflict detection · local inference
