Layered Governance for Large Language Model Systems
Paul Desai
PAPER · v1.0 · 2025-12-18 · human
Abstract
Large language models (LLMs) are increasingly deployed in contexts where failures arise not from incorrect outputs alone, but from violations of governance boundaries such as unauthorized actions, identity misuse, audit bypass, or instruction subversion. Existing safety approaches (prompt alignment, fine-tuning, and constitutional methods) embed governance constraints within the stochastic model itself, conflating generation and enforcement. This paper proposes a layered governance architecture that structurally separates governance enforcement from both content moderation and model inference. We introduce a deterministic governance wrapper that checks requests before model execution and responses after it for governance violations, combined with a complementary content safety layer that addresses harmful or regulated outputs. We empirically evaluate this layered pipeline using a governance-focused benchmark and simulated attack scenarios, measuring attack success rate, refusal accuracy, false positives, and latency. Results show that deterministic governance enforcement can reliably block structural failures such as prompt injection, identity impersonation, and audit bypass with negligible performance overhead, while remaining orthogonal to content moderation mechanisms. We position this work as a systems contribution rather than a universal safety solution, emphasizing composability, auditability, and clear non-goals. The paper builds on and cross-references prior work on structural enforcement in reflective AI systems, and aims to provide a practical blueprint for governance-first LLM deployments.
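To make the layering concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. It assumes a model exposed as a plain function from prompt to text. Every name in it (`Request`, `Decision`, `check_request`, `check_response`, `content_safe`, `governed_call`), along with the example rules, patterns, and refusal strings, is hypothetical and chosen only to show the ordering of the layers: a deterministic pre-check, model inference, a deterministic post-check, and a separate content safety check.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Request:
    user_id: str           # authenticated caller
    claimed_identity: str  # identity asserted inside the request
    prompt: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str = "ok"


# Hypothetical example rules; a real deployment would define its own.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"disable (the )?audit", re.I),
]
ALLOWED_ACTIONS = {"search"}
ACTION_PATTERN = re.compile(r"\bACTION:\s*(\w+)")
BLOCKED_TERMS = ("hypothetical-banned-term",)

AuditLog = List[Tuple[str, str, str]]  # (user_id, stage, reason)


def check_request(req: Request) -> Decision:
    """Deterministic governance check applied before the model runs."""
    if req.claimed_identity != req.user_id:
        return Decision(False, "identity_mismatch")
    if any(p.search(req.prompt) for p in INJECTION_PATTERNS):
        return Decision(False, "instruction_subversion")
    return Decision(True)


def check_response(text: str) -> Decision:
    """Deterministic governance check applied after the model runs."""
    for action in ACTION_PATTERN.findall(text):
        if action not in ALLOWED_ACTIONS:
            return Decision(False, "unauthorized_action")
    return Decision(True)


def content_safe(text: str) -> bool:
    """Stand-in for the separate content safety layer."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def governed_call(req: Request, model: Callable[[str], str], log: AuditLog) -> str:
    """Run one request through the layers, logging every decision."""
    pre = check_request(req)
    log.append((req.user_id, "pre", pre.reason))
    if not pre.allowed:
        return f"[refused: {pre.reason}]"  # the model is never invoked

    output = model(req.prompt)

    post = check_response(output)
    log.append((req.user_id, "post", post.reason))
    if not post.allowed:
        return f"[refused: {post.reason}]"

    safe = content_safe(output)
    log.append((req.user_id, "content", "ok" if safe else "content_policy"))
    return output if safe else "[withheld: content_policy]"
```

In this sketch the governance checks are pure functions of their inputs, so the same request always yields the same decision, and every decision is appended to the audit log whether or not the model is called. The content safety check is a separate function that runs only after both governance checks pass, which mirrors the orthogonality between governance enforcement and content moderation described above.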