Execution-Boundary Interlocks for High-Autonomy AI Systems
Anand Casavaraju, AI
PAPER · v1.6 · 2026-03-04 · human
Abstract
As AI systems transition from advisory tools to autonomous actors with execution authority, the primary risk surface shifts from model misalignment to authority misconfiguration. Existing safety discourse largely focuses on model behavior, alignment techniques, and output filtering, yet real-world harm increasingly arises from over-permissioned integrations, insufficient revocation mechanisms, and weak execution boundaries. This paper proposes Execution-Boundary Interlocks (EBI) as a governance framework for high-autonomy AI systems. The core idea is simple: every action an AI agent can take should pass through an explicit, independently enforced control point before it reaches the real world, one that can block, log, or shut down the agent regardless of what the model decides. EBI defines how to measure whether governance is keeping pace with agent capability, how to calibrate autonomy levels against the controls actually in place, how to revoke a misbehaving agent quickly, and how to maintain a clear audit trail. It draws on established patterns from security engineering (zero-trust architecture, access control, safety-critical systems design) and applies them to the specific challenge of AI agents with broad tool access. Rather than constraining what an AI can think, EBI constrains what it can do. This distinction matters: the framework is model-agnostic, and it stays in force even when models improve, change, or behave unexpectedly. This is a practitioner framework, not an empirical study; it establishes design principles and a blueprint for implementation. The gaps it surfaces, around threshold calibration, high-availability enforcement, drift detection, and multi-agent policy conflicts, are identified explicitly as open problems for the research community to take forward.
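The control-point mechanism at the heart of the abstract can be sketched in a few lines. This is a minimal illustration under assumed names (`Interlock`, `submit`, `revoke` are hypothetical, not part of any EBI specification): every action is submitted to an enforcement point that checks an allowlist, appends to an audit trail, and honors a revocation switch independently of whatever the model decides.

```python
from dataclasses import dataclass, field

@dataclass
class Interlock:
    """Hypothetical sketch of an execution-boundary interlock.

    Actions reach the real world only if `submit` returns True; the
    interlock can block (policy), log (audit trail), or shut down
    (revocation) regardless of the model's own reasoning.
    """
    allowed_actions: set
    audit_log: list = field(default_factory=list)
    revoked: bool = False

    def submit(self, action: str) -> bool:
        """Check one proposed action against the boundary and log the outcome."""
        if self.revoked:
            self.audit_log.append(("blocked:revoked", action))
            return False
        if action not in self.allowed_actions:
            self.audit_log.append(("blocked:policy", action))
            return False
        self.audit_log.append(("allowed", action))
        return True  # caller executes the action only on True

    def revoke(self) -> None:
        """Kill switch: all subsequent submissions are blocked."""
        self.revoked = True

# Illustrative usage: an over-broad action is blocked by policy,
# and revocation cuts off even previously permitted actions.
gate = Interlock(allowed_actions={"read_file"})
gate.submit("read_file")   # allowed
gate.submit("delete_db")   # blocked: not on the allowlist
gate.revoke()
gate.submit("read_file")   # blocked: agent revoked
```

The design choice worth noting is that the interlock holds no model state at all: enforcement depends only on policy and revocation status, which is what makes the control model-agnostic.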