The Two-Layer Black Box: Operator Visibility, Commercial Secrecy, and a Minimum Disclosure Set for Accountable Autonomous AI Agents
Tatsuya Shimomoto
PAPER · v1.0 · 2026-07-03 · human
Abstract
The opacity of an autonomous AI agent is routinely discussed as a single "black box" problem, to be reduced by interpretability research or by demands for transparency. This paper argues that the opacity has two architecturally distinct causes that the discourse conflates, and that this conflation is what makes the transparency debate unwinnable. One cause is technical: behavior internalized into model weights cannot be read, versioned, or reverted without retraining. The other is commercial: the human-built layer around the weights (system prompts, rules, tool definitions, the agent loop) is technically inspectable but kept proprietary because it is a competitive differentiator. The two causes admit different responses, and treating them as one problem routes every transparency argument into an irresolvable safety-versus-secrecy collision. The paper makes three contributions, drawn from the implementation history of a single agent and re-expressed as harness-neutral judgments. First, it grounds the opacity problem in an *accountability gap*: a text-based prohibition on a capability that physically exists is a signpost, not a control. Enforcement, by contrast, admits a *prohibition-strength hierarchy*, ranked from strongest to weakest as absence, scaffolding-layer enforcement, and an untrusted-content boundary, with most published safety work occupying only the weakest layer. Second, it separates the two layers of the *two-layer black box*: technical internalization (Layer 1, largely permanent) and commercial secrecy of the scaffolding (Layer 2, contingent on market structure). Third, it resolves Layer 2 with a *minimum disclosure set*: operator visibility is decoupled from public disclosure, and the minimum information that makes post-incident causal tracing possible (which scaffolding version was active, which inputs reached the agent, which approval gated that version) can be disclosed without publishing the scaffolding text.
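The prohibition-strength hierarchy can be made concrete with a deliberately small, hypothetical harness. Every name here (`Harness`, `read_file`, `delete_file`, `wrap_untrusted`, the `<untrusted>` tag) is an illustrative assumption, not the paper's implementation; the sketch only shows where each layer lives. The prohibition in `SYSTEM_PROMPT` is never consulted by the code, which is exactly what makes it a signpost rather than a control.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tools; names are illustrative only.
def read_file(path: str) -> str:
    return f"<contents of {path}>"

def delete_file(path: str) -> str:
    return f"deleted {path}"

# A "signpost": the prohibition exists only as text shown to the model.
# Nothing in Harness.call below ever reads it.
SYSTEM_PROMPT = "Never delete files."

@dataclass
class Harness:
    tools: dict[str, Callable[[str], str]]
    blocked: set[str] = field(default_factory=set)

    def call(self, name: str, arg: str) -> str:
        # Absence: a capability that is not registered cannot be invoked at all.
        if name not in self.tools:
            raise LookupError(f"no such tool: {name}")
        # Scaffolding-layer enforcement: the check runs in harness code, so it
        # holds whatever the model was told or decides to emit.
        if name in self.blocked:
            raise PermissionError(f"blocked by harness policy: {name}")
        return self.tools[name](arg)

def wrap_untrusted(source: str, text: str) -> str:
    # Untrusted-content boundary: external content is delimited and labelled as
    # data. Honouring the label is left to the model, and no escaping is done,
    # so text containing "</untrusted>" can close the boundary early.
    return f'<untrusted source="{source}">\n{text}\n</untrusted>'

def outcome(h: Harness, name: str, arg: str) -> str:
    # Run a tool call the model has emitted and report what the harness did.
    try:
        return h.call(name, arg)
    except (LookupError, PermissionError) as exc:
        return type(exc).__name__

both = {"read_file": read_file, "delete_file": delete_file}
signpost_only = Harness(tools=dict(both))
enforced = Harness(tools=dict(both), blocked={"delete_file"})
absent = Harness(tools={"read_file": read_file})

# Suppose the model ignores SYSTEM_PROMPT and emits a delete call.
print(outcome(signpost_only, "delete_file", "report.txt"))  # deleted report.txt
print(outcome(enforced, "delete_file", "report.txt"))       # PermissionError
print(outcome(absent, "delete_file", "report.txt"))         # LookupError
```

The design point is placement: absence and the blocked set are decided in code the model cannot rewrite, while the untrusted-content boundary still depends on the model treating labelled text as data, which is why it sits at the bottom of the ranking.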
The resolution is implementable with current technology; it waits on neither an interpretability breakthrough nor a restructuring of markets and law. It complements existing AI risk-management and management-system standards by naming the operator-visibility floor those standards presuppose. Two questions remain open and form the research agenda: where exactly the disclosure line falls, and how the technology–society timescale gap is to be managed.
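As an illustration of that claim, a minimum disclosure set needs nothing more exotic than content hashing. The sketch assumes, hypothetically, that the scaffolding is a dict of named text parts, that a SHA-256 digest of a canonical JSON encoding serves as its version identifier, and that approvals carry an identifier such as `CHG-0042`; `DisclosureRecord`, `make_record`, and `verify_version` are illustrative names, not an API from the paper. The public record names the version, the inputs, and the approval without containing any scaffolding text, while an auditor with operator-side access can confirm that the private text matches the recorded version.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def canonical(scaffolding: dict[str, str]) -> str:
    # Stable encoding: the same scaffolding always yields the same version id.
    return json.dumps(scaffolding, sort_keys=True, separators=(",", ":"))

@dataclass(frozen=True)
class DisclosureRecord:
    scaffolding_version: str         # digest of the scaffolding, never its text
    input_digests: tuple[str, ...]   # one digest per input that reached the agent
    approval_id: str                 # the approval that gated this version

def make_record(scaffolding: dict[str, str], inputs: list[str],
                approval_id: str) -> DisclosureRecord:
    return DisclosureRecord(
        scaffolding_version=digest(canonical(scaffolding)),
        input_digests=tuple(digest(x) for x in inputs),
        approval_id=approval_id,
    )

def verify_version(record: DisclosureRecord, scaffolding: dict[str, str]) -> bool:
    # An auditor with operator-side access checks that the private text is the
    # version the public record names; the text itself is never published.
    return digest(canonical(scaffolding)) == record.scaffolding_version

# Hypothetical scaffolding and input, for illustration only.
scaffolding = {
    "system_prompt": "You are a customer-support agent.",
    "rules": "Escalate refunds above 100 USD to a human.",
    "tools": "search_orders, issue_refund",
}
record = make_record(scaffolding, ["user: where is my order?"], approval_id="CHG-0042")
print(json.dumps(asdict(record), indent=2))  # digests and an id, no scaffolding text
print(verify_version(record, scaffolding))   # True
print(verify_version(record, {**scaffolding, "rules": "Refund anything."}))  # False
```

One caveat on this design choice: a bare hash of short or guessable text can be confirmed by guessing candidate texts, so a deployment might prefer a salted commitment; the sketch omits this for brevity.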