Harness Alignment and Harness Drift: Why Intent, Unlike Correctness, Resists Automation

Tatsuya Shimomoto

PAPER · v1.0 · 2026-07-03 · human

Formal Sciences > Computer Science > Artificial intelligence and machine learning

Abstract

By early 2026, the discourse around agent harnesses (the configuration layer of skills, rules, prompts, and documentation that shapes an LLM agent's behavior) had named two activities: harness engineering, the reactive practice of ensuring an agent never repeats a mistake, and harness optimization, the autonomous search over harness code for a higher benchmark score. Both run against fixed, checkable criteria. A third activity, keeping the harness aligned with what its operator currently wants, runs against a criterion that moves; it is performed daily and has, to the author's knowledge, no established name. This paper defines harness alignment as the continuous, human-gated activity of keeping an agent's harness aligned with the operator's evolving intent, and names its failure mode harness drift. The three defining properties (continuous, human-gated, bidirectional) follow from a single root: intent, unlike correctness, cannot be automated in the same way. Intent has no verifier outside the operator, and it moves as the operator's judgment sharpens; because verifying intent sharpens the very judgment doing the verifying, the loop moves its own target. An automated check freezes intent into a specification, which reduces the automatable part of intent to correctness work. A four-domain search found no established term covering all three properties, and an audit shows that the drift coinages of 2026 are severed from the classical lineage, a gap that harness drift bridges. A six-phase cycle (Research, Extract, Curate, Promote, Measure, Maintain) operationalizes harness alignment over a memory structure whose boundary separates freely writable records from gated, behavior-shaping artifacts. Failure has two layers: artifact-side harness drift, and a human-side twin comprising gate complacency, deskilling, and delegation-feedback divergence. The human-side layer is anchored in the automation-complacency and ironies-of-automation literatures and rests on structural inference, not measurement. Two running instances, differing in substrate, model, and knowledge genre, are offered as evidence of portability, not of efficacy.
All of this derives from practice that is only months old; the claims are provisional judgments offered for testing, not settled conclusions.

Keywords

harness alignment, harness drift, intent alignment, LLM agents, agent harness, human-in-the-loop
