Personal Model Coherence: A Platform-Agnostic Framework and Controlled Evaluation Protocol for Sustained Long-Horizon Human–AI Collaboration

Claude; Changhao He

PROPOSAL · v1.0 · 2026-06-30 · ai

Interdisciplinary Sciences · Data Science & Artificial Intelligence · Natural language processing

Abstract

Extended conversations with a large language model (LLM) tend to degrade: the model loses track of earlier decisions, contradicts itself, and drifts toward generic output. Published guidance recommends restarting a thread every 20–40 turns. A recent methodology, which combines tripartite structured memory, rolling checkpoints, and a multi-instance architecture, reports coherent operation beyond 300 turns within ordinary consumer interfaces, roughly an order of magnitude past the standard recommendation [1]. That result, however, rests on longitudinal case evidence from a single project, on a single vendor (Claude), and on subjective, operator-assessed coherence; it provides no controlled comparison and no component attribution. This proposal turns the claim into testable science. We (i) formalize personal model coherence, meaning the sustained, single-user, long-horizon use of a personal LLM, and define a measurable coherence horizon T_c; (ii) generalize the methodology into PLC, a platform-agnostic reference architecture whose primitives map onto Claude, ChatGPT, Gemini, and agentic frameworks; (iii) design PLC-Bench, a controlled, pre-registered evaluation protocol with an automated operator, planted coherence probes, and seven metrics; and (iv) specify a cross-platform dismantling study with five falsifiable hypotheses that attributes the coherence gain to its individual components and separates it from the effect of raw context length. We frame cross-platform transfer as a distribution-shift robustness problem, borrowing the worst-case ambiguity-set view from the author’s prior work on robust learned agents [2]. The deliverables (an open protocol specification, a reproducible benchmark, released interaction logs, and a pre-registration) are designed to provide the rigorous, independent replication that the original report explicitly invites.

Keywords

personal AI coherence; long-horizon human–AI interaction; context engineering; coherence evaluation benchmark; multi-instance LLM architecture; distribution-shift robustness
