Rule Relocation in Practice: Multi-Horizon Viability Prediction for Shielded Planning

Jorge A. Arroyo

PAPER · v1.0 · 2025-12-22 · human

Abstract

I evaluate a deployment-time safety mechanism for decision-making under latent risk and distribution shift. In Crackworld, a partially observable stochastic gridworld, a hidden damage variable accumulates from risky terrain and gradually makes actions unreliable, producing delayed failures. I instantiate rule relocation by keeping the safety boundary explicit (battery depletion or damage saturation) and enforcing it at runtime via shielded planning driven by a learned multi-horizon viability predictor. The viability model is trained with a small collect–train loop and then evaluated, without retraining, on a cheap evaluation matrix spanning three controlled shifts (terrain frequency, latent-damage dynamics, and reward thinning). I report success and failure modes alongside auditable intervention diagnostics (fallback and hopeless rates) and calibration (BCE and Brier scores). Results show that the relocated safety layer is measurable and inspectable in deployment, while highlighting a practical tradeoff: hard thresholding can become overly conservative and trigger near-continuous intervention without commensurate gains, whereas soft gating reduces brittleness but may over-regularize task pursuit under some shifts.
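The contrast between hard thresholding and soft gating, and the two calibration scores named above, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the min-over-horizons aggregation, the threshold and weight values, and the exact meaning given to the "fallback" and "hopeless" labels are all assumptions made for illustration.

```python
import math

def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def aggregate(per_horizon):
    """Assumed aggregation: an action is only as viable as its worst horizon."""
    return min(per_horizon)

def hard_shield(values, viability, threshold=0.5):
    """Hard thresholding (hypothetical variant).

    values: action -> task value; viability: action -> per-horizon probabilities.
    Returns (action, status) with status in {"pass", "fallback", "hopeless"}.
    """
    v = {a: aggregate(viability[a]) for a in values}
    greedy = max(values, key=values.get)
    allowed = [a for a in values if v[a] >= threshold]
    if not allowed:
        # No action clears the threshold: take the least-bad one and flag it.
        return max(v, key=v.get), "hopeless"
    best = max(allowed, key=values.get)
    return best, ("pass" if best == greedy else "fallback")

def soft_gate(values, viability, weight=1.0, eps=1e-12):
    """Soft gating (hypothetical variant): penalize value by log-viability."""
    score = {
        a: values[a] + weight * math.log(max(aggregate(viability[a]), eps))
        for a in values
    }
    return max(score, key=score.get)

# Toy example: a high-value action with poor long-horizon viability.
values = {"risky": 1.0, "safe": 0.4}
viability = {"risky": [0.9, 0.3], "safe": [0.95, 0.8]}
print(hard_shield(values, viability))            # shield overrides the greedy action
print(soft_gate(values, viability, weight=0.1))  # a weak penalty leaves it in place
```

In this toy setting, raising the threshold above every action's viability makes each step "hopeless", which mirrors the near-continuous intervention the abstract attributes to hard thresholding. The soft gate instead trades task value against risk through a single weight.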

Keywords

Safe reinforcement learning; Runtime safety; Distribution shift
