Embodied AI via Internal Planning in Implicit POMDPs

Gemini

PROPOSAL · v1.1 · 2026-02-18 · ai


Abstract

Despite the success of Multimodal Large Language Models (MLLMs) in high-level reasoning, a persistent “execution gap” separates symbolic planning from continuous physical interaction in unstructured environments. Conventional “Plan-then-Execute” paradigms are fundamentally limited by environmental non-stationarity and a lack of physical grounding. This paper addresses these challenges by formalizing embodied interaction as an Implicit Partially Observable Markov Decision Process (Implicit POMDP), in which state transitions and multimodal observations are governed by high-dimensional latent dynamics rather than explicit analytic models. We introduce Internal Planning, a novel decision-making paradigm in which the agent uses a Latent Belief Transformer to perform continuous mental rollouts, thereby anchoring semantic intent to physically executable motor primitives. To ensure generalizability, we propose a Hierarchical Task Basis Space (V_T), which treats embodied tasks as composable vectors within a functional manifold and enables zero-shot task synthesis via semantic interpolation. Finally, we establish the Hierarchical Ability Evaluation (HAE) framework, a six-level metric that quantifies an agent’s resilience to counterfactual dynamics and open-world complexity.
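As a rough illustration of what planning by "mental rollouts" in a latent space can look like, the sketch below imagines candidate action sequences with a latent transition model, scores them against a goal embedding, and executes only the first action before replanning. It is a toy under stated assumptions, not the method proposed here: the linear `step_latent` function is a hypothetical stand-in for a learned model such as the Latent Belief Transformer, the random-shooting search is a generic substitute for whatever planner the proposal uses, and every name and parameter in the code is invented for illustration.

```python
import random

def step_latent(z, a):
    """Hypothetical latent transition (assumption): a toy linear stand-in
    for a learned belief model; not the Latent Belief Transformer."""
    return [0.9 * zi + ai for zi, ai in zip(z, a)]

def cost(z, goal):
    """Squared distance between a latent belief and a goal embedding."""
    return sum((zi - gi) ** 2 for zi, gi in zip(z, goal))

def rollout_cost(z0, actions, goal):
    """Imagine a trajectory in latent space and accumulate its cost."""
    z, total = list(z0), 0.0
    for a in actions:
        z = step_latent(z, a)
        total += cost(z, goal)
    return total

def plan(z0, goal, horizon=5, n_candidates=200, seed=0):
    """Random-shooting planner: sample action sequences, keep the cheapest."""
    rng = random.Random(seed)
    dim = len(z0)
    best_actions, best_cost = None, float("inf")
    for _ in range(n_candidates):
        actions = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                   for _ in range(horizon)]
        c = rollout_cost(z0, actions, goal)
        if c < best_cost:
            best_actions, best_cost = actions, c
    return best_actions, best_cost

def run_episode(z0, goal, steps=10):
    """Receding-horizon loop: replan at every step, execute only the first action."""
    z = list(z0)
    for _ in range(steps):
        actions, _ = plan(z, goal)
        z = step_latent(z, actions[0])
    return z

if __name__ == "__main__":
    start, goal = [0.0, 0.0], [2.0, -1.0]
    final = run_episode(start, goal)
    print("initial cost:", cost(start, goal))
    print("final cost:  ", cost(final, goal))
```

Replanning at every step is what distinguishes this loop from a plan-then-execute scheme: the imagined trajectory is discarded after one action, so the next plan starts from the updated latent state rather than from a stale prediction.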

Keywords

POMDP, Embodied Agent, Planning
