# MOAT v5g-Final Specification

## Positioning

MOAT v5g is a stress-test benchmark for policy-induced distinguishability collapse in active hypothesis testing and adaptive filtering under nonstationary partial observability.

Safe framing:

> Recursive self-poisoning is a measurable closed-loop failure mode in which wrong structural attribution distorts future policy, preserving input energy and excitation while reducing the policy-induced distinguishability between competing structural hypotheses.

This is not framed as a new general theory of cognition or intelligence. It is a falsifiable benchmark and failure-mode taxonomy within active hypothesis testing, adaptive filtering, dual control, and closed-loop identification.

## System

State and action are both 2D, with process noise `w_t` of covariance `Q_t`:

```math
x_{t+1} = A x_t + B_{\mathrm{true}} u_t + w_t, \quad x_t, u_t \in \mathbb{R}^2
```

Competing hypotheses:

```math
H_B: \quad B_{\mathrm{true}} = I + \delta_B v_B v_B^\top
```

```math
H_Q: \quad Q_t = \sigma_w^2 I + \delta_Q \cdot \mathbf{1}_{\mathrm{burst}}(t) \cdot v_Q v_Q^\top
```

Episode geometry, where `R(θ)` is the planar rotation by `θ`:

```math
v_B \sim \mathrm{Uniform}(S^1)
```

```math
v_Q = R(\theta) v_B, \quad \theta \sim \mathrm{Uniform}(30^\circ, 150^\circ)
```

Single-step indistinguishability design constraint:

```math
\delta_B^2 \cdot \mathbb{E}[\|u_t\|^2] = \delta_Q
```

This matches the one-step residual variance under the two hypotheses, so simple marginal residual statistics are insufficient; trajectory-level intervention response is required.

## Measurement Hierarchy

### Diagnostic Controls

Evaluator-only metrics. These may use ground-truth geometry such as `v_B`, `v_Q`, and true hypothesis labels. They are not deployable-agent baselines.

- `D_probe(t)`: AUC under a fixed external probe input. Reference check that the environment remains distinguishable.
- `D_oracle(t)`: AUC under the correct-belief (oracle) policy. Counterfactual diagnostic showing that collapse is policy-induced rather than intrinsically impossible.
- `DirectionalEnergy_B(t)`: fraction of input energy projected onto the true B-drift direction:

  ```math
  DirectionalEnergy_B(t) = \frac{v_B^\top \mathbb{E}[u_t u_t^\top] v_B}{\mathrm{tr}(\mathbb{E}[u_t u_t^\top])}
  ```

### Performance Metrics

No ground-truth geometry is exposed to agents or performance classifiers.

- `AUC_residual(t)`: classifier AUC using the residual trajectory `e_{t+3:t+3+k}` only.
- `AUC_action(t)`: leakage-monitor AUC using the action trajectory `u_{t:t+k}` only.
- `D_norm(t)`: energy-normalized residual distinguishability:

  ```math
  D_{norm}(t) = \frac{AUC_{residual}(t)}{\mathrm{tr}(\mathbb{E}[u_t u_t^\top])}
  ```

## SRA-Type Directional Collapse Criteria

Declare SRA-type directional collapse only when all of the following conditions hold:

```text
D_probe AUC > 0.75
D_oracle AUC > 0.75
PE_policy >= threshold
InputEnergy >= threshold
DirectionalEnergy_B decreases sharply
AUC_residual < 0.60
D_norm decreasing
AUC_action < 0.55   [leakage check: PASS]
```

Interpretation:

- The environment remains distinguishable under reference probing.
- Correct-belief control would preserve distinguishability.
- The wrong-belief policy keeps input energy and excitation rank high.
- Yet it avoids the discriminative direction, so residual-trajectory distinguishability collapses.
- The action-only classifier does not reveal hypothesis labels, reducing the risk of policy-signature leakage.

## Classifier Robustness

Use multiple classifier families to avoid mistaking a classifier artifact for geometry:

```text
linear SVM
RBF-kernel SVM
shallow MLP (2 layers)
shallow LSTM for action-only leakage monitoring
```

Interpretation:

```text
All families show the same trend: likely geometry-level collapse.
Only one family shows collapse: possible representation / classifier-capacity artifact.
```

## Horizon Sweep

Evaluate robustness across:

```text
k in {5, 10, 20, 40}
```

Purpose:

- Too short: insufficient trajectory evidence.
- Too long: policy drift or chaotic accumulation may dominate.
- Collapse that persists across all horizons is stronger evidence than collapse at a single `k`.
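The System hypotheses and the single-step design constraint can be checked numerically. The sketch below is a minimal illustration, not the benchmark's reference implementation. It assumes that residuals are measured against the nominal input matrix (`e_t = x_{t+1} - A x_t - u_t`), that inputs are zero-mean with covariance `Sigma_u = E[u_t u_t^T]`, and that `H_Q` is evaluated at a burst step; the helper names `rotation`, `sample_geometry`, `b_true`, `excess_residual_covs`, and `trace_matched_delta_q` are hypothetical. Under these assumptions the excess residual covariance is `delta_B^2 (v_B^T Sigma_u v_B) v_B v_B^T` under `H_B` and `delta_Q v_Q v_Q^T` under `H_Q`. The traces therefore match when `delta_Q = delta_B^2 v_B^T Sigma_u v_B`, which coincides with `delta_B^2 E[||u_t||^2]` when the input energy lies along `v_B`; for isotropic inputs the stated value is twice the trace-matched one.

```python
import numpy as np


def rotation(theta: float) -> np.ndarray:
    """2x2 rotation matrix R(theta), theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def sample_geometry(rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Episode geometry: v_B ~ Uniform(S^1), v_Q = R(theta) v_B, theta ~ U(30deg, 150deg)."""
    phi = rng.uniform(0.0, 2.0 * np.pi)
    v_b = np.array([np.cos(phi), np.sin(phi)])
    theta = np.deg2rad(rng.uniform(30.0, 150.0))
    return v_b, rotation(theta) @ v_b


def b_true(v_b: np.ndarray, delta_b: float) -> np.ndarray:
    """H_B input matrix: B_true = I + delta_B v_B v_B^T."""
    return np.eye(2) + delta_b * np.outer(v_b, v_b)


def excess_residual_covs(v_b, v_q, delta_b, delta_q, sigma_u):
    """Residual covariance above sigma_w^2 I under H_B and under H_Q (burst step).

    Assumption: residual e_t = x_{t+1} - A x_t - u_t (measured against B = I),
    with zero-mean inputs of covariance sigma_u = E[u_t u_t^T].
    """
    d = b_true(v_b, delta_b) - np.eye(2)   # B_true - I = delta_B v_B v_B^T
    cov_b = d @ sigma_u @ d.T              # = delta_B^2 (v_B^T sigma_u v_B) v_B v_B^T
    cov_q = delta_q * np.outer(v_q, v_q)   # burst term of Q_t
    return cov_b, cov_q


def trace_matched_delta_q(v_b, delta_b, sigma_u) -> float:
    """delta_Q that equalizes the excess residual trace under the assumption above."""
    return delta_b**2 * float(v_b @ sigma_u @ v_b)


rng = np.random.default_rng(0)
v_b, v_q = sample_geometry(rng)
delta_b = 0.3
sigma_u = np.eye(2)                          # isotropic inputs: E||u_t||^2 = 2
stated = delta_b**2 * np.trace(sigma_u)      # delta_B^2 E||u_t||^2, as in the spec
matched = trace_matched_delta_q(v_b, delta_b, sigma_u)
cov_b, cov_q = excess_residual_covs(v_b, v_q, delta_b, matched, sigma_u)
# For isotropic inputs, `stated` is twice `matched`.
print(f"stated={stated:.3f} matched={matched:.3f}")
print(np.isclose(np.trace(cov_b), np.trace(cov_q)))
```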
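The SRA-type collapse criteria can be expressed as a single predicate over two consecutive evaluation windows. This is a hedged sketch, not the benchmark's definition. Several parts are assumptions: the PE proxy (smallest eigenvalue of the empirical `E[u_t u_t^T]`), the reading of "decreases sharply" as a drop below `drop_ratio` times the previous window's value, and the reading of "D_norm decreasing" as a strict decrease between windows. The names `second_moment`, `window_stats`, `Window`, and `sra_collapse` are also hypothetical. The AUC numbers in the usage lines are illustrative placeholders, not measured results.

```python
import numpy as np
from dataclasses import dataclass


def second_moment(u: np.ndarray) -> np.ndarray:
    """Empirical E[u_t u_t^T] from actions of shape (n_samples, 2)."""
    return u.T @ u / len(u)


def window_stats(u: np.ndarray, v_b: np.ndarray, auc_residual: float) -> dict:
    """Input energy, PE proxy, DirectionalEnergy_B and D_norm for one window."""
    m = second_moment(u)
    energy = float(np.trace(m))                        # tr(E[u u^T])
    return {
        "input_energy": energy,
        "pe_policy": float(np.linalg.eigvalsh(m)[0]),   # assumed PE proxy: smallest eigenvalue
        "dir_energy_b": float(v_b @ m @ v_b) / energy,   # evaluator-only: uses the true v_B
        "d_norm": auc_residual / energy,
    }


@dataclass
class Window:
    d_probe: float
    d_oracle: float
    auc_residual: float
    auc_action: float
    stats: dict


def sra_collapse(prev: Window, cur: Window, pe_min: float, energy_min: float,
                 drop_ratio: float = 0.5) -> bool:
    """True only if every SRA-type criterion holds for `cur` relative to `prev`."""
    s, p = cur.stats, prev.stats
    return (
        cur.d_probe > 0.75
        and cur.d_oracle > 0.75
        and s["pe_policy"] >= pe_min
        and s["input_energy"] >= energy_min
        and s["dir_energy_b"] <= drop_ratio * p["dir_energy_b"]  # assumed "sharp" drop
        and cur.auc_residual < 0.60
        and s["d_norm"] < p["d_norm"]                            # assumed "decreasing"
        and cur.auc_action < 0.55                                # leakage check
    )


rng = np.random.default_rng(1)
v_b = np.array([1.0, 0.0])
u_early = rng.normal(size=(2000, 2))                         # isotropic actions
u_late = rng.normal(size=(2000, 2)) * np.array([0.3, 1.38])  # similar energy, mostly off v_B
# Illustrative AUC values only, not measured results.
prev = Window(0.90, 0.88, 0.82, 0.51, window_stats(u_early, v_b, 0.82))
cur = Window(0.90, 0.88, 0.55, 0.51, window_stats(u_late, v_b, 0.55))
print(sra_collapse(prev, cur, pe_min=0.05, energy_min=1.5))
```

In the example, the late window keeps total input energy close to the early window's while moving almost all of it off `v_B`. This is the pattern the criteria are meant to flag.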
## Leakage Controls

Use three separately trained classifier views:

```text
Residual-only classifier:
  input = e_{t+3:t+3+k}
  main performance metric

Action-only classifier:
  input = u_{t:t+k}
  leakage monitor; AUC_action >= 0.55 indicates policy-signature leakage risk

Joint classifier:
  input = (u, e)
  diagnostic only
```

If `AUC_action` is high, the benchmark may be solvable through policy-signature memorization rather than residual-trajectory attribution.

## Required Baselines

Run all baselines under the same randomized geometry:

```text
EKF / UKF adaptive estimation
IMM / MMAE
Particle filter
Dual control
Active Bayesian hypothesis testing (ABHT)
```

ABHT is the closest existing frame. MOAT v5g should be presented as a stress-test benchmark and failure-mode instance for ABHT and adaptive filtering, not as a wholly new theory of active information acquisition.

## Safe Reviewer Framing

The study does not claim that action-dependent distinguishability is new; that is central to ABHT and controlled sensing. The claim is narrower:

> Wrong structural attribution can induce policies that preserve input energy and persistent excitation while selectively avoiding discriminative directions, causing `D_probe` and `D_oracle` to remain high while `D_policy` (the residual-trajectory AUC under the deployed policy) collapses.

Suggested title-style framings:

- A Stress-Test Benchmark for Policy-Induced Distinguishability Collapse in Active Hypothesis Testing
- Recursive Self-Poisoning as a Failure Mode of Active Hypothesis Testing under Structural Misattribution
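The three classifier views, combined with the horizon sweep over `k`, amount to slicing fixed-length windows out of each episode. The following is a minimal sketch under several assumptions. Actions and residuals are assumed to be stored as `(T, 2)` arrays. The ranges `e_{t+3:t+3+k}` and `u_{t:t+k}` are read as half-open, giving exactly `k` steps each; the specification does not fix this convention. The names `classifier_views`, `HORIZONS`, and `RESIDUAL_LAG` are hypothetical.

```python
import numpy as np

HORIZONS = (5, 10, 20, 40)  # k values from the horizon sweep
RESIDUAL_LAG = 3            # offset in e_{t+3:t+3+k}


def classifier_views(u: np.ndarray, e: np.ndarray, t: int, k: int) -> dict:
    """Flattened feature vectors for the three classifier views at time t, horizon k.

    u, e: arrays of shape (T, 2) holding actions u_t and residuals e_t.
    Assumes half-open slices, so each window contains exactly k steps.
    """
    res = e[t + RESIDUAL_LAG : t + RESIDUAL_LAG + k]
    act = u[t : t + k]
    if len(res) < k or len(act) < k:
        raise ValueError("window runs past the end of the trajectory")
    return {
        "residual_only": res.ravel(),                          # main performance metric
        "action_only": act.ravel(),                            # leakage monitor
        "joint": np.concatenate([act.ravel(), res.ravel()]),   # diagnostic only
    }


T = 60
rng = np.random.default_rng(2)
u = rng.normal(size=(T, 2))
e = rng.normal(size=(T, 2))
for k in HORIZONS:
    views = classifier_views(u, e, t=10, k=k)
    print(k, {name: v.shape for name, v in views.items()})
```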
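Every view is scored by AUC, and the leakage check reduces to comparing `AUC_action` against 0.55. The sketch below is dependency-free apart from NumPy. It computes binary ROC AUC from classifier scores via the Mann-Whitney U statistic, assigning tied scores their average rank; for binary labels it should agree with scikit-learn's `roc_auc_score`. The label encoding (1 for `H_B` episodes, 0 for `H_Q`) and the names `auc` and `leakage_check` are assumptions.

```python
import numpy as np


def auc(scores, labels) -> float:
    """Binary ROC AUC via the Mann-Whitney U statistic (ties get their average rank).

    labels: 1 for H_B episodes, 0 for H_Q episodes (assumed encoding).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    n_pos, n_neg = int(labels.sum()), int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one example of each class")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0   # average 1-based rank of the tie group
        i = j + 1
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


def leakage_check(auc_action: float, threshold: float = 0.55) -> str:
    """Leakage gate from the criteria: PASS only if AUC_action < threshold."""
    return "PASS" if auc_action < threshold else "FAIL: policy-signature leakage risk"


labels = np.array([0, 0, 1, 1])
action_scores = np.array([0.1, 0.4, 0.35, 0.8])  # toy scores from an action-only classifier
a = auc(action_scores, labels)
print(a, leakage_check(a))
```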