# MOAT v5g-Final Specification

## Positioning

MOAT v5g is a stress-test benchmark for policy-induced distinguishability collapse in active hypothesis testing / adaptive filtering under nonstationary partial observability.

Safe framing:

> Recursive self-poisoning is a measurable closed-loop failure mode in which wrong structural attribution distorts future policy, preserving input energy and excitation while reducing the policy-induced distinguishability between competing structural hypotheses.

This is not framed as a new general theory of cognition or intelligence. It is a falsifiable benchmark and failure-mode taxonomy within active hypothesis testing, adaptive filtering, dual control, and closed-loop identification.

## System

State and action are both 2D:

```math
x_{t+1} = A x_t + B_{true}u_t + w_t,
\quad x_t, u_t \in \mathbb{R}^2
```

Competing hypotheses:

```math
H_B: \quad B_{true} = I + \delta_B v_B v_B^\top
```

```math
H_Q: \quad Q_t = \sigma_w^2 I
  + \delta_Q \cdot \mathbf{1}_{burst}(t) \cdot v_Q v_Q^\top
```
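As a minimal sketch of the dynamics and the two hypotheses, the helpers below build `B_true` and the noise covariances and advance the state by one step. Several points are assumptions, not part of the specification: the noise is taken to be Gaussian, `H_B` is assumed to keep the nominal covariance `sigma_w^2 I`, `H_Q` is assumed to keep `B_true = I`, and the names `make_hypothesis` and `step` are illustrative.

```python
import numpy as np

def make_hypothesis(label, v_B, v_Q, delta_B, delta_Q, sigma_w):
    """Return (B_true, Q_nominal, Q_burst) for label "H_B" or "H_Q"."""
    I = np.eye(2)
    Q_nominal = sigma_w**2 * I
    if label == "H_B":
        # Assumption: the noise covariance stays nominal under H_B.
        return I + delta_B * np.outer(v_B, v_B), Q_nominal, Q_nominal
    if label == "H_Q":
        # Assumption: the input matrix stays at the identity under H_Q.
        return I, Q_nominal, Q_nominal + delta_Q * np.outer(v_Q, v_Q)
    raise ValueError(f"unknown hypothesis label: {label!r}")

def step(x, u, A, B_true, Q, rng):
    """One transition x_{t+1} = A x_t + B_true u_t + w_t, with w_t ~ N(0, Q)."""
    w = rng.multivariate_normal(np.zeros(2), Q)  # Gaussian noise is an assumption
    return A @ x + B_true @ u + w
```

During a burst step the caller would pass `Q_burst` to `step`; otherwise `Q_nominal`.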

Episode geometry:

```math
v_B \sim \mathrm{Uniform}(S^1)
```

```math
v_Q = R(\theta)v_B,
\quad \theta \sim \mathrm{Uniform}(30^\circ, 150^\circ)
```
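The episode geometry above can be sampled as in the following sketch. The function name `sample_geometry` and the choice to return `theta` in radians are illustrative assumptions.

```python
import numpy as np

def sample_geometry(rng):
    """Sample v_B uniformly on the unit circle and v_Q = R(theta) v_B."""
    phi = rng.uniform(0.0, 2.0 * np.pi)
    v_B = np.array([np.cos(phi), np.sin(phi)])
    # theta is drawn in degrees as specified, then converted to radians.
    theta = np.deg2rad(rng.uniform(30.0, 150.0))
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    v_Q = R @ v_B
    return v_B, v_Q, theta
```

Because `theta` stays between 30 and 150 degrees, `|v_B . v_Q|` never exceeds `cos(30 deg)`, roughly 0.866, so the two directions are never nearly parallel.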

Single-step indistinguishability design constraint:

```math
\delta_B^2 \cdot \mathbb{E}[\|u_t\|^2] = \delta_Q
```

This matches the one-step residual variance under the two hypotheses, so marginal residual statistics alone cannot separate them; discrimination requires the trajectory-level response to interventions.
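A rough way to calibrate `delta_Q` from the constraint, and to sanity-check it against logged actions, is sketched below. The array `U` (shape `(T, 2)`, one action per row) and both function names are hypothetical. The second helper assumes residuals are computed against the nominal model `B = I`, which the specification does not state explicitly.

```python
import numpy as np

def delta_Q_from_constraint(delta_B, U):
    """delta_Q = delta_B^2 * E[||u_t||^2], expectation taken over the rows of U."""
    U = np.asarray(U, dtype=float)
    return delta_B**2 * float(np.mean(np.sum(U**2, axis=1)))

def excess_variance_H_B(delta_B, v_B, U):
    """Extra one-step residual variance under H_B: delta_B^2 * E[(v_B^T u_t)^2].

    Assumes the residual is x_{t+1} - A x_t - u_t (nominal B = I).
    """
    U = np.asarray(U, dtype=float)
    return delta_B**2 * float(np.mean((U @ v_B) ** 2))
```

Under that residual assumption the two quantities agree exactly when the actions lie along `v_B`; for inputs spread evenly over both axes the projected term is half of `delta_B^2 * E[||u_t||^2]`. It may therefore be worth reporting both values when verifying the design constraint.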

## Measurement Hierarchy

### Diagnostic Controls

Evaluator-only metrics. These may use ground-truth geometry such as `v_B`, `v_Q`, and true hypothesis labels. They are not deployable-agent baselines.

- `D_probe(t)`: AUC under fixed external probe input. Reference check that the environment remains distinguishable.
- `D_oracle(t)`: AUC under correct-belief/oracle policy. Counterfactual diagnostic showing collapse is policy-induced rather than intrinsically impossible.
- `DirectionalEnergy_B(t)`: energy projection onto the true B-drift direction:

```math
DirectionalEnergy_B(t)
=
\frac{
v_B^\top \mathbb{E}[u_tu_t^\top]v_B
}{
\mathrm{tr}(\mathbb{E}[u_tu_t^\top])
}
```
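An empirical version of `DirectionalEnergy_B` can be computed from a window of logged actions, as in this sketch. The array `U` (shape `(T, 2)`) and the function name are illustrative; the expectation is replaced by a sample average.

```python
import numpy as np

def directional_energy(U, v):
    """Fraction of input energy along unit vector v: v^T S v / tr(S), S = mean(u u^T)."""
    U = np.asarray(U, dtype=float)
    S = U.T @ U / len(U)  # empirical second-moment matrix E[u u^T]
    return float(v @ S @ v / np.trace(S))
```

In 2D, inputs spread evenly over both axes give a value of 0.5, so values well below 0.5 along `v_B` suggest the policy is steering energy away from the B-drift direction.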

### Performance Metrics

No ground-truth geometry is exposed to agents or performance classifiers.

- `AUC_residual(t)`: classifier AUC using residual trajectory `e_{t+3:t+3+k}` only.
- `AUC_action(t)`: leakage-monitor AUC using action trajectory `u_{t:t+k}` only.
- `D_norm(t)`: energy-normalized residual distinguishability:

```math
D_{norm}(t)
=
\frac{AUC_{residual}(t)}
{\mathrm{tr}(\mathbb{E}[u_tu_t^\top])}
```
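The sketch below computes an AUC from classifier scores using the pairwise (Mann-Whitney) formulation and then normalizes it by input energy as in the definition of `D_norm`. The function names and the `(T, 2)` action array `U` are assumptions; any standard AUC routine could be substituted.

```python
import numpy as np

def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))

def d_norm(auc_residual, U):
    """D_norm = AUC_residual / tr(E[u u^T]), with the expectation taken over rows of U."""
    U = np.asarray(U, dtype=float)
    energy = float(np.trace(U.T @ U / len(U)))
    return auc_residual / energy
```

Dividing by the trace means that doubling the action amplitude quarters `D_norm` unless `AUC_residual` rises to compensate.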

## SRA-Type Directional Collapse Criteria

Declare SRA-type directional collapse only when all conditions hold:

```text
D_probe AUC        > 0.75
D_oracle AUC       > 0.75
PE_policy          >= threshold
InputEnergy        >= threshold
DirectionalEnergy_B decreases sharply
AUC_residual       < 0.60
D_norm             decreasing
AUC_action         < 0.55  [leakage check: PASS]
```

Interpretation:

- The environment remains distinguishable under reference probing.
- Correct-belief control would preserve distinguishability.
- The wrong-belief policy keeps input energy and excitation rank high (`InputEnergy`, `PE_policy`).
- Yet it avoids the discriminative direction, so residual-trajectory distinguishability collapses.
- The action-only classifier stays near chance on hypothesis labels, which lowers the risk that policy-signature leakage explains the result.
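The criteria can be evaluated mechanically once the metrics are available, as in this sketch. The fixed cut-offs (0.75, 0.60, 0.55) come from the list above; `pe_threshold`, `energy_threshold`, and `min_directional_drop` are left as parameters because the specification does not fix them, and reading "decreases sharply" as a minimum drop and "decreasing" as a negative slope are assumptions.

```python
from dataclasses import dataclass

@dataclass
class WindowMetrics:
    d_probe_auc: float
    d_oracle_auc: float
    pe_policy: float
    input_energy: float
    directional_energy_drop: float  # earlier minus later DirectionalEnergy_B
    auc_residual: float
    d_norm_slope: float             # trend of D_norm over time
    auc_action: float

def sra_collapse_checks(m, pe_threshold, energy_threshold, min_directional_drop):
    """Return (all_hold, per-condition results) for the SRA-type collapse criteria."""
    checks = {
        "probe_distinguishable": m.d_probe_auc > 0.75,
        "oracle_distinguishable": m.d_oracle_auc > 0.75,
        "excitation_preserved": m.pe_policy >= pe_threshold,
        "energy_preserved": m.input_energy >= energy_threshold,
        "directional_energy_dropped": m.directional_energy_drop >= min_directional_drop,
        "residual_auc_collapsed": m.auc_residual < 0.60,
        "d_norm_decreasing": m.d_norm_slope < 0.0,
        "no_action_leakage": m.auc_action < 0.55,
    }
    return all(checks.values()), checks
```

Returning the per-condition dictionary makes it easy to report which condition blocked a declaration.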

## Classifier Robustness

Use multiple classifier families to avoid treating a classifier artifact as geometry:

```text
linear SVM
RBF-kernel SVM
shallow MLP, 2-layer
shallow LSTM for action-only leakage monitoring
```

Interpretation:

```text
All families show same trend:
  likely geometry-level collapse.

Only one family shows collapse:
  possible representation / classifier-capacity artifact.
```
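One simple way to apply this interpretation rule is sketched below. It reduces "trend" to a single late-window `AUC_residual` per family and treats a value below 0.60 as collapse, which is a simplifying assumption; the "mixed" and "no collapse" outcomes are also additions, since the rule above covers only the all-families and single-family cases.

```python
def classify_collapse_pattern(auc_by_family, collapse_below=0.60):
    """Map per-family AUC_residual values to an interpretation string."""
    collapsed = sorted(name for name, auc in auc_by_family.items() if auc < collapse_below)
    if auc_by_family and len(collapsed) == len(auc_by_family):
        verdict = "likely geometry-level collapse"
    elif len(collapsed) == 1:
        verdict = "possible representation / classifier-capacity artifact"
    elif not collapsed:
        verdict = "no collapse"
    else:
        verdict = "mixed"  # not covered by the specification; flagged for manual review
    return verdict, collapsed
```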

## Horizon Sweep

Evaluate robustness across:

```text
k in {5, 10, 20, 40}
```

Purpose:

- Too short: insufficient trajectory evidence.
- Too long: policy drift or accumulated noise may dominate.
- Robust collapse across horizons is stronger evidence.
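The sweep can be organized around a window extractor like the sketch below, which stacks every residual window `e_{t+3:t+3+k}` available in a trajectory. The half-open slice convention (exactly `k` samples per window), the `lag=3` default, and the function names are assumptions.

```python
import numpy as np

HORIZONS = (5, 10, 20, 40)

def residual_windows(E, k, lag=3):
    """Stack all windows E[t+lag : t+lag+k]; returns shape (n_windows, k, dim)."""
    E = np.asarray(E, dtype=float)
    n = len(E) - lag - k + 1
    if n <= 0:
        # Trajectory too short for this horizon: return an empty stack.
        return np.empty((0, k) + E.shape[1:])
    return np.stack([E[t + lag : t + lag + k] for t in range(n)])

def sweep_window_counts(E, horizons=HORIZONS):
    """Number of usable windows per horizon, a quick check before training classifiers."""
    return {k: len(residual_windows(E, k)) for k in horizons}
```

Longer horizons yield fewer windows per trajectory, so sample counts should be reported alongside each per-horizon AUC.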

## Leakage Controls

Use three independent classifier views:

```text
Residual-only classifier:
  input = e_{t+3:t+3+k}
  main performance metric.

Action-only classifier:
  input = u_{t:t+k}
  leakage monitor.
  AUC_action >= 0.55 indicates policy-signature leakage risk.

Joint classifier:
  input = (u, e)
  diagnostic only.
```

If `AUC_action` is at or above 0.55, the benchmark may be solvable through policy-signature memorization rather than residual-trajectory attribution.
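A minimal sketch of the three views, assuming the action and residual windows have already been sliced and are flattened into feature vectors before being passed to a classifier. The function names and the flattening step are illustrative choices, not part of the specification.

```python
import numpy as np

def classifier_views(u_win, e_win):
    """Build residual-only, action-only, and joint feature vectors from one window pair."""
    u_flat = np.asarray(u_win, dtype=float).ravel()
    e_flat = np.asarray(e_win, dtype=float).ravel()
    return {
        "residual_only": e_flat,                        # main performance metric
        "action_only": u_flat,                          # leakage monitor
        "joint": np.concatenate([u_flat, e_flat]),      # diagnostic only
    }

def leakage_risk(auc_action, limit=0.55):
    """True when the action-only AUC reaches the leakage threshold."""
    return auc_action >= limit
```

Keeping the views as separate feature vectors ensures the residual-only classifier never sees actions, so its AUC cannot be inflated by policy signatures.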

## Required Baselines

Run all baselines under the same randomized geometry:

```text
EKF / UKF adaptive estimation
IMM / MMAE
Particle filter
Dual control
Active Bayesian hypothesis testing
```
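One way to guarantee that every baseline sees the same randomized geometry is to draw the geometry once from a master seed and reuse it, as in this sketch. The `(phi, theta_deg)` representation and the `callable(phi, theta_deg) -> score` baseline interface are hypothetical.

```python
import numpy as np

def frozen_geometries(master_seed, n_episodes):
    """Draw (phi, theta_deg) per episode once so all baselines share the same set."""
    children = np.random.SeedSequence(master_seed).spawn(n_episodes)
    geometries = []
    for child in children:
        rng = np.random.default_rng(child)
        phi = rng.uniform(0.0, 2.0 * np.pi)   # direction angle of v_B
        theta_deg = rng.uniform(30.0, 150.0)  # rotation from v_B to v_Q
        geometries.append((phi, theta_deg))
    return geometries

def run_all(baselines, geometries):
    """Evaluate each baseline (hypothetical callable) on the identical geometry list."""
    return {name: [fn(phi, th) for phi, th in geometries]
            for name, fn in baselines.items()}
```

Spawning one child seed per episode keeps episodes statistically independent while making the whole set reproducible from a single integer.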

Active Bayesian hypothesis testing (ABHT) is the closest existing frame. MOAT v5g should be presented as a stress-test benchmark and failure-mode instance for ABHT and adaptive filtering, not as a wholly new theory of active information acquisition.

## Safe Reviewer Framing

The study does not claim that action-dependent distinguishability is new; that is central to ABHT and controlled sensing.

The claim is narrower:

> Wrong structural attribution can induce policies that preserve input energy and persistent excitation while selectively avoiding discriminative directions, so that `D_probe` and `D_oracle` remain high while distinguishability under the agent's own policy (`D_policy`, measured here by `AUC_residual`) collapses.

Suggested title-style framings:

- A Stress-Test Benchmark for Policy-Induced Distinguishability Collapse in Active Hypothesis Testing
- Recursive Self-Poisoning as a Failure Mode of Active Hypothesis Testing under Structural Misattribution