# MOAT v5g — Experimental Results Summary
### Stage 1 through Stage 2d  
*For peer-review relay. Codex / ChatGPT corrections applied.*

---

## Claim Hierarchy (current state)

| Stage | Claim | Status |
|---|---|---|
| **Stage 1** | External DE_B depletion → residual AUC collapse | ✓ done |
| **Stage 2a** | SRAAgent → endogenous DE_B depletion (no `wrong_strength`) | ✓ done |
| **Stage 2b** | Agent-internal misattribution: 88.8% of H_Q episodes have `angle(v_est, v_Q) < angle(v_est, v_B)` | ✓ done |
| **Stage 2c** | Policy-matched replay: AUC 0.762 → 0.553 (high SRA AUC was policy-geometry dependent) | ✓ done |
| **Stage 2d** | Directional AUC table: vB-aligned actions produce higher separability than vQ/H_Q-generated actions | ✓ done |
| **Stage 2e** | Residual AUC collapse *inside* the same endogenous adaptive loop | open / optional |

---

## Central Claim (current, peer-review safe)

> MOAT demonstrates a constructive minimal closed-loop mechanism in which structural misattribution endogenously depletes the B-discriminative direction, produces agent-internal attribution failure, and makes apparent residual distinguishability strongly dependent on action direction. The dissociation between external classifier distinguishability and agent-internal attribution correctness is the primary finding. We do not claim that full residual indistinguishability occurs inside the adaptive loop.

---

## Stage 1 — External Geometry Validation

**Setup:** `wrong_strength` parameter externally reduces DirectionalEnergy_B while preserving PE and total input energy.

**Result:** AUC_residual drops below 0.60 when DirectionalEnergy_B is depleted.

**Claim:** PE preservation and total energy preservation are insufficient to prevent residual distinguishability collapse when the discriminative direction is selectively depleted.
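A minimal sketch of the Stage 1 manipulation. It assumes two things that are not MOAT's implementation. First, DirectionalEnergy_B is taken to be the fraction of action energy along a unit vector `v_B`. Second, `wrong_strength` is taken to shrink the `v_B` component of each action before the action is rescaled to its original norm. The function names `directional_energy` and `deplete_direction` are hypothetical. Per-action rescaling keeps total input energy fixed, but PE still has to be checked separately.

```python
import numpy as np

def directional_energy(actions, v):
    """Fraction of total action energy along direction v (assumed definition of DE_B)."""
    v = v / np.linalg.norm(v)
    proj = actions @ v                              # signed component of each action along v
    return float(np.sum(proj**2) / np.sum(actions**2))

def deplete_direction(actions, v, wrong_strength):
    """Hypothetical external depletion: shrink each action's v-component by a factor
    (1 - wrong_strength), then rescale the action back to its original norm so total
    input energy is unchanged. (An action lying exactly along v with wrong_strength=1
    collapses to zero and cannot be rescaled.)"""
    v = v / np.linalg.norm(v)
    proj = actions @ v
    shrunk = actions - wrong_strength * np.outer(proj, v)
    old_norm = np.linalg.norm(actions, axis=1, keepdims=True)
    new_norm = np.linalg.norm(shrunk, axis=1, keepdims=True)
    return shrunk * old_norm / np.maximum(new_norm, 1e-12)

rng = np.random.default_rng(0)
U = rng.normal(size=(500, 2))                       # toy isotropic actions
v_B = np.array([1.0, 0.0])
U_dep = deplete_direction(U, v_B, wrong_strength=0.8)
print(directional_energy(U, v_B), directional_energy(U_dep, v_B))
```

Rescaling does not undo the depletion. It multiplies both the `v_B` and orthogonal components by the same factor, so the energy fraction along `v_B` is preserved.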

---

## Stage 2a — Endogenous Directional Depletion

**Setup:** SRAAgent with LS update `B_est += lr * outer(e_t, u_t) / ||u_t||^2`. No `wrong_strength`.
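The update rule above has the form of a normalized-LMS step. A minimal sketch follows, assuming `e_t = y_t - B_est @ u_t` is the one-step prediction error; the source does not spell out `e_t`, and `sra_update` is a hypothetical name. Under this form each correction is rank one with its row space along `u_t`. So if `B_est` starts at I, then `B_est - I` only changes along action directions the agent has actually taken.

```python
import numpy as np

def sra_update(B_est, u_t, y_t, lr=0.1, eps=1e-12):
    """One step of B_est += lr * outer(e_t, u_t) / ||u_t||^2.
    Assumption: e_t = y_t - B_est @ u_t is the one-step prediction error."""
    e_t = y_t - B_est @ u_t
    return B_est + lr * np.outer(e_t, u_t) / max(float(u_t @ u_t), eps)

B_est = np.eye(2)
u_t = np.array([1.0, 2.0])
y_t = np.array([3.0, -1.0])
B_next = sra_update(B_est, u_t, y_t, lr=0.5)
# The correction is rank one, and the error on u_t shrinks by a factor (1 - lr).
print(y_t - B_next @ u_t, y_t - B_est @ u_t)
```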

**Result:**

| Metric | Value |
|---|---|
| DirectionalEnergy_B (H_B late) | **0.826** ↑ |
| DirectionalEnergy_B (H_Q late) | **0.406** ↓ |
| Contrast | **0.420** |
| PE (H_Q late) | 0.300 ≥ threshold ✓ |
| Input Energy (H_Q late) | 2.000 ≥ threshold ✓ |

**Trajectory:** depletion occurs within t=0→3 (the first update) and then stabilises. DirectionalEnergy_B rises monotonically under H_B and stabilises at ~0.40 under H_Q.

**Claim:** The adaptive misattribution update alone, without external injection, generates the directional-depletion signature while preserving PE and total energy.
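The PE and input-energy checks in the table above can coexist with a low DirectionalEnergy_B. Below is a sketch under stated assumptions. PE is taken to be the smallest eigenvalue of the empirical input second-moment matrix, and input energy the mean squared action norm. The threshold values are placeholders, not MOAT's settings, and the function names are hypothetical.

```python
import numpy as np

def persistent_excitation(actions):
    """Assumed PE measure: smallest eigenvalue of the empirical second-moment matrix."""
    R = actions.T @ actions / len(actions)
    return float(np.linalg.eigvalsh(R)[0])          # eigvalsh sorts eigenvalues ascending

def input_energy(actions):
    """Mean squared action norm (equal to the trace of the second-moment matrix)."""
    return float(np.mean(np.sum(actions**2, axis=1)))

def directional_energy(actions, v):
    """Fraction of total action energy along direction v (assumed definition)."""
    v = v / np.linalg.norm(v)
    return float(np.sum((actions @ v)**2) / np.sum(actions**2))

PE_MIN, ENERGY_MIN = 0.1, 1.0                        # placeholder thresholds, not MOAT's
v_B = np.array([1.0, 0.0])
s = np.sqrt(3.0)
# Toy actions: energy concentrated orthogonal to v_B, but v_B is still excited.
U = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, s], [0.0, -s]])
pe, energy, de_b = persistent_excitation(U), input_energy(U), directional_energy(U, v_B)
print(pe >= PE_MIN, energy >= ENERGY_MIN, de_b)
```

Here both checks pass (PE = 0.5, energy = 2.0), while only a quarter of the energy lies along `v_B`. This is the sense in which PE and total-energy preservation do not constrain directional depletion.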

---

## Stage 2b — Agent-Internal Misattribution (Primary Evidence)

**Diagnostic:** `angle(v_est, v_B)` vs `angle(v_est, v_Q)`, where the estimated drift direction `v_est` is taken from the SVD of the final `B_est - I`.
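A minimal sketch of the angle diagnostic. It assumes `v_est` is the leading right-singular vector of `B_est - I`; if the drift direction is read from the output side, use the first column of `U` instead. Because singular vectors are only defined up to sign, the angles are computed sign-invariantly. The function names are hypothetical.

```python
import numpy as np

def leading_direction(B_est):
    """Leading right-singular vector of the drift B_est - I (assumed to be v_est)."""
    drift = B_est - np.eye(B_est.shape[0])
    _, _, Vt = np.linalg.svd(drift)
    return Vt[0]

def angle_deg(a, b):
    """Sign-invariant angle in degrees between two directions."""
    c = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

def misattributed(B_est, v_B, v_Q):
    """True if v_est is closer in angle to v_Q than to v_B (the Stage 2b error criterion)."""
    v_est = leading_direction(B_est)
    return angle_deg(v_est, v_Q) < angle_deg(v_est, v_B)

v_B = np.array([1.0, 0.0])
v_Q = np.array([0.6, 0.8])
B_q = np.eye(2) + 0.5 * np.outer(v_Q, v_Q)   # toy estimate whose drift lies along v_Q
B_b = np.eye(2) + 0.5 * np.outer(v_B, v_B)   # toy estimate whose drift lies along v_B
print(misattributed(B_q, v_B, v_Q), misattributed(B_b, v_B, v_Q))
```

Under these assumptions, the H_Q error rate is the mean of `misattributed` over the final `B_est` of each H_Q episode.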

**Result:**

| Condition | Rate | Mean angle to v_B | Mean angle to v_Q |
|---|---|---|---|
| H_B | correct = **0.983** | **6.9°** | 58.3° |
| H_Q | error = **0.888** | 55.2° | **15.3°** |

**Interpretation:** In 88.8% of H_Q episodes, the agent's estimated drift direction is closer to the Q-burst direction (v_Q) than the true B-drift direction (v_B). The agent structurally misattributes Q-burst evidence as B drift.

**Note:** Attribution angle is the *primary* evidence for internal misattribution. Attribution margin (ratio of the `B_est` drift norm to the mean squared residual; H_B = 3.274, H_Q = 0.506) is a supplementary proxy for B-channel absorption.
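One reading of the attribution margin, sketched below, is a ratio: the Frobenius norm of `B_est - I` divided by the mean squared residual norm. The choice of Frobenius norm and the function name `attribution_margin` are assumptions. Under this reading, a larger value means more drift has been absorbed into `B_est` per unit of remaining residual.

```python
import numpy as np

def attribution_margin(B_est, residuals):
    """Assumed reading of 'B_est drift norm / mean residual sq':
    Frobenius norm of B_est - I over the mean squared residual norm."""
    drift_norm = np.linalg.norm(B_est - np.eye(B_est.shape[0]))   # Frobenius for 2-D input
    mean_sq = np.mean(np.sum(np.asarray(residuals)**2, axis=1))
    return float(drift_norm / mean_sq)

B_est = np.eye(2) + np.diag([0.3, 0.4])            # toy drift with Frobenius norm 0.5
residuals = np.array([[1.0, 0.0], [0.0, 1.0]])     # toy residuals, mean squared norm 1.0
print(attribution_margin(B_est, residuals))
```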

---

## Stage 2c — Policy-Matched Replay

**Setup:** Action sequences generated by the SRA agent under H_Q are replayed in both the H_B and H_Q environments with a neutral residual model (B_est = I).
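The key property of policy-matched replay is that the *same* action array drives both environments. Any residual separability must therefore come from how each environment responds, not from the policy. The sketch below computes neutral residuals `e_t = y_t - u_t`. The two environments are toy, hypothetical stand-ins for H_B and H_Q, not the MOAT environments: H_B as a gain change along `v_B`, and H_Q as a random burst along `v_Q`.

```python
import numpy as np

def replay_neutral_residuals(actions, env_step, rng):
    """Replay a fixed action sequence and return residuals under B_est = I: e_t = y_t - u_t."""
    return np.array([env_step(u, rng) - u for u in actions])

def toy_hb(v_B, delta_b, noise=0.1):
    """Hypothetical H_B stand-in: gain change of size delta_b along v_B."""
    M = np.eye(len(v_B)) + delta_b * np.outer(v_B, v_B)
    return lambda u, rng: M @ u + noise * rng.normal(size=len(u))

def toy_hq(v_Q, burst_scale, noise=0.1):
    """Hypothetical H_Q stand-in: random burst along v_Q, scaled by the action's v_Q component."""
    def step(u, rng):
        burst = burst_scale * rng.normal() * (v_Q @ u)
        return u + burst * v_Q + noise * rng.normal(size=len(u))
    return step

rng = np.random.default_rng(0)
v_B, v_Q = np.array([1.0, 0.0]), np.array([0.0, 1.0])
actions = rng.normal(size=(100, 2))          # stands in for a recorded H_Q action sequence
res_B = replay_neutral_residuals(actions, toy_hb(v_B, delta_b=0.5), rng)
res_Q = replay_neutral_residuals(actions, toy_hq(v_Q, burst_scale=0.5), rng)
```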

**Result:**

| Policy | AUC |
|---|---|
| SRA adaptive (own policy) | **0.762** |
| Action-only (leakage check) | 0.550 ≈ chance |
| Replay: H_Q actions | **0.553** |
| Drop (0.762 − 0.553) | **0.209** |

**Interpretation:** The high SRA adaptive AUC (0.762) depended strongly on the fact that agents under H_B and H_Q generate different action directions (v_B vs v_Q). Under identical actions, residual separability drops to 0.553. Because the action-only AUC is near chance (0.550), the classifier was not simply reading action labels. Instead, the different trajectory geometries produced by each hypothesis's adaptive policy created distinguishable residual structure.

**Claim:** External residual distinguishability is policy-geometry dependent, not intrinsic to the environment under the SRAAgent's operating conditions.

---

## Stage 2d — Multi-Directional Replay Table

**Setup:** Six action sources (the `Replay:` rows) replayed into both H_B and H_Q environments, with a neutral residual throughout. The SRA adaptive and action-only rows are reference baselines.

**Result:**

| Action source | AUC (mean of linear, RFF) | Linear AUC | RFF AUC |
|---|---|---|---|
| SRA adaptive (mixed policy) | 0.761 | 0.524 | 0.998 |
| Action-only leakage | 0.550 | 0.543 | 0.556 |
| **Replay: H_B actions** | **0.639** | 0.533 | 0.745 |
| **Replay: H_Q actions** | **0.537** | 0.530 | 0.544 |
| Replay: vB policy | **0.650** | 0.506 | 0.794 |
| Replay: vQ policy | 0.598 | 0.586 | 0.609 |
| Replay: isotropic | 0.518 | 0.511 | 0.525 |
| Replay: vB-oracle (≡ vB) | 0.640 | 0.503 | 0.777 |

**Directional pattern:**
```
vB policy ≈ vB-oracle ≈ H_B actions  >  vQ policy > H_Q actions ≈ isotropic
   ~0.645                                   0.598        ~0.528
```

**Safe claim:** vB-aligned and H_B-generated actions produce higher H_B/H_Q separability; H_Q-generated actions reduce separability toward the low-AUC regime. The misattributing policy *shifts* actions away from the B-discriminative direction, reducing distinguishability relative to vB-aligned policies.

**Labelling corrections applied:**
- `discriminative` renamed to `vB-oracle` (implementation is `v_b.copy()`, not an independently optimized oracle)
- `vQ > isotropic` is expected and not a failure: vQ excites Q-burst variance, raising separability above isotropic. Claim is `vB > vQ`, not `vQ < isotropic`.
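One possible reading of the gap between the linear and RFF columns is that separability sits in second-order structure, such as the Q-burst variance noted above, which a linear score cannot detect. In the table, linear AUCs stay between 0.503 and 0.586, while RFF reaches 0.998 for the SRA adaptive row. The toy below illustrates that mechanism only and does not reproduce MOAT's numbers. It assumes the RFF classifier is ridge regression on random Fourier features approximating an RBF kernel; the actual classifier settings are not given here. It also uses two zero-mean classes that differ only in variance along one axis. The AUC is the Mann-Whitney statistic.

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney AUC: P(score of a positive > score of a negative), ties count 1/2."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels][:, None], scores[~labels][None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def make_rff(dim, n_features=300, gamma=0.5, seed=0):
    """Random Fourier features approximating the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(dim, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def ridge_scores(F_train, y_train, F_test, lam=1e-3):
    """Ridge regression on +/-1 labels with a bias column; returns scores on F_test."""
    add_bias = lambda F: np.hstack([F, np.ones((len(F), 1))])
    A, T = add_bias(F_train), add_bias(F_test)
    targets = np.where(y_train, 1.0, -1.0)
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ targets)
    return T @ w

def toy_split(n, rng):
    """Hypothetical toy data: zero-mean classes, class 1 has 3x the std along axis 0."""
    X0 = rng.normal(size=(n, 2))
    X1 = rng.normal(size=(n, 2)) * np.array([3.0, 1.0])
    return np.vstack([X0, X1]), np.r_[np.zeros(n, bool), np.ones(n, bool)]

rng = np.random.default_rng(1)
X_tr, y_tr = toy_split(1000, rng)
X_te, y_te = toy_split(1000, rng)                   # held-out split
rff = make_rff(dim=2)
linear_auc = auc(ridge_scores(X_tr, y_tr, X_te), y_te)
rff_auc = auc(ridge_scores(rff(X_tr), y_tr, rff(X_te)), y_te)
print(f"linear AUC {linear_auc:.3f}, RFF AUC {rff_auc:.3f}")
```

Because the two toy classes share a mean, any linear score is symmetric around the same point for both classes and sits near 0.5. The kernel features can instead respond to magnitude along the high-variance axis.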

---

## What Can Be Claimed (Summary)

### Defensible now

1. PE and attribution separability are independent conditions (§3.3 of theory document).
2. A minimal adaptive agent with incorrect structural attribution endogenously generates directional energy depletion while preserving PE and total input energy.
3. 88.8% of H_Q episodes show agent-internal misattribution (v_est closer in angle to v_Q than to v_B).
4. External residual AUC is strongly policy-geometry dependent (drop 0.762 → 0.553 under policy-matched replay).
5. AUC varies directionally: vB-aligned actions produce higher H_B/H_Q separability than vQ-aligned or H_Q-generated actions.
6. The dissociation `external distinguishability ≠ agent-internal attribution correctness` is empirically supported.

### Not yet claimed

1. Full residual indistinguishability inside the same endogenous adaptive loop (Stage 2e — open).
2. SRA is a genuinely new theory distinct from ABHT (current position: failure-mode benchmark within ABHT).
3. General causal identification under Spec-3 (hidden confounder) violation.
4. High PE accelerates collapse (retracted).

---

## Revised Central Definition (SRA)

**Attribution Collapse** (revised):

> External evidence remains classifiable by an external observer, but the adaptive agent maps it into the wrong structural update channel, generating policies that avoid the B-discriminative direction and making apparent residual evidence policy-geometry dependent.

This is distinct from the prior definition ("residuals become externally indistinguishable") and is supported by the current experimental chain.

---

## Open Question (Stage 2e — optional)

Can the same endogenous misattribution loop produce residual AUC collapse (< 0.60) without external action-direction control?

Replay AUC (0.553) already falls below the 0.60 threshold, but only under externally fixed H_Q actions; the in-loop SRA adaptive AUC is 0.762. Possible directions:
- Longer episodes (more contamination accumulation)
- Higher `delta_b` / `delta_q` ratio
- Stronger directional concentration (`min_de` lower)
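The Stage 2e success criterion can be made explicit as a sweep over the three directions listed above, with actions always generated by the agent itself and never replayed. Only `delta_b`, `delta_q`, `min_de` and the 0.60 threshold come from this document. The hook `run_in_loop_auc`, the key names `episode_len` and `delta_ratio`, and all grid values are hypothetical placeholders.

```python
import itertools

# Hypothetical sweep grid; every value is a placeholder, not a MOAT setting.
SWEEP = {
    "episode_len": [200, 400, 800],   # longer episodes
    "delta_ratio": [1.0, 2.0, 4.0],   # delta_b / delta_q
    "min_de": [0.3, 0.2, 0.1],        # lower = stronger directional concentration
}
AUC_COLLAPSE = 0.60                   # residual-AUC collapse threshold used in this document

def stage2e_hits(run_in_loop_auc, sweep=SWEEP, threshold=AUC_COLLAPSE):
    """Return (config, auc) pairs whose in-loop residual AUC falls below threshold.
    run_in_loop_auc(config) -> float is a hypothetical hook into the SRAAgent loop."""
    keys = list(sweep)
    hits = []
    for values in itertools.product(*(sweep[k] for k in keys)):
        config = dict(zip(keys, values))
        auc = run_in_loop_auc(config)
        if auc < threshold:
            hits.append((config, auc))
    return hits
```

An empty result corresponds to the "Stage 2e fails" branch below, and any hit to the "succeeds" branch.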

If Stage 2e fails: the reframing holds. SRA = internal attribution collapse despite external distinguishability.  
If Stage 2e succeeds: closes the full recursive loop claim.

Either outcome is informative. Negative result = ABHT family may already cover this geometry (valuable benchmark finding per §4.3).