Editorial Note: This document is an integrated version produced through multiple rounds of peer-review relay with ChatGPT, Codex, Gemini, and Perplexity. The original relay logs — including the condensation process from observer theory to engineering failure theory — are archived in the appendix (appendix.html). Code and numerical experiments are also in the appendix.
This paper defines and formalizes a specific failure mode exhibited by adaptive systems in non-stationary partially observable environments — Attribution Collapse — and presents the design specification for a detection benchmark (MOAT v5g). The central proposition is: the Persistent Excitation (PE) condition can guarantee parameter identifiability, but does not guarantee Attribution Separability between competing structural hypotheses (B drift / Q burst). An update to the wrong latent channel distorts the policy; the distorted policy contaminates future trajectory evidence; and trajectory-level distinguishability is recursively degraded (Recursive Attribution Poisoning). This paper is not presented as a new theory, but explicitly positioned as a stress-test benchmark in the vicinity of ABHT / controlled sensing / dual control.
§1 Theory Purification: What Was Discarded and What Remained
The previous version was centered on observer / selfhood / phenomenology. Through a multi-AI peer-review relay, the following deletions and transformations were made.
- observer (operational observer definition)
- self / selfhood (F-coalgebraic fixed point)
- consciousness / phenomenology / qualia
- AQFT ontology (ontological interpretation of von Neumann algebras)
- IIT connection
- High-PE Paradox (retracted due to lack of formal support)
- strong claims about "causation" (retreated to structured residual attribution)
What remained after removal:
When an adaptive system attributes residuals to an incorrect latent factor and updates under partial observability, it can — through its own modified policy — distort the statistical and geometric structure of future residuals, recursively destroying its own future identifiability.
This is not the philosophy of the observer but the closed-loop failure geometry of adaptive control. The latter is more amenable to peer review and experimentally falsifiable than the former.
§2 Definition of Structured Residual Attribution (SRA)
2.1 Factorization of Prediction Residuals
In the state-space model $x_{t+1} = A_t x_t + B_t u_t + w_t$ with observation $y_t = C_t x_t + v_t$, the agent's prediction residual $e_t = y_t - \hat{y}_t$ is decomposed into four components:
$$e_t = \underbrace{\Delta B_t \cdot u_t}_{\text{action-channel drift}} + \underbrace{\Delta A_t \cdot x_t}_{\text{world dynamics drift}} + \underbrace{\Delta w_t}_{\text{exogenous disturbance}} + \underbrace{\Delta C_t \cdot x_t}_{\text{sensor drift}}$$

| Attribution Target | Internal Model to Update | Penalty on Misattribution |
|---|---|---|
| $\Delta B_t$ | $\hat{B}_t$ (intervention model) | Misattributed to chaos → ignores drop in intervention efficiency; loss of control |
| $\Delta A_t$ | $\hat{A}_t$ (transition model) | Misattributed to action → contaminates $\hat{B}_t$, causing self-destruction |
| $\Delta Q_t$ | $Q_{est}$ (noise covariance only) | Misattributed to structural model → collapse through over-adaptation |
| $\Delta C_t$ | $\hat{C}_t$ or $R_{est}$ | Misattributed to world → collapse of world model (Freezing) |
2.2 Attribution Subspace and Identifiability
Define the subspaces generated in residual space by each component:
$$\mathcal{S}_B = \text{range}\bigl(\mathbb{E}[e_t u_t^\top \mid \Delta B \neq 0]\bigr), \quad \mathcal{S}_Q \ni \text{Corr}(\|e_t\|^2, \|u_t\|^2) > 0$$
Attribution separability is measured by the principal angle $\theta_{BQ}$ between the two subspaces. $\theta_{BQ} \to 0$ (subspace overlap) is the geometric indicator of attribution collapse.
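As an illustration of the principal-angle measurement, the following is a minimal sketch that assumes each attribution subspace is given by a matrix whose columns span it. The function name `principal_angles` and the example vectors are hypothetical and not part of the benchmark specification.

```python
import numpy as np

def principal_angles(S1: np.ndarray, S2: np.ndarray) -> np.ndarray:
    """Principal angles (radians, ascending) between the column spans of S1 and S2.

    Assumes both inputs have full column rank.
    """
    Q1, _ = np.linalg.qr(S1)  # orthonormal basis of span(S1)
    Q2, _ = np.linalg.qr(S2)  # orthonormal basis of span(S2)
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Hypothetical one-dimensional subspaces in R^2 (illustrative values only).
v_B = np.array([[1.0], [0.0]])
v_Q = np.array([[1.0], [1.0]])
theta_BQ = principal_angles(v_B, v_Q)[0]
print(np.degrees(theta_BQ))  # approximately 45 degrees
```

A value of $\theta_{BQ}$ near zero would indicate overlapping subspaces, matching the collapse indicator described above.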
2.3 Identifiability Assumptions (Spec-1 to 5)
Current SRA functions as a Causal Attribution Proxy only under the following assumptions, which are fixed explicitly as benchmark specifications.
- Spec-1: Process noise $w_t$ is independent of action $u_t$ (zero-mean)
- Spec-2: Observation noise $v_t$ is independent of action
- Spec-3: No hidden confounder exists (or $u_t \perp z_t$)
- Spec-4: Action satisfies the Persistent Excitation (PE) condition: $\frac{1}{T}\sum u_t u_t^\top \geq \alpha I$
- Spec-5: B_true is quasi-stationary within observation window $W$ (rate of change $\ll 1/W$)
Under Spec-3 violation (Hidden Confounder), $\mathbb{E}[e_t u_t^\top] \neq 0$ can hold even when B_true is unchanged (see §6).
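The PE condition of Spec-4 can be checked on a window of recorded actions. The sketch below assumes actions are stored as an array of shape `(T, m)`; the helper names `pe_margin` and `satisfies_pe` are hypothetical.

```python
import numpy as np

def pe_margin(U: np.ndarray) -> float:
    """Smallest eigenvalue of (1/T) * sum_t u_t u_t^T for actions U of shape (T, m)."""
    gram = U.T @ U / U.shape[0]
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(gram)[0])

def satisfies_pe(U: np.ndarray, alpha: float) -> bool:
    """True if (1/T) * sum_t u_t u_t^T >= alpha * I in the positive semidefinite order."""
    return pe_margin(U) >= alpha

# Illustrative windows: one excites both axes, the other only the first axis.
U_rich = np.array([[1.0, 0.0], [0.0, 1.0]] * 50)
U_poor = np.array([[1.0, 0.0]] * 100)
print(satisfies_pe(U_rich, alpha=0.1), satisfies_pe(U_poor, alpha=0.1))
```

Passing this check says nothing about the angle between $\mathcal{S}_B$ and $\mathcal{S}_Q$; it only bounds the input Gram matrix from below.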
§3 Formalization of Attribution Collapse
3.1 Closed-Loop Contamination Jacobian
Define the augmented state system containing contamination $\delta_t = \text{vec}(\Delta_t) = \text{vec}(B_{est,t} - B_{true})$:
$$z_t = \begin{pmatrix} x_t \\ \delta_t \end{pmatrix}, \quad z_{t+1} = F(z_t, w_t, \xi_t)$$
Expected Jacobian from linearization around $\delta_t = 0$:
$$J = \mathbb{E}\!\left[\frac{\partial F}{\partial z}\right] = \begin{pmatrix} A_{cl} & B_{true} K P_x^{1/2} \otimes I_n \\ 0 & I_{nm} - \alpha I_0 \otimes I_n \end{pmatrix}$$
where $I_0 = K B_{true} P_x B_{true}^\top K^\top + \Sigma_\xi$ (nominal action covariance). As contamination grows, $\tilde{I}(\Delta_t) = \mathbb{E}[u_t u_t^\top \mid \Delta_t]$ changes, and under certain update rules and policy dependencies the contamination block $J_\delta$ (the lower-right block of $J$, with $\tilde{I}$ in place of $I_0$) can enter an unstable regime where $\rho(J_\delta) > 1$:
$$\|\Delta_t\| \nearrow \;\Rightarrow\; \lambda_{max}(\tilde{I}) \nearrow \;\Rightarrow\; \rho(J_\delta) > 1 \;\Rightarrow\; \|\Delta_{t+1}\| > \|\Delta_t\|$$
3.2 Definition of Recursive Attribution Poisoning
Contamination of B_est through misattribution
↓
Distorted action u_t = K(B_est) x_t
↓
Biased residual e_t regenerates "evidence" for further misattribution
↓
Further contamination of B_est
└─── Positive feedback loop (no natural recovery within evaluation window)
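The instability condition of §3.1 that drives this loop can be checked numerically. The sketch below reads $J_\delta$ as the lower-right block $I_{nm} - \alpha \tilde{I} \otimes I_n$ and treats $\alpha$ as the update step size; both readings are assumptions, and `rho_delta` is a hypothetical helper name.

```python
import numpy as np

def rho_delta(I_tilde: np.ndarray, alpha: float, n: int) -> float:
    """Spectral radius of J_delta = I_{nm} - alpha * (I_tilde kron I_n).

    I_tilde is the (m x m) action second-moment matrix, n the state dimension.
    """
    m = I_tilde.shape[0]
    J_delta = np.eye(n * m) - alpha * np.kron(I_tilde, np.eye(n))
    return float(np.max(np.abs(np.linalg.eigvals(J_delta))))

# Illustrative values: a nominal action covariance, then an inflated one.
alpha = 0.5
I_nominal = np.diag([1.0, 0.5])    # alpha * lambda_max = 0.5
I_inflated = np.diag([5.0, 0.5])   # alpha * lambda_max = 2.5
print(rho_delta(I_nominal, alpha, n=2))   # below 1: contamination contracts
print(rho_delta(I_inflated, alpha, n=2))  # above 1: contamination grows
```

Under this reading, the eigenvalues of $J_\delta$ are $1 - \alpha \lambda_i(\tilde{I})$, so the block leaves the unit disc once $\alpha \lambda_{max}(\tilde{I}) > 2$, consistent with the implication chain in §3.1.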
| | Adaptive Instability | Attribution Collapse |
|---|---|---|
| Recovery | Natural recovery after disturbance | No natural recovery within observation window |
| Cause | Excessive learning rate, etc. (quantitative) | Update in wrong direction (structural) |
| Contamination Propagation | Acts independently | Contaminated model induces future misattribution |
| Closed-Loop Nature | Open-loop failure | Agent itself generates false evidence |
3.3 Independence of PE Preservation and Attribution Separability
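The most defensible central proposition of this paper: the PE condition can guarantee parameter identifiability, but it does not guarantee attribution separability between the competing structural hypotheses (B drift / Q burst). Preserving PE and preserving attribution separability are independent requirements.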
Note: The claim that "higher PE accelerates collapse (High-PE Paradox)" lacked formal support and has been retracted. The correct claim is: "collapse can occur even when PE is maintained."
§4 Definition of Trajectory-Level Distinguishability
4.1 Three Distinguishability Metrics
We target systems where $P(e_t | H_B) = P(e_t | H_Q)$ holds by design for single-shot residual statistics. Discrimination becomes possible only by examining the trajectory-level intervention response structure.
Three specialized metrics:
- $D_{probe}$: under fixed exogenous probe $\pi_{probe}$ (reference distinguishability of the environment)
- $D_{policy}$: under policy $\pi_{b_t}$ induced from current belief $b_t$
- $D_{oracle}$: under true belief (counterfactual for diagnosis; not a performance metric)
$D_{oracle}$ is not a deployable baseline; it is a counterfactual diagnostic control for verifying that collapse is caused by "causal induction through a wrong-belief policy" rather than "intrinsic non-identifiability of the environment." It is not used in agent performance comparisons.
4.2 Trajectory-Level Definition of Recursive Attribution Poisoning
$$\boxed{D_{policy}^{b_{t+1}^{wrong}} < D_{policy}^{b_t} \quad \text{and} \quad D_{probe,t+1} \approx D_{probe,t}}$$
The world remains identifiable, but an agent holding an incorrect belief keeps generating a trajectory distribution that is difficult for itself to identify.
Conditions (not necessities) under which $D_{policy}$ decreases:
When the policy gradient under the wrong update $b_t^{wrong}$ points in the direction opposite to increasing $D_{policy}$ — that is, when $\pi_{wrong}$ avoids the discriminative direction separating B and Q, or generates a trajectory distribution that makes the responses of both hypotheses more similar — the condition holds. This is a structural condition and does not hold universally.
Diagnostic Metric for Directional Collapse
$$\text{DirectionalEnergy}_B(t) := \frac{v_B^\top \mathbb{E}[u_t u_t^\top] v_B}{\text{tr}(\mathbb{E}[u_t u_t^\top])}$$
When only the projection onto direction $v_B$ (the discriminative direction for B drift) decreases while the total input energy (magnitude) and rank (PE) are preserved, this is termed Directional Collapse.
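A minimal sketch of this metric, assuming $\mathbb{E}[u_t u_t^\top]$ is estimated by the sample second moment over a window of actions of shape `(T, 2)`. The function name and the two example windows are hypothetical.

```python
import numpy as np

def directional_energy(U: np.ndarray, v_B: np.ndarray) -> float:
    """Share of input energy along v_B: v_B^T M v_B / tr(M), with M = (1/T) U^T U."""
    M = U.T @ U / U.shape[0]            # sample estimate of E[u u^T]
    v = v_B / np.linalg.norm(v_B)       # guard against a non-unit direction
    return float(v @ M @ v / np.trace(M))

v_B = np.array([1.0, 0.0])
# Nominal window: M = diag(0.5, 0.5).
U_nominal = np.array([[1.0, 0.0], [0.0, 1.0]] * 50)
# Depleted window: M = diag(0.1, 0.9), same trace, still full rank.
U_depleted = np.array([[np.sqrt(0.2), 0.0], [0.0, np.sqrt(1.8)]] * 50)

for name, U in [("nominal", U_nominal), ("depleted", U_depleted)]:
    M = U.T @ U / U.shape[0]
    print(name, directional_energy(U, v_B), np.trace(M), np.linalg.eigvalsh(M)[0])
```

In the second window the trace is unchanged and the smallest eigenvalue stays positive, while the share of energy along $v_B$ drops from 0.5 to 0.1. That is the pattern the definition above labels Directional Collapse.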
4.3 Differences from Existing Theory (Modest Claims)
| Theory | What It Addresses | Relationship to This Paper |
|---|---|---|
| IMM / MMAE | Model weight updates over a fixed hypothesis set | Does not update the model structure itself. This paper treats structural updating as the failure source. |
| Dual Control | Maximizing identification sensitivity (exploration) | Aims to increase information. This paper addresses defense against endogenous collapse of identifiability. |
| Active BHT | Action selection maximizing $D_{policy}$ | Closest existing theory. This paper defines its content as a failure mode of ABHT. |
| Particle Filter | Representation of posterior (particle diversity) degeneration | This paper addresses degradation of the future evidence distribution, not the posterior. |
| Closed-Loop ID | Estimation bias from closed-loop (static) | Already known. This paper addresses the dynamic process in which adaptation itself endogenously destroys identifiability. |
Minimal differential candidate: not "action changes identifiability" (the center of ABHT), but rather the Directional Collapse pattern in which "wrong updates destroy only the projection onto the discriminative direction while preserving PE and energy."
§5 MOAT v5g: Benchmark Design Specification
5.1 Two-Dimensional Minimal Counterexample System
$$x_{t+1} = Ax_t + B_{true}u_t + w_t, \quad x_t, u_t \in \mathbb{R}^2$$
$$B_{true} = I + \delta_B v_B v_B^\top, \quad Q_t = \sigma_w^2 I + \delta_Q \cdot \mathbf{1}_{burst}(t) \cdot v_Q v_Q^\top$$
Per-Episode Random Geometry (Leakage Prevention)
$$v_B \sim \text{Uniform}(S^1), \quad v_Q = \frac{R(\theta)v_B + \epsilon}{\|R(\theta)v_B + \epsilon\|}, \quad \theta \sim \text{Uniform}(30°, 150°)$$
Design Constraint for Single-Shot Non-Identifiability
$\delta_B$ and $\delta_Q$ are set to approximately satisfy the following:
$$\text{Var}(e_t \mid H_B) \approx \text{Var}(e_t \mid H_Q)$$
(Adjusted so that $\delta_B^2 \cdot \mathbb{E}[\|u_t\|^2] \approx \delta_Q$. This is an approximate equilibrium, not an exact equality; scale is verified per episode.)
The agent cannot distinguish $H_B$ from $H_Q$ using single-shot residual statistics alone. Discrimination requires examining the trajectory-level intervention response structure (windowed cross-covariance).
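The per-episode geometry and the variance-matching constraint of §5.1 can be sketched as follows. The distribution of $\epsilon$ is not specified in the text, so a small isotropic Gaussian with scale `eps_scale` is assumed here; all function names are hypothetical.

```python
import numpy as np

def sample_episode_geometry(rng: np.random.Generator, eps_scale: float = 0.05):
    """Draw v_B uniformly on the unit circle and v_Q as a perturbed rotation of v_B."""
    phi = rng.uniform(0.0, 2.0 * np.pi)
    v_B = np.array([np.cos(phi), np.sin(phi)])
    theta = np.radians(rng.uniform(30.0, 150.0))
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Assumption: epsilon is small isotropic Gaussian jitter.
    raw = R @ v_B + eps_scale * rng.standard_normal(2)
    return v_B, raw / np.linalg.norm(raw)

def matched_delta_Q(delta_B: float, mean_u_sq: float) -> float:
    """delta_Q from the approximate balance delta_B^2 * E[||u||^2] ~ delta_Q."""
    return delta_B ** 2 * mean_u_sq

def build_system(v_B, v_Q, delta_B, delta_Q, sigma_w):
    """Return B_true plus the process-noise covariance outside and inside a burst."""
    B_true = np.eye(2) + delta_B * np.outer(v_B, v_B)
    Q_base = sigma_w ** 2 * np.eye(2)
    Q_burst = Q_base + delta_Q * np.outer(v_Q, v_Q)
    return B_true, Q_base, Q_burst

rng = np.random.default_rng(0)
v_B, v_Q = sample_episode_geometry(rng)
delta_Q = matched_delta_Q(delta_B=0.5, mean_u_sq=4.0)
B_true, Q_base, Q_burst = build_system(v_B, v_Q, 0.5, delta_Q, sigma_w=0.1)
```

Resampling the geometry at every episode is what prevents a classifier from memorizing a fixed $(v_B, v_Q)$ fingerprint; the balance in `matched_delta_Q` is only approximate, as noted above, and would still need per-episode verification.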
5.2 Two-Level Separation of Measurement Architecture
■ Diagnostic Controls (evaluator-only — ground-truth v_B, v_Q accessible)
D_probe(t) : AUC under fixed exogenous probe (proof of environment survival)
D_oracle(t) : AUC under true-belief policy (for causal diagnosis)
DirectionalEnergy_B(t) : v_B⊤ E[u_t u_t⊤] v_B / tr(E[u_t u_t⊤])
■ Performance Metrics (god's-eye information fully concealed)
AUC_residual(t) : Discrimination AUC from e_{t+3:t+3+k} (with time lag)
AUC_action(t) : Discrimination AUC from u_{t:t+k} only (leakage monitor)
D_norm(t) : AUC_residual / tr(E[u_t u_t⊤])
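The AUC-based quantities above can be computed from classifier scores with the rank (Mann-Whitney) form of the AUC. This is a minimal sketch: the score arrays are assumed to come from whichever hypothesis classifier is in use, and `auc` and `d_norm` are hypothetical helper names.

```python
import numpy as np

def auc(scores_pos, scores_neg) -> float:
    """Probability that a positive-class score exceeds a negative-class score (ties count 0.5)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

def d_norm(auc_residual: float, U: np.ndarray) -> float:
    """AUC_residual divided by tr(E[u u^T]), estimated from actions U of shape (T, m)."""
    return auc_residual / float(np.trace(U.T @ U / U.shape[0]))

# Illustrative scores: perfectly separated, then fully overlapping.
print(auc([3.0, 4.0], [1.0, 2.0]))  # 1.0
print(auc([1.0, 2.0], [1.0, 2.0]))  # 0.5
```

The pairwise form is quadratic in the number of scores, which is acceptable for windowed episode-level evaluation but would need a rank-based implementation for large sample counts.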
5.3 Collapse Detection Conditions and Measurement Artifact Defenses
D_probe AUC > 0.75 [environment is identifiable]
D_oracle AUC > 0.75 [maintained under correct belief]
PE_policy ≥ thresh [sufficient input rank]
InputEnergy ≥ thresh [sufficient input energy]
DirectionalEnergy_B ↓↓↓ [only projection onto v_B direction depleted]
AUC_residual < 0.60 [non-identifiable under current policy]
D_norm decreasing
AUC_action < 0.55 [leakage check: PASS]
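These conditions can be combined into a single per-episode predicate. In the sketch below the AUC cutoffs are those listed above, while the PE and energy thresholds are left to the caller because the text gives no values. The quantitative reading of "↓↓↓" as a ratio to the initial value, the cutoff `de_ratio_thresh`, and the slope test for `D_norm` are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class EpisodeDiagnostics:
    d_probe_auc: float
    d_oracle_auc: float
    pe_policy: float                 # e.g. smallest eigenvalue of the input Gram matrix
    input_energy: float              # e.g. tr(E[u u^T])
    directional_energy_ratio: float  # DirectionalEnergy_B(t) relative to its initial value
    auc_residual: float
    d_norm_slope: float              # fitted trend of D_norm over the episode
    auc_action: float

def is_directional_collapse(d: EpisodeDiagnostics, pe_thresh: float,
                            energy_thresh: float, de_ratio_thresh: float = 0.2) -> bool:
    """True only if every listed detection condition holds for the episode."""
    return (
        d.d_probe_auc > 0.75                              # environment is identifiable
        and d.d_oracle_auc > 0.75                         # maintained under correct belief
        and d.pe_policy >= pe_thresh                      # sufficient input rank
        and d.input_energy >= energy_thresh               # sufficient input energy
        and d.directional_energy_ratio < de_ratio_thresh  # assumed reading of the depletion arrow
        and d.auc_residual < 0.60                         # non-identifiable under current policy
        and d.d_norm_slope < 0.0                          # D_norm decreasing
        and d.auc_action < 0.55                           # leakage check passes
    )

episode = EpisodeDiagnostics(0.90, 0.88, 0.30, 1.0, 0.10, 0.55, -0.02, 0.51)
print(is_directional_collapse(episode, pe_thresh=0.1, energy_thresh=0.5))  # True
```

Requiring all conditions jointly is what separates Directional Collapse from ordinary loss of excitation: an episode with low `pe_policy` or low `d_probe_auc` fails the predicate even if `auc_residual` is poor.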
Measurement Artifact Defense Checklist
- Multi-model validation: Confirm consistent collapse across all three classifiers (Linear SVM / RBF-kernel SVM / Shallow MLP with 2 layers) to rule out representation bias of any single model.
- Higher-moment leakage audit: A Shallow LSTM must be added to the leakage classifier monitoring AUC_action. Episodes exceeding 0.55 are excluded.
- Horizon sweep: Confirm robustness of collapse across $k \in \{5, 10, 20, 40\}$.
- Random geometry validation: Full-episode runs with fixed $v_B, v_Q$ are prohibited (to prevent memorization of fixed fingerprints).
Required Baseline Suite (All Run Under Identical Conditions)
| Baseline | Role in the Comparison |
|---|---|
| EKF / UKF | Representative of classical closed-loop ID |
| IMM / MMAE | Hypothesis filter over fixed model set |
| Particle Filter | Confirm difference from posterior collapse |
| Dual Control | Confirm difference from information-maximizing exploration |
| Active BHT | Closest existing theory — primary baseline |
If all baselines (including ABHT) avoid $D_{policy}$ collapse, this is not a "refutation of a new theory" but rather a valuable negative result: "the ABHT family already covers this geometric pathology." Either outcome has value as a benchmark finding.
§6 Geometric Destruction by Hidden Confounder
We formalize the case of Spec-3 violation (Hidden Confounder $z_t$). When $u_t = \pi(x_t) + \gamma c_u z_t + \xi_t$ and $w_t = \tilde{w}_t + \beta c_e z_t$:
$$\mathbb{E}[e_t u_t^\top] = \underbrace{\Delta B \cdot I_0}_{\text{true B drift signal}} + \underbrace{\beta\gamma\sigma_z^2 c_e c_u^\top}_{=: \mathbf{C} \text{ (spurious B drift signal)}}$$
Even when $\Delta B = 0$, the second term on the right is nonzero, causing SRA's mean-channel attribution to update $\hat{B}_t$ despite the absence of any change in B_true.
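The spurious term can be reproduced with a short Monte Carlo check. The sketch below makes simplifying assumptions not stated in the text: $\Delta B = 0$, the policy term $\pi(x_t)$ is set to zero, the residual is taken to equal the process noise $w_t$, and all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
beta, gamma, sigma_z = 0.8, 0.5, 1.0
c_e = np.array([1.0, 0.0])   # direction in which z_t enters the disturbance
c_u = np.array([0.0, 1.0])   # direction in which z_t enters the action

z = sigma_z * rng.standard_normal(N)          # hidden confounder z_t
xi = rng.standard_normal((N, 2))              # exploration noise xi_t
w_tilde = 0.1 * rng.standard_normal((N, 2))   # confounder-free disturbance

u = gamma * z[:, None] * c_u + xi             # u_t with pi(x_t) = 0 (assumption)
e = beta * z[:, None] * c_e + w_tilde         # residual with Delta B = 0 (assumption)

cross_cov = e.T @ u / N                       # sample estimate of E[e u^T]
spurious = beta * gamma * sigma_z ** 2 * np.outer(c_e, c_u)
print(np.round(cross_cov, 3))
print(spurious)
```

Although B_true never changes in this setup, the sample cross-moment concentrates around $\beta\gamma\sigma_z^2 c_e c_u^\top$, so a mean-channel attribution rule keyed on $\mathbb{E}[e_t u_t^\top]$ would register an apparent B drift.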
Current SRA is a Causal Attribution Proxy for environments where Spec-1 through 5 hold. It has not reached a general causal identification theory capable of handling Hidden Confounders (Spec-3 violations). The paper's Limitations section will explicitly include MOAT v5g Phase 5 under Spec-3 violation as a "limit test for observing where this framework breaks down."
Final Open Problems · What Can Be Claimed Strongly · What Cannot Yet Be Claimed
What can be claimed strongly:
- The PE condition aids parameter identifiability but does not guarantee attribution separability; these two requirements are independent
- The mechanism (Recursive Attribution Poisoning) by which an incorrect latent component update distorts the policy in a closed loop and contaminates future residual statistics can be formalized as a state space
- Under certain update rules and policy dependencies, conditions for an unstable regime with $\rho(J_\delta) > 1$ can be described (under Spec-1 to 5)
- The divergence between $D_{probe}$ and $D_{policy}$ can be measured as a counterfactual comparison between exogenous probe and wrong-belief policy
- Directional Collapse via DirectionalEnergy_B and PE Collapse can be defined as independent diagnostic metrics
- MOAT v5g is designed as a falsifiable stress-test benchmark directly comparable against ABHT / IMM / PF / Dual Control
- The fact that the current framework breaks under Hidden Confounder (Spec-3 violation) can be explicitly stated in the paper's Limitations section
What cannot yet be claimed:
- That SRA constitutes a genuinely new theory distinct from ABHT / active hypothesis testing (at present it is a description of a failure mode)
- General causal identification in environments including Spec-3 (Hidden Confounder) violations (not yet reached)
- The claim that High-PE accelerates attribution collapse (High-PE Paradox), retracted due to lack of formal support
- A definitive claim that recursive attribution poisoning is fundamentally distinct from particle depletion / posterior collapse (the possibility that it is a special case of active Bayesian filtering remains)
- Theoretical remedies if all baseline systems avoid collapse in experiments
- Connections to observer / self / consciousness / phenomenology (these lie outside the scope of this paper)
Open problems:
- Empirical validation of the High-PE / High-Overlap regime: Can MOAT v5g reproduce episodes where PE is sufficiently maintained yet $D_{policy}$ collapses?
- Definitive differential from ABHT: If Active BHT as the primary baseline avoids Directional Collapse, does the differential of SRA reduce solely to "benchmarking the failure mode of endogenous destruction of identifiability"?
- Implementation of mean / variance channel separation: Does separation of mean_attr (B drift signal) and var_attr (action-induced noise signal) function in an action-confounded noise environment?
- Attribution-Aware Exploration: An action selection rule for maintaining the discriminative direction separating B and Q. Its objective, maintaining separability, differs from Dual Control's Fisher information maximization; can this become an independently implementable concept?
- Implementation of the 2D counterexample system and baseline experiments: Run all baselines (EKF / IMM / PF / Dual Control / Active BHT) under identical conditions to determine whether Directional Collapse is unique to SRA or a known pathology already covered by ABHT.