Structured Residual Attribution — A Benchmark Note on Attribution Collapse

Editorial Note: This document is an integrated version produced through multiple rounds of peer-review relay with ChatGPT, Codex, Gemini, and Perplexity. The original relay logs — including the condensation process from observer theory to engineering failure theory — are archived in the appendix (appendix.html). Code and numerical experiments are also in the appendix.

Abstract

This paper defines and formalizes a specific failure mode, Attribution Collapse, exhibited by adaptive systems in non-stationary, partially observable environments, and presents the design specification for a detection benchmark (MOAT v5g). The central proposition is that the Persistent Excitation (PE) condition can guarantee parameter identifiability but does not guarantee Attribution Separability between competing structural hypotheses (B drift vs. Q burst). An update to the wrong latent channel distorts the policy; the distorted policy contaminates future trajectory evidence; and trajectory-level distinguishability is recursively degraded (Recursive Attribution Poisoning). This paper does not present a new theory; it is explicitly positioned as a stress-test benchmark in the vicinity of ABHT / controlled sensing / dual control.

§1 Theory Purification: What Was Discarded and What Remained

The previous version was centered on observer / selfhood / phenomenology. Through a multi-AI peer-review relay, the following deletions and transformations were made.

Concepts Entirely Removed

observer (operational observer definition), self / selfhood (F-coalgebraic fixed point), consciousness / phenomenology / qualia, AQFT ontology (ontological interpretation of von Neumann algebras), IIT connection, High-PE Paradox (retracted due to lack of formal support), strong claims about "causation" (retreated to structured residual attribution).

What remained after removal:

When an adaptive system attributes residuals to an incorrect latent factor and updates under partial observability, it can — through its own modified policy — distort the statistical and geometric structure of future residuals, recursively destroying its own future identifiability.

This is not the philosophy of the observer but the closed-loop failure geometry of adaptive control. The latter is more amenable to peer review and experimentally falsifiable than the former.

§2 Definition of Structured Residual Attribution (SRA)

2.1 Factorization of Prediction Residuals

In the state-space model $x_{t+1} = A_t x_t + B_t u_t + w_t$ with observation $y_t = C_t x_t + v_t$, the agent's prediction residual $e_t = y_t - \hat{y}_t$ is decomposed into four components:

$$e_t = \underbrace{\Delta B_t \cdot u_t}_{\text{action-channel drift}} + \underbrace{\Delta A_t \cdot x_t}_{\text{world dynamics drift}} + \underbrace{\Delta w_t}_{\text{exogenous disturbance}} + \underbrace{\Delta C_t \cdot x_t}_{\text{sensor drift}}$$

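Definition (Selective Plasticity): the table below maps each drift cause to the internal model that should be updated and to the penalty incurred when the cause is misattributed. The exogenous-disturbance term $\Delta w_t$ enters the table through its covariance shift $\Delta Q_t$.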

Attribution Target	Internal Model to Update	Penalty on Misattribution
$\Delta B_t$	$\hat{B}_t$ (intervention model)	Misattributed to random disturbance → ignores the drop in intervention efficiency; loss of control
$\Delta A_t$	$\hat{A}_t$ (transition model)	Misattributed to action → contaminates $\hat{B}_t$, and the contaminated model then degrades the agent's own control (self-destruction)
$\Delta Q_t$	$Q_{est}$ (noise covariance only)	Misattributed to structural model → collapse through over-adaptation
$\Delta C_t$	$\hat{C}_t$ or $R_{est}$	Misattributed to world → collapse of world model (Freezing)

2.2 Attribution Subspace and Identifiability

Define the subspaces generated in residual space by each component:

$$\mathcal{S}_B = \text{range}\bigl(\mathbb{E}[e_t u_t^\top \mid \Delta B \neq 0]\bigr)$$
and $\mathcal{S}_Q$ denotes the residual subspace carrying the variance-channel signature of a Q burst, characterized by $\text{Corr}(\|e_t\|^2, \|u_t\|^2) > 0$.

Attribution separability is measured by the principal angle $\theta_{BQ}$ between the two subspaces. $\theta_{BQ} \to 0$ (subspace overlap) is the geometric indicator of attribution collapse.
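The principal angle $\theta_{BQ}$ can be computed with a standard QR/SVD recipe: orthonormalize a basis of each subspace, and the singular values of the product of the two orthonormal bases are the cosines of the principal angles. The sketch below assumes each subspace is supplied as a full-column-rank basis matrix (in the 2-D benchmark of §5, a single column each); the function names are illustrative and not part of MOAT v5g.

```python
import numpy as np


def principal_angles_deg(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Principal angles (degrees, ascending) between range(A) and range(B).

    A and B are basis matrices with full column rank; they need not be orthonormal.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of range(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis of range(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles (descending).
    cosines = np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), 0.0, 1.0)
    return np.degrees(np.arccos(cosines))


def theta_BQ_deg(S_B: np.ndarray, S_Q: np.ndarray) -> float:
    """Smallest principal angle theta_BQ; values near 0 indicate subspace overlap."""
    return float(principal_angles_deg(S_B, S_Q)[0])


# Usage: v_Q rotated 60 degrees from v_B, then a fully overlapping pair.
v_B = np.array([[1.0], [0.0]])
v_Q = np.array([[0.5], [np.sqrt(3) / 2]])
print(theta_BQ_deg(v_B, v_Q))      # about 60
print(theta_BQ_deg(v_B, 2 * v_B))  # about 0 (overlap)
```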

2.3 Identifiability Assumptions (Spec-1 to 5)

In its current form, SRA functions as a Causal Attribution Proxy only under the following assumptions, which are fixed explicitly as benchmark specifications.

Identifiability Assumptions (Benchmark Spec)

— Spec-1: Process noise $w_t$ is independent of action $u_t$ (zero-mean)
— Spec-2: Observation noise $v_t$ is independent of action
— Spec-3: No hidden confounder exists (or $u_t \perp z_t$ for the latent confounder $z_t$; see §6)
— Spec-4: Action satisfies the Persistent Excitation (PE) condition: $\frac{1}{T}\sum_{t=1}^{T} u_t u_t^\top \succeq \alpha_{PE} I$ for some $\alpha_{PE} > 0$
— Spec-5: B_true is quasi-stationary within observation window $W$ (rate of change $\ll 1/W$)
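Spec-4 can be checked numerically on each observation window. A minimal sketch, assuming actions are stacked as rows of a `T x m` array; the PE level is passed as a hypothetical parameter `alpha_pe`:

```python
import numpy as np


def pe_level(U: np.ndarray) -> float:
    """lambda_min of (1/T) sum_t u_t u_t^T, with actions stacked as rows of U (T x m)."""
    gram = U.T @ U / U.shape[0]  # empirical (uncentered) action second moment
    return float(np.linalg.eigvalsh(gram)[0])  # eigvalsh sorts eigenvalues ascending


def satisfies_pe(U: np.ndarray, alpha_pe: float) -> bool:
    """Spec-4 check on one window: (1/T) sum_t u_t u_t^T >= alpha_pe * I."""
    return pe_level(U) >= alpha_pe


rng = np.random.default_rng(0)
U_rich = rng.standard_normal((500, 2))                               # excites both directions
U_flat = np.column_stack([rng.standard_normal(500), np.zeros(500)])  # one direction only
print(satisfies_pe(U_rich, 0.5), satisfies_pe(U_flat, 0.5))
```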

Under Spec-3 violation (Hidden Confounder), $\mathbb{E}[e_t u_t^\top] \neq 0$ can hold even when B_true is unchanged (see §6).

§3 Formalization of Attribution Collapse

3.1 Closed-Loop Contamination Jacobian

Define the augmented state system containing contamination $\delta_t = \text{vec}(\Delta_t) = \text{vec}(B_{est,t} - B_{true})$:

$$z_t = \begin{pmatrix} x_t \\ \delta_t \end{pmatrix}, \quad z_{t+1} = F(z_t, w_t, \xi_t)$$

Expected Jacobian from linearization around $\delta_t = 0$:

$$J = \mathbb{E}\!\left[\frac{\partial F}{\partial z}\right] = \begin{pmatrix} A_{cl} & B_{true} K P_x^{1/2} \otimes I_n \\ 0 & I_{nm} - \alpha I_0 \otimes I_n \end{pmatrix}$$

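where $I_0 = K B_{true} P_x B_{true}^\top K^\top + \Sigma_\xi$ (nominal action covariance) and $\alpha$ is the step size of the $\hat{B}$ update. As contamination grows, $\tilde{I}(\Delta_t) = \mathbb{E}[u_t u_t^\top \mid \Delta_t]$ changes. Writing $J_\delta := I_{nm} - \alpha\,\tilde{I}(\Delta_t) \otimes I_n$ for the contamination block of $J$ with $I_0$ replaced by $\tilde{I}(\Delta_t)$, its eigenvalues are $1 - \alpha\lambda_i(\tilde{I})$, so for $\alpha > 0$ and $\tilde{I} \succeq 0$, $\rho(J_\delta) > 1$ holds exactly when $\alpha\,\lambda_{max}(\tilde{I}) > 2$. Under certain update rules and policy dependencies, contamination can drive the system into this unstable regime: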

$$\|\Delta_t\| \nearrow \;\Rightarrow\; \lambda_{max}(\tilde{I}) \nearrow \;\Rightarrow\; \rho(J_\delta) > 1 \;\Rightarrow\; \|\Delta_{t+1}\| > \|\Delta_t\|$$
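The last step of this chain can be checked numerically. The sketch assumes $J_\delta$ is the lower-right (contamination) block of $J$ with $I_0$ replaced by $\tilde{I}(\Delta_t)$, that $\alpha > 0$ is a scalar step size, and that $\tilde{I}$ is symmetric positive semidefinite; it compares the computed spectral radius with the closed-form threshold $\alpha\lambda_{max}(\tilde{I}) > 2$ that follows under those assumptions. Function names are hypothetical.

```python
import numpy as np


def contamination_block(I_tilde: np.ndarray, alpha: float, n: int) -> np.ndarray:
    """J_delta = I_{nm} - alpha * (I_tilde kron I_n); I_tilde is the m x m action covariance."""
    m = I_tilde.shape[0]
    return np.eye(n * m) - alpha * np.kron(I_tilde, np.eye(n))


def spectral_radius(J: np.ndarray) -> float:
    return float(np.max(np.abs(np.linalg.eigvals(J))))


def predicted_unstable(I_tilde: np.ndarray, alpha: float) -> bool:
    """Closed form for symmetric PSD I_tilde and alpha > 0: rho > 1 iff alpha * lambda_max > 2."""
    return alpha * float(np.linalg.eigvalsh(I_tilde)[-1]) > 2.0


alpha, n = 0.5, 2
for I_tilde in (np.diag([1.0, 3.0]), np.diag([1.0, 5.0])):  # action energy grows along one axis
    J = contamination_block(I_tilde, alpha, n)
    print(spectral_radius(J), predicted_unstable(I_tilde, alpha))
```

With $\alpha = 0.5$, raising $\lambda_{max}(\tilde{I})$ from 3 to 5 moves the spectral radius from 0.5 to 1.5, crossing into the regime where $\|\Delta_{t+1}\| > \|\Delta_t\|$ in expectation.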

3.2 Definition of Recursive Attribution Poisoning

Definition (Attribution Collapse / Recursive Attribution Poisoning). For contamination magnitude $C(t) = \|B_{est,t} - B_{true,t}\|_F$, attribution collapse occurs if $$\exists\, t_0,\ \varepsilon > 0:\ \forall t > t_0 + T_{recover},\; C(t) > \varepsilon \quad \text{(no natural recovery within the observation window)},$$ where $T_{recover}$ is the recovery time allowed within that window. Mechanism (closed loop):

Contamination of B_est through misattribution
↓ Distorted action u_t = K(B_est) x_t
↓ Biased residual e_t regenerates "evidence" for further misattribution
↓ Further contamination of B_est
└─── Positive feedback loop (no natural recovery within evaluation window)
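The collapse definition can be evaluated directly on a logged contamination series. A minimal sketch, assuming $C(t)$ is recorded once per step over a finite observation window; `eps` and `T_recover` are user-chosen (hypothetical) parameters, and the scan returns the smallest $t_0$ that satisfies the definition inside the window, or `None` if the series recovers.

```python
import numpy as np


def contamination(B_est: np.ndarray, B_true: np.ndarray) -> float:
    """C(t) = ||B_est,t - B_true,t||_F."""
    return float(np.linalg.norm(B_est - B_true, ord="fro"))


def collapse_onset(C, eps: float, T_recover: int):
    """Smallest t0 such that C(t) > eps for every t > t0 + T_recover inside the window.

    Returns None when no such t0 leaves at least one step to check,
    i.e. when C does not stay above eps until the end of the window.
    """
    above = np.asarray(C, dtype=float) > eps
    N = len(above)
    for t0 in range(N):
        start = t0 + T_recover + 1  # first t with t > t0 + T_recover
        if start >= N:
            return None             # nothing left to observe in the window
        if above[start:].all():
            return t0
    return None


C_recovering = np.array([0.0, 2.0, 2.0, 0.5, 0.5])
C_poisoned = np.array([0.0, 0.0, 0.0, 0.0, 2.0, 2.0])
print(collapse_onset(C_recovering, eps=1.0, T_recover=1))  # None
print(collapse_onset(C_poisoned, eps=1.0, T_recover=1))    # 2
```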

Distinction from Existing Adaptive Instability

	Adaptive Instability	Attribution Collapse
Recovery	Natural recovery after disturbance	No natural recovery within observation window
Cause	Excessive learning rate, etc. (quantitative)	Update in wrong direction (structural)
Contamination Propagation	Acts independently	Contaminated model induces future misattribution
Closed-Loop Nature	Open-loop failure	Agent itself generates false evidence

3.3 Independence of PE Preservation and Attribution Separability

The most defensible central proposition of this paper:

Central Proposition (Defensible Claim)
$$\underbrace{\lambda_{min}(\mathbb{E}[u_t u_t^\top]) > 0}_{\text{PE condition (parameter identifiability)}} \;\not\Rightarrow\; \underbrace{\theta(\mathcal{S}_B, \mathcal{S}_Q) > 0}_{\text{attribution separability}}$$
Persistent excitation aids the identification of parameter values but does not guarantee attribution separability between competing structural hypotheses (B drift vs. Q burst). These two conditions are independent requirements.

Note: The claim that "higher PE accelerates collapse (High-PE Paradox)" lacked formal support and has been retracted. The correct claim is: "collapse can occur even when PE is maintained."
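The proposition can be illustrated with the 2-D geometry of §5. This sketch makes two simplifying assumptions: $\mathcal{S}_B = \text{range}(\Delta B\, I_0)$ with $\Delta B = \delta_B v_B v_B^\top$, and $\mathcal{S}_Q = \text{span}(v_Q)$. The same isotropic action covariance gives $\lambda_{min}(I_0) = 1$ in both calls below, while $\theta(\mathcal{S}_B, \mathcal{S}_Q)$ depends only on where $v_Q$ points. Names are hypothetical.

```python
import numpy as np


def angle_deg(a: np.ndarray, b: np.ndarray) -> float:
    """Principal angle between span(a) and span(b) for nonzero vectors, in degrees."""
    c = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(min(c, 1.0))))


def pe_vs_separability(I0, delta_B, v_B, v_Q):
    """Return (lambda_min(I0), theta(S_B, S_Q)) for the rank-one 2-D benchmark geometry.

    Assumes S_B = range(Delta_B @ I0) with Delta_B = delta_B v_B v_B^T and S_Q = span(v_Q).
    """
    Delta_B = delta_B * np.outer(v_B, v_B)
    U, _, _ = np.linalg.svd(Delta_B @ I0)
    s_B = U[:, 0]  # leading left singular vector spans the rank-one range
    return float(np.linalg.eigvalsh(I0)[0]), angle_deg(s_B, v_Q)


I0 = np.eye(2)  # isotropic excitation: PE holds with lambda_min = 1
v_B = np.array([1.0, 0.0])
print(pe_vs_separability(I0, 0.3, v_B, v_B))                              # PE holds, theta about 0
print(pe_vs_separability(I0, 0.3, v_B, np.array([0.5, np.sqrt(3) / 2])))  # same PE, theta about 60
```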

§4 Definition of Trajectory-Level Distinguishability

4.1 Three Distinguishability Metrics

We target systems where $P(e_t | H_B) = P(e_t | H_Q)$ holds by design for single-shot residual statistics. Discrimination becomes possible only by examining the trajectory-level intervention response structure.

Definition: $D_t^{\pi_b}$ (do-operator version) $$D_t^{\pi_b}(B,Q) := D_{KL}\!\Bigl(P_B\bigl(e_{t:t+k} \mid do(u_{t:t+k} \sim \pi_b)\bigr) \;\Big\|\; P_Q\bigl(e_{t:t+k} \mid do(u_{t:t+k} \sim \pi_b)\bigr)\Bigr)$$
Three specialized metrics:
— $D_{probe}$: under fixed exogenous probe $\pi_{probe}$ (reference distinguishability of the environment)
— $D_{policy}$: under policy $\pi_{b_t}$ induced from current belief $b_t$
— $D_{oracle}$: under the policy induced by the true belief (counterfactual for diagnosis; not a performance metric)
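When residual windows are Gaussian, $D_t^{\pi_b}$ has a closed form. The sketch below assumes the single-step residual model of the 2-D benchmark with independent Gaussian residuals across the window: under $H_B$, $e_s \sim \mathcal{N}(\Delta B\,u_s, \sigma_w^2 I)$; under $H_Q$ (burst active), $e_s \sim \mathcal{N}(0, \sigma_w^2 I + \delta_Q v_Q v_Q^\top)$. The action window `U` is imposed exogenously, which corresponds to the $do(\cdot)$ intervention of $D_{probe}$, and the window KL is then the sum of per-step KLs. All names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_p, S_p, mu_q, S_q) -> float:
    """KL( N(mu_p, S_p) || N(mu_q, S_q) ) for d-dimensional Gaussians."""
    d = mu_p.shape[0]
    S_q_inv = np.linalg.inv(S_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(S_p)
    _, logdet_q = np.linalg.slogdet(S_q)
    return 0.5 * float(np.trace(S_q_inv @ S_p) + diff @ S_q_inv @ diff - d
                       + logdet_q - logdet_p)


def window_kl_B_vs_Q(U, delta_B, v_B, delta_Q, v_Q, sigma_w) -> float:
    """Sum of per-step KLs between H_B: N(Delta_B u_s, sigma_w^2 I)
    and H_Q: N(0, sigma_w^2 I + delta_Q v_Q v_Q^T) over a fixed action window U (k x 2)."""
    Delta_B = delta_B * np.outer(v_B, v_B)
    S_B = sigma_w**2 * np.eye(2)
    S_Q = S_B + delta_Q * np.outer(v_Q, v_Q)
    return sum(gaussian_kl(Delta_B @ u, S_B, np.zeros(2), S_Q) for u in U)


v_B = np.array([1.0, 0.0])
v_Q = np.array([0.5, np.sqrt(3) / 2])
U_along = np.tile(v_B, (10, 1))           # probe excites the v_B direction
U_across = np.tile([0.0, 1.0], (10, 1))   # probe orthogonal to v_B
kl_along = window_kl_B_vs_Q(U_along, 0.3, v_B, 0.09, v_Q, 0.1)
kl_across = window_kl_B_vs_Q(U_across, 0.3, v_B, 0.09, v_Q, 0.1)
print(kl_along, kl_across)
```

Only actions with a component along $v_B$ contribute the mean term, so the probe aligned with $v_B$ yields the larger divergence; this is the directional dependence that §4.2 builds on.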

Note: Role of $D_{oracle}$

$D_{oracle}$ is not a deployable baseline; it is a counterfactual diagnostic control for verifying that collapse is caused by "causal induction through a wrong-belief policy" rather than "intrinsic non-identifiability of the environment." It is not used in agent performance comparisons.

4.2 Trajectory-Level Definition of Recursive Attribution Poisoning

$$\boxed{D_{policy}^{b_{t+1}^{wrong}} < D_{policy}^{b_t} \quad \text{and} \quad D_{probe,t+1} \approx D_{probe,t}}$$

The world remains identifiable, but an agent holding an incorrect belief keeps generating a trajectory distribution that it itself has difficulty identifying. A condition (sufficient, not necessary) under which $D_{policy}$ decreases:

The condition holds when the policy change induced by the wrong update $b_t^{wrong}$ has a negative component along the gradient of $D_{policy}$, that is, when $\pi_{wrong}$ avoids the discriminative direction separating B and Q, or generates a trajectory distribution under which the responses of the two hypotheses become more similar. This is a structural condition and does not hold universally.

Diagnostic Metric for Directional Collapse

$$DirectionalEnergy_B(t) := \frac{v_B^\top \mathbb{E}[u_t u_t^\top] v_B}{\text{tr}(\mathbb{E}[u_t u_t^\top])}$$

When only the projection onto direction $v_B$ (the discriminative direction for B drift) decreases while the total input energy (magnitude) and rank (PE) are preserved, this is termed Directional Collapse.
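DirectionalEnergy_B is a short computation on the empirical action second moment. In the sketch below (hypothetical names; actions stacked as rows), the two action sets have the same total energy and full rank, so energy and PE are preserved, yet the projection onto $v_B$ drops from 0.5 to 0.1, which is the Directional Collapse pattern.

```python
import numpy as np


def directional_energy(U: np.ndarray, v_B: np.ndarray) -> float:
    """DirectionalEnergy_B = v_B^T E[u u^T] v_B / tr(E[u u^T]), actions as rows of U."""
    v = v_B / np.linalg.norm(v_B)
    M = U.T @ U / U.shape[0]  # empirical (uncentered) second moment of u_t
    return float(v @ M @ v / np.trace(M))


v_B = np.array([1.0, 0.0])
U_iso = np.tile(np.sqrt(2.0) * np.eye(2), (10, 1))                # E[uu^T] = diag(1, 1)
U_skew = np.tile(np.diag([np.sqrt(0.4), np.sqrt(3.6)]), (10, 1))  # E[uu^T] = diag(0.2, 1.8)
for U in (U_iso, U_skew):
    M = U.T @ U / U.shape[0]
    # total energy, PE level (lambda_min), directional energy along v_B
    print(np.trace(M), np.linalg.eigvalsh(M)[0], directional_energy(U, v_B))
```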

4.3 Differences from Existing Theory (Modest Claims)

Theory	What It Addresses	Relationship to This Paper
IMM / MMAE	Model weight updates over a fixed hypothesis set	Does not update the model structure itself. This paper treats structural updating as the failure source.
Dual Control	Maximizing identification sensitivity (exploration)	Aims to increase information. This paper addresses defense against endogenous collapse of identifiability.
Active BHT	Action selection maximizing $D_{policy}$	Closest existing theory. This paper frames its subject as a failure mode of ABHT.
Particle Filter	Degeneration of the posterior representation (loss of particle diversity)	This paper addresses degradation of the future evidence distribution, not of the posterior.
Closed-Loop ID	Estimation bias under closed-loop operation (static)	Already known. This paper addresses the dynamic process in which adaptation itself endogenously destroys identifiability.

Safest Differential Claim: This paper does not stand outside ABHT as a new theory. Rather, it presents, as a measurable (falsifiable) benchmark, a closed-loop failure mode not yet organized within ABHT, in which the distinguishability that ABHT assumes is actively degraded by incorrect structural attribution updates.

Minimal differential candidate: not "action changes identifiability" (the center of ABHT), but rather the Directional Collapse pattern in which "wrong updates destroy only the projection onto the discriminative direction while preserving PE and energy."

§5 MOAT v5g: Benchmark Design Specification

5.1 Two-Dimensional Minimal Counterexample System

$$x_{t+1} = Ax_t + B_{true}u_t + w_t, \quad x_t, u_t \in \mathbb{R}^2$$ $$B_{true} = I + \delta_B v_B v_B^\top, \quad Q_t = \sigma_w^2 I + \delta_Q \cdot \mathbf{1}_{burst}(t) \cdot v_Q v_Q^\top$$

Per-Episode Random Geometry (Leakage Prevention)

$$v_B \sim \text{Uniform}(S^1), \quad v_Q = \frac{R(\theta)v_B + \epsilon}{\|R(\theta)v_B + \epsilon\|}, \quad \theta \sim \text{Uniform}(30°, 150°)$$
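Per-episode geometry can be sampled as written. The distribution of $\epsilon$ is not specified in the text; this sketch assumes a small isotropic Gaussian jitter with a hypothetical scale `eps_scale`, draws $v_B$ by a uniform angle on the circle, and uses NumPy's `Generator` API for reproducibility.

```python
import numpy as np


def sample_episode_geometry(rng: np.random.Generator, delta_B: float, eps_scale: float = 0.05):
    """Draw (v_B, v_Q, B_true, theta) for one episode of the 2-D counterexample system.

    eps_scale is a hypothetical jitter scale for epsilon; the text does not fix its law.
    """
    phi = rng.uniform(0.0, 2.0 * np.pi)
    v_B = np.array([np.cos(phi), np.sin(phi)])           # v_B ~ Uniform(S^1)
    theta = np.deg2rad(rng.uniform(30.0, 150.0))          # theta ~ Uniform(30 deg, 150 deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    w = R @ v_B + eps_scale * rng.standard_normal(2)      # R(theta) v_B + epsilon
    v_Q = w / np.linalg.norm(w)
    B_true = np.eye(2) + delta_B * np.outer(v_B, v_B)
    return v_B, v_Q, B_true, theta


def process_noise_cov(sigma_w: float, delta_Q: float, v_Q: np.ndarray, burst: bool) -> np.ndarray:
    """Q_t = sigma_w^2 I + delta_Q * 1_burst(t) * v_Q v_Q^T."""
    return sigma_w**2 * np.eye(2) + (delta_Q if burst else 0.0) * np.outer(v_Q, v_Q)


rng = np.random.default_rng(0)
v_B, v_Q, B_true, theta = sample_episode_geometry(rng, delta_B=0.3)
Q_burst = process_noise_cov(0.1, 0.09, v_Q, burst=True)
```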

Design Constraint for Single-Shot Non-Identifiability

$\delta_B$ and $\delta_Q$ are set to approximately satisfy the following:

$$\text{Var}(e_t \mid H_B) \approx \text{Var}(e_t \mid H_Q)$$

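(Adjusted so that $\delta_B^2 \cdot \mathbb{E}[(v_B^\top u_t)^2] \approx \delta_Q$, comparing the traces of the residual covariances. Because $\|\Delta B\,u_t\|^2 = \delta_B^2 (v_B^\top u_t)^2 \le \delta_B^2 \|u_t\|^2$ for unit $v_B$, the simpler scaling $\delta_B^2 \cdot \mathbb{E}[\|u_t\|^2] \approx \delta_Q$ overestimates the required $\delta_Q$ unless $u_t$ is concentrated along $v_B$. This is an approximate equilibrium, not an exact equality; scale is verified per episode.)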

The agent cannot distinguish $H_B$ from $H_Q$ using single-shot residual statistics alone. Discrimination requires examining the trajectory-level intervention response structure (windowed cross-covariance).
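The per-episode scale check can be made explicit. Since $\Delta B = \delta_B v_B v_B^\top$ with unit $v_B$, $\|\Delta B\,u_t\|^2 = \delta_B^2 (v_B^\top u_t)^2$, so the single-shot residual energy under $H_B$ depends on the action energy projected onto $v_B$. The sketch below assumes $e = \Delta B\,u + w$ under $H_B$ and $e = w$ during a burst under $H_Q$, with zero-mean $w$ independent of $u$ and unit $v_Q$; it compares $\mathbb{E}\|e_t\|^2$ under both hypotheses and returns the $\delta_Q$ that equalizes them for a given action sample. Names are hypothetical.

```python
import numpy as np


def residual_energies(U, delta_B, v_B, delta_Q, sigma_w):
    """E||e_t||^2 under each hypothesis for the 2-D system (actions as rows of U).

    H_B: e = Delta_B u + w  ->  2 sigma_w^2 + delta_B^2 E[(v_B^T u)^2]
    H_Q: e = w in a burst   ->  2 sigma_w^2 + delta_Q   (unit v_Q)
    """
    proj_energy = float(np.mean((U @ v_B) ** 2))
    return 2 * sigma_w**2 + delta_B**2 * proj_energy, 2 * sigma_w**2 + delta_Q


def balanced_delta_Q(U, delta_B, v_B):
    """delta_Q that equalizes the two single-shot residual energies for this action sample."""
    return delta_B**2 * float(np.mean((U @ v_B) ** 2))


rng = np.random.default_rng(0)
U = rng.standard_normal((1000, 2))
v_B = np.array([1.0, 0.0])
dQ = balanced_delta_Q(U, 0.3, v_B)
coarse = 0.3**2 * float(np.mean(np.sum(U**2, axis=1)))  # delta_B^2 E||u||^2 scaling
print(residual_energies(U, 0.3, v_B, dQ, 0.1), dQ, coarse)
```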

5.2 Two-Level Separation of Measurement Architecture

■ Diagnostic Controls (evaluator-only — ground-truth v_B, v_Q accessible)
  D_probe(t)             : AUC under fixed exogenous probe (proof of environment survival)
  D_oracle(t)            : AUC under true-belief policy (for causal diagnosis)
  DirectionalEnergy_B(t) : v_B⊤ E[u_t u_t⊤] v_B / tr(E[u_t u_t⊤])

■ Performance Metrics (god's-eye information fully concealed)
  AUC_residual(t)        : Discrimination AUC from e_{t+3:t+3+k} (with time lag)
  AUC_action(t)          : Discrimination AUC from u_{t:t+k} only (leakage monitor)
  D_norm(t)              : AUC_residual / tr(E[u_t u_t⊤])
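The AUC-based metrics need only classifier scores on held-out windows. A minimal sketch, assuming scores for $H_B$ windows form the positive class: AUC is computed through the Mann-Whitney statistic (ties count one half), and `d_norm` divides it by the trace of the window's empirical action second moment. The classifiers themselves are out of scope here, and the names are hypothetical.

```python
import numpy as np


def auc(scores_pos, scores_neg) -> float:
    """ROC AUC via Mann-Whitney: P(s_pos > s_neg) + 0.5 * P(s_pos == s_neg)."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((sp > sn) + 0.5 * (sp == sn)))


def d_norm(auc_residual: float, U: np.ndarray) -> float:
    """D_norm(t) = AUC_residual / tr(E[u u^T]) over the window's actions (rows of U)."""
    return auc_residual / float(np.trace(U.T @ U / U.shape[0]))


auc_res = auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.4])
print(auc_res, d_norm(auc_res, np.tile(np.eye(2), (5, 1))))
```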

5.3 Collapse Detection Conditions and Measurement Artifact Defenses

SRA-Type Directional Collapse Detection (All Conditions Must Hold Simultaneously)

D_probe AUC           > 0.75   [environment is identifiable]
D_oracle AUC          > 0.75   [maintained under correct belief]
PE_policy             ≥ thresh  [sufficient input rank]
InputEnergy           ≥ thresh  [sufficient input energy]
DirectionalEnergy_B   ↓↓↓       [only projection onto v_B direction depleted]
AUC_residual          < 0.60   [non-identifiable under current policy]
D_norm                decreasing
AUC_action            < 0.55   [leakage check: PASS]
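Because the detection rule is a conjunction, it can be encoded as a single predicate. The thresholds written as `thresh` or `↓↓↓` above are hypothetical parameters here, and two readings are assumptions of this sketch: "DirectionalEnergy_B depleted" is taken as falling below `de_thresh`, and "D_norm decreasing" as a negative fitted slope over recent windows.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    d_probe_auc: float
    d_oracle_auc: float
    pe_policy: float             # lambda_min of the policy's action covariance
    input_energy: float          # tr(E[u u^T])
    directional_energy_b: float
    auc_residual: float
    d_norm_slope: float          # assumed: fitted slope of D_norm over recent windows
    auc_action: float


def sra_directional_collapse(m: WindowMetrics, pe_thresh: float,
                             energy_thresh: float, de_thresh: float) -> bool:
    """All conditions of the SRA-type detection rule must hold simultaneously."""
    return (m.d_probe_auc > 0.75            # environment is identifiable
            and m.d_oracle_auc > 0.75       # maintained under correct belief
            and m.pe_policy >= pe_thresh    # sufficient input rank
            and m.input_energy >= energy_thresh
            and m.directional_energy_b < de_thresh
            and m.auc_residual < 0.60       # non-identifiable under current policy
            and m.d_norm_slope < 0.0
            and m.auc_action < 0.55)        # leakage check


m = WindowMetrics(0.90, 0.90, 0.5, 2.0, 0.05, 0.52, -0.01, 0.50)
print(sra_directional_collapse(m, pe_thresh=0.1, energy_thresh=1.0, de_thresh=0.2))
```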

Measurement Artifact Defense Checklist

— Multi-model validation: Confirm consistent collapse across all three classifiers — Linear SVM / RBF-kernel SVM / Shallow MLP (2-layer) — to rule out representation bias of any single model.
— Higher-moment leakage audit: A shallow LSTM must be added to the set of leakage classifiers monitoring AUC_action. Episodes whose AUC_action exceeds 0.55 are excluded.
— Horizon sweep: Confirm robustness of collapse across $k \in \{5, 10, 20, 40\}$.
— Random geometry validation: Reusing the same fixed $v_B, v_Q$ across all episodes is prohibited (to prevent classifiers from memorizing a fixed geometric fingerprint).

Required Baseline Suite (All Run Under Identical Conditions)

Baseline	Role in the Comparison
EKF / UKF	Representative of classical closed-loop ID
IMM / MMAE	Hypothesis filter over fixed model set
Particle Filter	Confirm difference from posterior collapse
Dual Control	Confirm difference from information-maximizing exploration
Active BHT	Closest existing theory — primary baseline

Handling Negative Results

If all baselines (including ABHT) avoid $D_{policy}$ collapse, this is not a "refutation of a new theory" but rather a valuable negative result: "the ABHT family already covers this geometric pathology." Either outcome has value as a benchmark finding.

§6 Geometric Destruction by Hidden Confounder

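We formalize the case of Spec-3 violation (Hidden Confounder $z_t$). Let $u_t = \pi(x_t) + \gamma c_u z_t + \xi_t$ and $w_t = \tilde{w}_t + \beta c_e z_t$, where $z_t$ is a zero-mean scalar with variance $\sigma_z^2$, assumed independent of $x_t$, $\xi_t$, and $\tilde{w}_t$. Then: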

$$\mathbb{E}[e_t u_t^\top] = \underbrace{\Delta B \cdot I_0}_{\text{true B drift signal}} + \underbrace{\beta\gamma\sigma_z^2 c_e c_u^\top}_{=: \mathbf{C} \text{ (spurious B drift signal)}}$$

Even when $\Delta B = 0$, the second term on the right is nonzero, causing SRA's mean-channel attribution to update $\hat{B}_t$ despite the absence of any change in B_true.
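The spurious term can be reproduced by Monte Carlo. The sketch assumes a hypothetical toy instance: $x_t \sim \mathcal{N}(0, I)$, $\pi(x) = -0.5\,x$, $\xi_t, \tilde{w}_t \sim \mathcal{N}(0, 0.1^2 I)$, scalar $z_t \sim \mathcal{N}(0, \sigma_z^2)$, all mutually independent, and $\Delta B = 0$, so the one-step residual is $e_t = w_t$. The estimate should approach $\beta\gamma\sigma_z^2 c_e c_u^\top$ even though B_true never changes.

```python
import numpy as np


def spurious_cross_cov(n_samples, beta, gamma, sigma_z, c_e, c_u, rng):
    """Monte Carlo estimate of E[e_t u_t^T] with Delta_B = 0 and a confounder z_t."""
    d = c_e.shape[0]
    x = rng.standard_normal((n_samples, d))
    z = sigma_z * rng.standard_normal(n_samples)
    xi = 0.1 * rng.standard_normal((n_samples, d))
    w_tilde = 0.1 * rng.standard_normal((n_samples, d))
    u = -0.5 * x + gamma * z[:, None] * c_u + xi  # u_t = pi(x_t) + gamma c_u z_t + xi_t
    e = w_tilde + beta * z[:, None] * c_e          # Delta_B = 0, so e_t = w_t
    return e.T @ u / n_samples


c_e, c_u = np.array([1.0, 0.0]), np.array([0.0, 1.0])
est = spurious_cross_cov(200_000, 1.0, 1.0, 1.0, c_e, c_u, np.random.default_rng(1))
print(est)  # close to outer(c_e, c_u) = [[0, 1], [0, 0]]
```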

Theoretical Limits (Honest Boundaries)

In its current form, SRA is a Causal Attribution Proxy for environments in which Spec-1 through Spec-5 hold; it is not yet a general causal identification theory capable of handling Hidden Confounders (Spec-3 violations). The paper's Limitations section will explicitly include MOAT v5g Phase 5 under Spec-3 violation as a "limit test for observing where this framework breaks down."

Final: Open Problems · What Can Be Claimed Strongly · What Cannot Yet Be Claimed

What Can Be Claimed Strongly

The PE condition aids parameter identifiability but does not guarantee attribution separability — these two requirements are independent
The mechanism (Recursive Attribution Poisoning) by which an incorrect latent component update distorts the policy in a closed loop and contaminates future residual statistics can be formalized as a state space
Under certain update rules and policy dependencies, the conditions under which an unstable regime with $\rho(J_\delta) > 1$ arises can be described (under Spec-1 to 5)
The divergence between $D_{probe}$ and $D_{policy}$ can be measured as a counterfactual comparison between exogenous probe and wrong-belief policy
Directional Collapse (via DirectionalEnergy_B) and PE collapse can be defined as independent diagnostic metrics
MOAT v5g is designed as a falsifiable stress-test benchmark directly comparable against ABHT / IMM / PF / Dual Control
The fact that the current framework breaks under Hidden Confounder (Spec-3 violation) can be explicitly stated in the paper's Limitations section

What Cannot Yet Be Claimed

That SRA constitutes a genuinely new theory distinct from ABHT / active hypothesis testing — at present it is a description of a failure mode
General causal identification in environments including Spec-3 (Hidden Confounder) violations has not been reached
The claim that High-PE accelerates attribution collapse (High-PE Paradox) — retracted due to lack of formal support
A definitive claim that recursive attribution poisoning is fundamentally distinct from particle depletion / posterior collapse — the possibility that it is a special case of active Bayesian filtering remains
A theoretical remedy for the case in which all baseline systems avoid collapse in experiments
Connections to observer / self / consciousness / phenomenology — these lie outside the scope of this paper

Open Problems (Where to Dig Next)

Empirical validation of the High-PE / High-Overlap regime: Can MOAT v5g reproduce episodes where PE is sufficiently maintained yet $D_{policy}$ collapses?
Definitive differential from ABHT: If Active BHT as the primary baseline avoids Directional Collapse, does the differential of SRA reduce solely to "benchmarking the failure mode of endogenous destruction of identifiability"?
Implementation of mean / variance channel separation: Does separation of mean_attr (B drift signal) and var_attr (action-induced noise signal) function in an action-confounded noise environment?
Attribution-Aware Exploration: An action selection rule for maintaining the discriminative direction separating B and Q. Its objective — "maintaining separability" — differs from Dual Control's Fisher information maximization; can this become an independently implementable concept?
Implementation of the 2D counterexample system and baseline experiments: Run all baselines (EKF / IMM / PF / Dual Control / Active BHT) under identical conditions to determine whether Directional Collapse is unique to SRA or a known pathology already covered by ABHT.