Editorial Note: This document is a consolidated version compiled from a sequence of relayed dialogues with multiple AIs. For the generation process, scope of involvement, and limits of mathematical guarantees, see the top page (index.html). The relay logs and code/figures are in the appendix (appendix.html).
This paper formulates "self" and "observer" not as substance, soul, or qualia, but as operationally and physically closed structures. It stacks four ingredients: an access boundary given by a von Neumann algebra, the Petz recovery map, the reparameterization-invariant counterfactual divergence (RICD), and self-measurement back-action. From these it constructs the hierarchy Adaptive Controller (Level 1) → Observer-Agent (Level 2) → Perspectival Observer (Level 3). It further presents a Selection Principle showing that Δ_cf-sensitive updating becomes an evolutionarily stable attractor, supported by numerical simulation in a classical surrogate environment. The hard problem of phenomenal consciousness is not solved, but the location of that cliff, outside the algebraic structure, is pinned down.
§1 Starting Point: From Observer as "Point" to Structure
The naive picture that "an observer is a point somewhere in the universe" runs into inconsistencies with quantum information, AdS/CFT, and quantum gravity alike. Instead, this theory defines the observer as a local structure with recoverability. The following three stages form the backbone of the theory.
1.1 Holographic QEC and Recoverability
In holographic quantum error correction represented by the HaPPY code (Pastawski et al. 2015), bulk logical information is redundantly encoded across multiple boundary regions. The recent Evenbly codes (Steinberg et al., Quantum 9, 1826, 2025) realize a new class of hyperinvariant holographic codes using non-perfect tensors, with a threshold of approximately 19.1% against depolarizing noise.
1.2 Entanglement Wedge and Island Formula
Entanglement wedge reconstruction determines which bulk region (the entanglement wedge) can be reconstructed from a boundary subregion $R$. The island formula goes further and shows that even where the "inside/outside" divide falls is determined dynamically, by the quantum extremal surface that extremizes the generalized entropy. This structure, in which the optimal reconstruction surface is fixed variationally rather than statically, becomes the prototype of the observer's "boundary" concept.
1.3 Quantum Reference Frames (QRF)
In the perspective-neutral framework by Vanrietvelde et al. (Quantum 4, 225, 2020), all physical quantities are relational, and the specific viewpoint of "I" arises from a choice of gauge-fixing.
Stacking these three structures — holographic QEC / entanglement wedge / QRF — the skeleton of the observer condenses into the following single sentence:
An observer is not a point within the world, but the structure by which the world locally, stably, and reference-frame-dependently reconstructs itself.
§2 Operational Definition of Observer
2.1 Access Boundary via von Neumann Algebra
For a region $R$, the observables accessible there, closed in the weak operator topology, naturally form a von Neumann algebra $\mathcal{A}(R) \subset \mathcal{B}(\mathcal{H})$.
Haag duality $\mathcal{A}(R)' = \mathcal{A}(\bar{R})$, where the prime denotes the commutant and $\bar{R}$ the causal complement of $R$, algebraically separates accessible quantities from the environment side. This is not a semantic "self/environment" boundary but an operational access boundary. A self/environment boundary additionally requires the endogenous intervention set $C_O$ (§3.3) and exploitability.
2.2 Petz Recovery Map and Adaptive Decoder
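The Petz recovery map undoes the information loss caused by a channel $\Lambda$ relative to a reference state $\rho$. It recovers $\rho$ exactly, $\mathcal{R}_{\rho,\Lambda}(\Lambda(\rho)) = \rho$. By Petz's theorem, it also recovers any state $\sigma$ for which $\Lambda$ leaves the relative entropy $D(\sigma\|\rho)$ unchanged: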
$$\mathcal{R}_{\rho,\Lambda}(\cdot) = \rho^{1/2}\Lambda^\dagger\!\bigl(\Lambda(\rho)^{-1/2}(\cdot)\Lambda(\rho)^{-1/2}\bigr)\rho^{1/2}$$
The variational free energy of active inference is likewise built on relative entropy. Here $q(s)$ is the approximate posterior over hidden states $s$ given observations $o$:
$$\mathcal{F} = D_{\mathrm{KL}}\bigl(q(s)\,\|\,p(s \mid o)\bigr) - \ln p(o)$$
Both rest on a common objective, the minimization of relative entropy. This connects QEC (preservation) with predictive processing (updating) through the correspondence: Petz recovery map ≈ optimal adaptive decoder ≈ relative entropy minimization.
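In the classical special case (commuting, diagonal states and a stochastic channel), the Petz map above reduces to Bayesian inversion. The minimal sketch below assumes that special case. `apply_channel` and `petz_classical` are illustrative names, not taken from the original, and channels are stored as `channel[y][x] = p(y|x)`. The sketch checks the defining property $\mathcal{R}_{\rho,\Lambda}(\Lambda(\rho)) = \rho$.

```python
# Hypothetical helpers for illustration (classical special case); not the paper's code.

def apply_channel(channel, p):
    """Push p(x) through a classical channel stored as channel[y][x] = p(y|x)."""
    return [sum(row[x] * p[x] for x in range(len(p))) for row in channel]


def petz_classical(prior, channel, q):
    """Classical Petz recovery map with reference state `prior` (Bayesian inversion).

    R(q)(x) = sum_y q(y) * p(y|x) * prior(x) / (Lambda prior)(y),
    i.e. a q-weighted mixture of Bayesian posteriors computed with `prior`.
    """
    out_ref = apply_channel(channel, prior)  # Lambda(rho)
    return [
        sum(q[y] * channel[y][x] * prior[x] / out_ref[y] for y in range(len(q)))
        for x in range(len(prior))
    ]


# Binary channel that flips its input with probability 0.2.
channel = [[0.8, 0.2],
           [0.2, 0.8]]
rho = [0.7, 0.3]

recovered = petz_classical(rho, channel, apply_channel(channel, rho))
print(recovered)  # approximately [0.7, 0.3]: the reference state is recovered
```

For any other input, the map returns a mixture of Bayesian posteriors that use $\rho$ as the prior. This is the sense in which it acts as a decoder conditioned on a reference model.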
2.3 Mini-Model (4-qubit)
Take four boundary physical qubits $b_1, b_2, b_3, b_4$ (two self qubits and two environment qubits), and define the self-model $M$ and the environment model $E$ as bulk logical information.
Recovery Map
$$D_R : \mathcal{H}_R \to \mathcal{H}_{\mathrm{logical}}$$
Observer Patch Definition
$$O = (R_O,\; D_O,\; S_O,\; E_O,\; \varepsilon)$$
$$D_O(R_O) \approx S_O \otimes E_O \quad (\text{fidelity} \geq 1-\varepsilon)$$
Here $R_O$ is the boundary region, $D_O$ its recovery map, $S_O$ and $E_O$ the decoded self and environment models, and $\varepsilon$ the tolerance. Gauge-fixing (fixing gauge qubits in the subsystem code) changes the resolution and stability of the reconstruction region. This structurally corresponds to perspective selection in QRF.
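A classical surrogate can make the patch condition concrete. The sketch below assumes diagonal (commuting) states, for which the Uhlmann fidelity reduces to the squared Bhattacharyya overlap. `is_observer_patch` and the example numbers are hypothetical. A decoded joint state that factorizes into self and environment models passes the check; a strongly correlated one fails.

```python
# Hypothetical classical-surrogate check of the observer-patch condition; not the paper's code.
import math


def classical_fidelity(p, q):
    """Squared Bhattacharyya overlap; equals the Uhlmann fidelity for commuting states."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2


def product_state(s_model, e_model):
    """S_O (x) E_O for diagonal states, ordered as index = s * len(e_model) + e."""
    return [s * e for s in s_model for e in e_model]


def is_observer_patch(decoded_joint, s_model, e_model, eps):
    """Hypothetical check of D_O(R_O) ~ S_O (x) E_O with fidelity at least 1 - eps."""
    return classical_fidelity(decoded_joint, product_state(s_model, e_model)) >= 1 - eps


s_model = [0.9, 0.1]   # self-model marginal
e_model = [0.5, 0.5]   # environment-model marginal

factorized = product_state(s_model, e_model)   # decoder output that factorizes
correlated = [0.5, 0.0, 0.0, 0.5]              # self and environment perfectly correlated

print(is_observer_patch(factorized, s_model, e_model, eps=0.05))  # True
print(is_observer_patch(correlated, s_model, e_model, eps=0.05))  # False (fidelity about 0.4)
```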
§3 Operational Definition of Self
3.1 Identity Condition: Internal Use of Δ_cf
The strong claim that "the self-model is a logical qubit" is merely a metaphor. Instead, self is defined as a generative latent state:
$$M(t) := \text{action-conditioned predictive sufficient statistic}$$
The central quantity of the observer-agent is the counterfactual future distinguishability $\Delta_{\mathrm{cf}}$: how distinguishable the futures produced by alternative actions are.
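The identity condition separates the levels. For the Adaptive Controller (Level 1), $\partial M_{t+1} / \partial \Delta_{\mathrm{cf}} = 0$; for the Observer-Agent (Level 2), $\partial M_{t+1} / \partial \Delta_{\mathrm{cf}} \neq 0$. This was confirmed numerically in the 4-qubit experiment, with corr(Δ_cf, gain) ≈ 0.987 for the Observer-Agent (the Controller gives ≈ 0.000).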
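The identity condition can be read as a finite-difference test on the update rule. The sketch below uses two hypothetical scalar update rules: a fixed-gain controller, and a gain modulated by $\Delta_{\mathrm{cf}}$ with an illustrative $g_1 = 0.5$. Neither is the update used in the 4-qubit experiment.

```python
# Hypothetical update rules for illustration; not the rules used in the paper's experiments.

def controller_update(m, err, d_cf, eta=0.1):
    """Level 1 (hypothetical rule): fixed gain, d_cf is ignored."""
    return m + eta * err


def observer_agent_update(m, err, d_cf, eta=0.1, g1=0.5):
    """Level 2 (hypothetical rule): the gain is modulated by d_cf."""
    return m + eta * (1.0 + g1 * d_cf) * err


def d_cf_sensitivity(update, m, err, d_cf, h=1e-4):
    """Central finite-difference estimate of dM_{t+1}/d(Delta_cf) at fixed M_t and error."""
    return (update(m, err, d_cf + h) - update(m, err, d_cf - h)) / (2 * h)


print(d_cf_sensitivity(controller_update, m=0.2, err=0.4, d_cf=0.5))      # 0.0: Level 1
print(d_cf_sensitivity(observer_agent_update, m=0.2, err=0.4, d_cf=0.5))  # about 0.02: Level 2
```

The test needs only black-box access to the update map. This is why the identity condition is operational: it asks whether $M_{t+1}$ moves when $\Delta_{\mathrm{cf}}$ alone is perturbed.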
3.2 F-Coalgebraic Fixed Point
The question "who references π?" triggers a homunculus regress. To avoid this, self is defined as an F-coalgebraic fixed point:
$$M^* = F(M^*)$$
Here $F : M \mapsto \Phi\bigl(M,\, R_M(\mathcal{E}_\pi(\rho))\bigr)$ composes the policy-conditioned channel $\mathcal{E}_\pi$, the Petz recovery map $R_M$, and the latent update $\Phi$. There is no separate "referencing subject"; the closed self-referential structure itself is the observer-agent.
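The following classical surrogate sketch shows that $M^*$ is obtained by iterating $F$, without appeal to a separate reader. It assumes diagonal states, a single fixed noisy channel standing in for $\mathcal{E}_\pi$, and the classical (Bayesian) form of the Petz map for $R_M$. As a hypothetical choice, $\Phi$ is a damped convex mixture. In this setting the fixed point coincides with $\rho$, because the Petz map recovers its own reference state exactly.

```python
# Hypothetical classical surrogate of M* = F(M*); Phi and the channel are illustrative choices.

def apply_channel(channel, p):
    """Push p(x) through a classical channel stored as channel[y][x] = p(y|x)."""
    return [sum(row[x] * p[x] for x in range(len(p))) for row in channel]


def petz_classical(prior, channel, q):
    """Classical Petz recovery map R_M with reference state `prior` (Bayesian inversion)."""
    out_ref = apply_channel(channel, prior)
    return [sum(q[y] * channel[y][x] * prior[x] / out_ref[y] for y in range(len(q)))
            for x in range(len(prior))]


def F(m, rho, channel, alpha=0.5):
    """One step M -> Phi(M, R_M(E_pi(rho))), with Phi a damped convex mix (hypothetical)."""
    recovered = petz_classical(m, channel, apply_channel(channel, rho))
    return [(1 - alpha) * mi + alpha * ri for mi, ri in zip(m, recovered)]


def fixed_point(rho, channel, m0, tol=1e-12, max_iter=10_000):
    """Iterate M <- F(M) until successive iterates agree to within tol."""
    m = m0
    for _ in range(max_iter):
        m_next = F(m, rho, channel)
        if max(abs(a - b) for a, b in zip(m, m_next)) < tol:
            return m_next
        m = m_next
    raise RuntimeError("no fixed point reached within max_iter")


channel = [[0.9, 0.1],   # stands in for E_pi under one fixed policy
           [0.1, 0.9]]
rho = [0.7, 0.3]         # environment state the agent is coupled to
m_star = fixed_point(rho, channel, m0=[0.5, 0.5])
print(m_star)            # close to [0.7, 0.3]
```

The iteration settles here because the channel is invertible. For a non-injective channel, fixed points need not be unique, so the sketch is not a general convergence guarantee.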
3.3 Endogenous Intervention Structure C_O
The $do(a)$ in Pearl's do-calculus is an external intervention and does not represent the observer's autonomy. What is needed is endogenous intervention:
$$C_O \subset \mathcal{A}(R)$$
$C_O$ is the set of operations generated from within $\mathcal{A}(R)$. A rock has $C_O = \emptyset$ and no connection between Δ_cf and $M^*$. In the 4-qubit model, $C_O = \{I, X_1, Z_1, CZ_{1,3}\}$.
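The four operations in $C_O$ can be applied directly to a 4-qubit state to see which of them actually produce different futures. This anticipates the action equivalence classes $[a]$ of §5.1. The sketch assumes pure states and a one-step horizon with no further dynamics, so $h = 1$ suffices. Qubit $k$ is stored in bit $k-1$ of the basis index, and `apply_gate` and `action_classes` are hypothetical helpers. The grouping depends on the state: on $|0000\rangle$ the phase gates are indistinguishable from $I$, while on $|{+}000\rangle$ it is $X_1$ that collapses onto $I$.

```python
# Hypothetical helpers for illustration; one-step horizon, no further dynamics assumed.
import math

N_QUBITS = 4


def apply_gate(name, psi):
    """Apply one of C_O = {I, X1, Z1, CZ13} to a 4-qubit state vector.

    Qubit k is stored in bit (k - 1) of the basis-state index.
    """
    out = [0j] * len(psi)
    for idx, amp in enumerate(psi):
        b1, b3 = idx & 1, (idx >> 2) & 1
        if name == "I":
            out[idx] += amp
        elif name == "X1":
            out[idx ^ 1] += amp                      # flip qubit 1
        elif name == "Z1":
            out[idx] += -amp if b1 else amp          # phase -1 when qubit 1 is |1>
        elif name == "CZ13":
            out[idx] += -amp if (b1 and b3) else amp # phase -1 when qubits 1 and 3 are |1>
        else:
            raise ValueError(name)
    return out


def same_state(psi, phi, tol=1e-12):
    """Pure states are equal as density matrices iff |<psi|phi>| = 1."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(abs(overlap) - 1.0) < tol


def action_classes(psi, actions=("I", "X1", "Z1", "CZ13")):
    """Group actions whose one-step futures coincide."""
    classes = []  # list of (representative future state, [actions])
    for a in actions:
        future = apply_gate(a, psi)
        for rep, members in classes:
            if same_state(rep, future):
                members.append(a)
                break
        else:
            classes.append((future, [a]))
    return [members for _, members in classes]


zero = [0j] * 2 ** N_QUBITS
zero[0] = 1 + 0j                                  # |0000>
plus1 = [0j] * 2 ** N_QUBITS
plus1[0] = plus1[1] = complex(1 / math.sqrt(2))   # |+> on qubit 1, |0> elsewhere

print(action_classes(zero))   # [['I', 'Z1', 'CZ13'], ['X1']]
print(action_classes(plus1))  # [['I', 'X1', 'CZ13'], ['Z1']]
```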
The Markov blanket $B$ (Friston) provides a statistical separation of self and environment. Internal states $\mu_O$ are conditionally independent of external states $\eta$ given the blanket states: $\mu_O \perp \eta \mid B$. Together with Haag duality, this yields a two-layer boundary in which the algebraic access boundary is complemented by a statistical-causal one.
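The blanket condition can be checked numerically as a vanishing conditional mutual information, $I(\mu_O; \eta \mid B) = 0$. The sketch below assumes binary internal, blanket, and external variables with hypothetical bit-flip couplings. A chain $\mu_O \to B \to \eta$ satisfies the condition, while a direct $\mu_O \to \eta$ link violates it.

```python
# Hypothetical binary example of the Markov-blanket condition; couplings are illustrative.
import math
from itertools import product


def conditional_mutual_information(joint):
    """I(mu; eta | B) in nats from a table joint[(mu, b, eta)] = probability."""
    p_b, p_mb, p_be = {}, {}, {}
    for (m, b, e), p in joint.items():
        p_b[b] = p_b.get(b, 0.0) + p
        p_mb[(m, b)] = p_mb.get((m, b), 0.0) + p
        p_be[(b, e)] = p_be.get((b, e), 0.0) + p
    return sum(
        p * math.log(p * p_b[b] / (p_mb[(m, b)] * p_be[(b, e)]))
        for (m, b, e), p in joint.items()
        if p > 0
    )


def flip(prob_flip, out, inp):
    """Bit-flip channel: probability of `out` given `inp`."""
    return prob_flip if out != inp else 1.0 - prob_flip


def blanket_chain(f_mb=0.2, f_be=0.1):
    """mu -> B -> eta: external states depend on internal ones only through B."""
    return {(m, b, e): 0.5 * flip(f_mb, b, m) * flip(f_be, e, b)
            for m, b, e in product((0, 1), repeat=3)}


def leaky(f_mb=0.2, f_me=0.1):
    """eta copies mu directly, bypassing B: the blanket condition fails."""
    return {(m, b, e): 0.5 * flip(f_mb, b, m) * flip(f_me, e, m)
            for m, b, e in product((0, 1), repeat=3)}


print(conditional_mutual_information(blanket_chain()))  # ~0 (up to rounding)
print(conditional_mutual_information(leaky()))          # clearly positive
```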
§4 Hierarchical Structure: From Controller to Perspectival Observer
| Level | Name | Condition | Numerical Evidence |
|---|---|---|---|
| Level 0 | Passive Dissipative Structure | No internal model. Response to external forces only. | — |
| Level 1 | Adaptive Controller | Action → future change → update. But does not use Δ_cf: $\partial M_{t+1}/\partial \Delta_{\mathrm{cf}} = 0$ | corr ≈ 0.000 |
| Level 2 | Observer-Agent | Identity condition: Δ_cf is internally utilized in the latent update. $\partial M_{t+1}/\partial \Delta_{\mathrm{cf}} \neq 0$ | corr ≈ 0.987 |
| Level 3 | Perspectival Observer | Complete externalization of perspective is impossible due to self-measurement back-action: $R_{M_t}(\rho_{t+1}) \neq \rho_{t+1}$ | back-action > 0 (confirmed) |
| Level 4 | Phenomenological Subject | Unresolved cliff. Unreachable with current framework. | — |
Continuous Self (Selfhood)
Define temporal continuity of self as the stability of the reconstruction chain:
$$\text{Selfhood}(t) = \bigl\{O(t) \to O(t+\Delta t) \to O(t+2\Delta t) \to \cdots\bigr\}$$
Stability conditions:
- $d_{\mathrm{Bures}}(M(t), M(t+\Delta t))$ is small
- Prediction error is correctable within $\varepsilon$
- Reconstruction fidelity $F(P_\Psi) \geq 1-\delta$
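The three stability conditions can be turned into a simple check along a chain of self-models. The sketch below assumes commuting (diagonal) self-models, for which the Bures distance has the closed form $\sqrt{2(1 - \sum_i \sqrt{p_i q_i})}$. It makes "small" concrete through a hypothetical tolerance `bures_tol`; the function name and all thresholds are illustrative.

```python
# Hypothetical stability check for a chain of self-models; thresholds are illustrative.
import math


def bures_distance(p, q):
    """Bures distance for commuting (diagonal) states: sqrt(2 * (1 - sum_i sqrt(p_i q_i)))."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 2.0 * (1.0 - overlap)))


def is_continuous_self(models, pred_errors, fidelities, bures_tol, eps, delta):
    """Hypothetical check of the three stability conditions along O(t) -> O(t+dt) -> ...

    models: self-models M(t) as probability vectors, one per step.
    pred_errors / fidelities: prediction error and reconstruction fidelity per step.
    bures_tol makes "small" concrete; its value is an assumption, not from the text.
    """
    steps_ok = all(bures_distance(m0, m1) <= bures_tol for m0, m1 in zip(models, models[1:]))
    errors_ok = all(e <= eps for e in pred_errors)
    fidelity_ok = all(f >= 1 - delta for f in fidelities)
    return steps_ok and errors_ok and fidelity_ok


smooth = [[0.60, 0.40], [0.62, 0.38], [0.61, 0.39]]
with_jump = smooth + [[0.05, 0.95]]
tolerances = dict(bures_tol=0.1, eps=0.05, delta=0.05)

print(is_continuous_self(smooth, [0.01] * 3, [0.99] * 3, **tolerances))     # True
print(is_continuous_self(with_jump, [0.01] * 4, [0.99] * 4, **tolerances))  # False: the chain breaks
```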
Connection to Many-Worlds Branching
After branching, both $O_A(t) \to O_A(t+\Delta t)$ and $O_B(t) \to O_B(t+\Delta t)$ are valid reconstruction chains. The question is not "which one is real": each is locally experienced as a chain that keeps its free energy low.
§5 Selection Principle: Why Observer-Agents Emerge
The identity condition defines "what an observer-agent is." The selection principle explains "why such structures naturally emerge." Without the former there is no latter; confusing the two causes the theory to collapse.
5.1 RICD (Reparameterization-Invariant Counterfactual Divergence)
The raw Δ_cf depends on action labels, granularity, and policy. To remove these dependencies, we define the following quantities.
Equivalence Class of Actions
$$[a] := \{a' \mid \rho^{(a')}_{t+h} = \rho^{(a)}_{t+h} \;\forall h \geq 1\}$$
(Actions producing identical future distributions are identified.)
Symmetrized KL Distance h Steps Ahead
$$d_h([a],[a'])^2 = D\!\bigl(\rho^{(a,h)} \,\|\, \rho^{(a',h)}\bigr) + D\!\bigl(\rho^{(a',h)} \,\|\, \rho^{(a,h)}\bigr)$$
Reconstruction Permeation Rate
$$\mathcal{R}_h([a],[a'],M_t) = 1 - \frac{D(M_{t+h}^{(a)} \| M_{t+h}^{(a')})}{\varepsilon + d_h^2}$$
(How much of the intervention divergence is preserved in the internal model.)
Relationship with Empowerment (Klyubin & Polani 2005):
RICD is "the realized, available intervention divergence under the current policy," while Empowerment is its optimized upper bound. The two are not equal; RICD is positioned as a policy-conditioned, exploitability-aware sub-concept of empowerment.
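The two ingredients of RICD defined above can be computed directly for discrete future distributions. This sketch assumes classical distributions with full support and a hypothetical regularizer $\varepsilon = 10^{-6}$. It implements $d_h^2$ and $\mathcal{R}_h$ exactly as written, without the aggregation over action pairs and horizons that a full RICD would need. The function names are illustrative.

```python
# Hypothetical helpers for the RICD ingredients; eps and the distributions are illustrative.
import math


def kl(p, q):
    """Relative entropy D(p || q) in nats for discrete distributions (q covers p's support)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def d_h_squared(rho_a, rho_b):
    """Symmetrized KL between the h-step future distributions under [a] and [a']."""
    return kl(rho_a, rho_b) + kl(rho_b, rho_a)


def permeation_rate(rho_a, rho_b, m_a, m_b, eps=1e-6):
    """R_h = 1 - D(M^(a) || M^(a')) / (eps + d_h^2), implemented as written in the text."""
    return 1.0 - kl(m_a, m_b) / (eps + d_h_squared(rho_a, rho_b))


rho_a, rho_b = [0.8, 0.2], [0.3, 0.7]   # h-step futures under [a] and [a']
print(d_h_squared(rho_a, rho_b))                              # about 1.117
print(permeation_rate(rho_a, rho_b, [0.5, 0.5], [0.5, 0.5]))  # 1.0: model ignores the difference
print(permeation_rate(rho_a, rho_b, rho_a, rho_b))            # about 0.52: model copies the futures
```

As written, $\mathcal{R}_h$ equals 1 when the internal model does not separate the two action classes at all, and it decreases as $D(M^{(a)}_{t+h} \| M^{(a')}_{t+h})$ grows.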
5.2 Free-Energy Advantage Principle (FEAP)
Intuition: When Δ_cf (or RICD) is large, updating is beneficial; when it is small (noise-dominated), updating only increases cost. Therefore, the update rule that maximizes $\mathcal{J}$ naturally becomes RICD-sensitive.
Numerical verification (classical surrogate environment, evolutionary simulation): starting from an initially random population, the Δ_cf-sensitivity parameter $g_1$ converges to positive values. The final population mean is $g_1 \approx 0.85$, and the top individuals reach $g_1 \approx 1.0$. This shows that RICD-sensitive updating emerges as a consequence of selection pressure rather than of design. (Note: these results come from a classical surrogate environment; the quantum version requires a richer set of operators.)
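The selection mechanism can be illustrated with a deliberately simple toy. It does not reproduce the paper's simulation or its numbers. Each individual carries only $g_1$, and the gain is $k = 0.5 + g_1(\Delta_{\mathrm{cf}} - 0.5)$, so $g_1 = 0$ is a Level 1 constant-gain updater. The hypothetical fitness rewards corrections proportional to $\Delta_{\mathrm{cf}}$. Because that fitness builds the FEAP intuition in by construction, the sketch only shows how truncation selection with mutation moves a random population toward a $\Delta_{\mathrm{cf}}$-sensitive rule.

```python
# Hypothetical toy of selection on the sensitivity parameter g1; not the paper's simulation.
import random


def fitness(g1, trials):
    """Hypothetical fitness: negative mean squared error of the update.

    Gain k = 0.5 + g1 * (d_cf - 0.5): g1 = 0 is a constant-gain (Level 1) updater.
    By construction the ideal correction is d_cf * err, so updating pays off in
    proportion to d_cf and is useless when d_cf is near zero.
    """
    loss = 0.0
    for d_cf, err in trials:
        k = 0.5 + g1 * (d_cf - 0.5)
        loss += ((k - d_cf) * err) ** 2
    return -loss / len(trials)


def evolve(pop_size=50, n_elite=10, generations=40, sigma=0.1, seed=0):
    """Truncation selection plus Gaussian mutation on the single parameter g1."""
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    initial_mean = sum(population) / pop_size
    for _ in range(generations):
        # Common random numbers: every individual is scored on the same trials.
        trials = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(200)]
        ranked = sorted(population, key=lambda g: fitness(g, trials), reverse=True)
        elites = ranked[:n_elite]
        children = [rng.choice(elites) + rng.gauss(0.0, sigma)
                    for _ in range(pop_size - n_elite)]
        population = elites + children
    return initial_mean, sum(population) / pop_size


initial_mean, final_mean = evolve()
print(f"mean g1: {initial_mean:.2f} -> {final_mean:.2f}")
```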
5.3 L2 Proposition and Environment Class
The environment class $\mathcal{E}''$ is defined structurally by three conditions:
- S1: rank of action channel > 1 (actions change something)
- S2: finite mixing time $\tau_{\mathrm{mix}} < \infty$
- S3: causal path $a \to Y$ exists and contributes to future inference/control (exploitability)
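Conditions S1 and S2 can be checked mechanically for a small, fully specified model. S3 is not checked, since a non-circular definition of exploitability is still an open question. The sketch assumes that "rank of the action channel" means the rank of the matrix whose rows are the future distributions reached by each action. For probability rows, rank > 1 holds exactly when at least two actions lead to different futures. The sketch also replaces $\tau_{\mathrm{mix}} < \infty$ with a finite search horizon and uses the conventional threshold $\varepsilon = 1/4$. All names are hypothetical.

```python
# Hypothetical checks for S1 and S2; the interpretation of S1 and the horizon are assumptions.

def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def satisfies_s1(futures):
    """S1 (as interpreted here): rows are the future distributions reached by each action."""
    return matrix_rank(futures) > 1


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(P, pi, eps=0.25, t_max=1000):
    """Smallest t with max_x TV(P^t(x, .), pi) <= eps; None if not reached by t_max.

    P[x][y] is the one-step probability x -> y and pi the stationary distribution.
    None is a finite-horizon stand-in for tau_mix = infinity (S2 fails).
    """
    n = len(P)
    dists = [row[:] for row in P]  # dists[x] = P^t(x, .), starting at t = 1
    for t in range(1, t_max + 1):
        if max(total_variation(d, pi) for d in dists) <= eps:
            return t
        dists = [[sum(d[z] * P[z][y] for z in range(n)) for y in range(n)] for d in dists]
    return None


print(satisfies_s1([[0.5, 0.5], [0.5, 0.5]]))             # False: actions change nothing
print(satisfies_s1([[0.9, 0.1], [0.2, 0.8]]))             # True
print(mixing_time([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5]))  # 4
print(mixing_time([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5]))  # None: periodic chain never mixes
```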
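Note: The L2 proposition (an asymptotic regret advantage of RICD-sensitive updating in $\mathcal{E}''$) is not a new theorem but an application of existing filter/control theory. Finite-time guarantees depend on $\tau_{\mathrm{mix}}$, so only asymptotic claims are possible.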
L3 (universality class): Whether $\mathcal{E}''$ has positive measure in the natural environment distribution family $\mathcal{P}$ — this is an empirical question and lies outside the current framework.
§6 Perspectival Incompleteness
The Level 2 observer-agent internalizes "being a causal node of future generation," but what is missing is "from where" — there is causal self-reference, but no positional self-reference.
6.1 Self-Measurement Back-Action
Introducing the irreversible self-measurement channel $S(\rho) = \sum_i P_i \rho P_i$ imposes the structural constraint that "trying to read the future changes oneself":
$$R_{M_t}(\rho_{t+1}) \neq \rho_{t+1}$$
Through the combination of the no-cloning theorem and post-measurement back-action, $M^*$ cannot fully capture itself within $\mathcal{A}(R)$. This "self-excess remainder" is the physical basis of perspectival incompleteness.
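For a single qubit, the back-action of $S$ can be quantified directly. The sketch below assumes a computational-basis measurement. It uses the trace distance $\tfrac12\|\rho - S(\rho)\|_1$ as a hypothetical back-action measure, since the text does not fix one. The measure is positive for $|{+}\rangle$ and zero for states already diagonal in the measurement basis.

```python
# Hypothetical single-qubit back-action measure; the choice of trace distance is an assumption.
import math


def matmul(a, b):
    """Plain matrix product for small nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def self_measurement(rho, projectors):
    """S(rho) = sum_i P_i rho P_i: a projective measurement whose outcome is not read."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for p in projectors:
        term = matmul(matmul(p, rho), p)
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out


def qubit_trace_distance(rho, sigma):
    """(1/2)||rho - sigma||_1 for qubits.

    The difference is traceless Hermitian, [[a, b], [conj(b), -a]], with eigenvalues
    +/- sqrt(a^2 + |b|^2), so the trace distance is sqrt(a^2 + |b|^2).
    """
    a = (rho[0][0] - sigma[0][0]).real
    b = rho[0][1] - sigma[0][1]
    return math.sqrt(a * a + abs(b) ** 2)


z_basis = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]   # P_0, P_1
plus = [[0.5, 0.5], [0.5, 0.5]]                  # |+><+|
zero = [[1.0, 0.0], [0.0, 0.0]]                  # |0><0|

print(qubit_trace_distance(plus, self_measurement(plus, z_basis)))  # 0.5: back-action
print(qubit_trace_distance(zero, self_measurement(zero, z_basis)))  # 0.0: no back-action
```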
6.2 Modular Flow and the Emergence of Time
By Tomita–Takesaki theory, the algebra $\mathcal{A}(R)$ and a cyclic and separating reference state $|\Psi\rangle$ determine the modular operator $\Delta$ (with modular Hamiltonian $K_R = -\ln \Delta$) and the modular flow
$$\sigma_t^\Psi(A) = \Delta^{it} A \Delta^{-it}$$
This flow is an automorphism of $\mathcal{A}(R)$ fixed by the pair $(\mathcal{A}(R), \Psi)$, not an external time parameter. By the Bisognano–Wichmann theorem, for the vacuum state restricted to the Rindler wedge the modular flow is a one-parameter group of Lorentz boosts. That is, the sense of time may emerge from the stable ordering of reconstruction chains. Note, however, that this is state- and algebra-dependent, and one cannot say "the time of the universe has emerged."
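In finite dimensions (type I algebras) with a full-rank reduced state $\rho_R$, the modular flow acts on $\mathcal{A}(R)$ as $\sigma_t(A) = \rho_R^{it} A \rho_R^{-it}$, with one-sided modular Hamiltonian $K_R = -\ln \rho_R$. The sketch below assumes $\rho_R$ is diagonal in the chosen basis; `modular_flow` and `close` are hypothetical helpers. It checks two properties behind the "internal time" reading: the state itself is invariant under the flow, and the flow composes as a one-parameter group.

```python
# Hypothetical finite-dimensional sketch of modular flow; assumes a full-rank diagonal state.
import cmath
import math


def modular_flow(probs, A, t):
    """sigma_t(A) = rho^{it} A rho^{-it} for a full-rank diagonal state rho = diag(probs).

    Equivalently exp(-i t K) A exp(i t K) with one-sided modular Hamiltonian K = -ln rho.
    """
    phases = [cmath.exp(1j * t * math.log(p)) for p in probs]  # diagonal of rho^{it}
    n = len(probs)
    return [[phases[i] * A[i][j] / phases[j] for j in range(n)] for i in range(n)]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two square matrices."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A)))


probs = [0.8, 0.2]
rho = [[0.8, 0.0], [0.0, 0.2]]
X = [[0.0, 1.0], [1.0, 0.0]]

print(close(modular_flow(probs, rho, 1.3), rho))              # True: the state is invariant
print(close(modular_flow(probs, modular_flow(probs, X, 0.3), 0.4),
            modular_flow(probs, X, 0.7)))                     # True: sigma_s o sigma_t = sigma_{s+t}
print(modular_flow(probs, X, 1.0)[0][1])                      # exp(i ln 4): a pure phase
```

The flow rate is set by $\ln(p_0/p_1)$, so a different reference state yields a different "clock" on the same algebra. This matches the caveat that the flow is state- and algebra-dependent.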
6.3 Correspondence with Perspective-Neutral Structure
The "viewed from here" structure of the observer-agent can be written as gauge-fixing from a perspective-neutral world description:
$$\text{perspective-neutral structure} \xrightarrow{\text{gauge-fixing}} \text{indexical self-location}$$
This gauge-fixing is an intrinsic, self-referential operation. Combined with the incompleteness due to back-action (complete externalization is impossible), it constitutes the perspectival observer (Level 3).
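A classical analogue can make the gauge-fixing step concrete. The sketch below assumes three particles on a line, with global translation as the gauge symmetry. It is a toy, not a quantum reference frame computation, and `perspective` and `change_frame` are hypothetical names. The relational data are the same for every gauge-equivalent description, and fixing one particle at the origin yields its perspective. Perspectives are then related by transformations that use only relational data.

```python
# Hypothetical classical analogue of perspective-neutral structure and gauge-fixing.

def perspective(positions, ref):
    """Gauge-fix global translations by putting `ref` at the origin: 'viewed from ref'."""
    origin = positions[ref]
    return {name: x - origin for name, x in positions.items() if name != ref}


def change_frame(view_from_a, a, b):
    """Turn A's perspective into B's using only relational data (no absolute positions)."""
    shift = view_from_a[b]  # where B sits as seen from A
    view = {name: x - shift for name, x in view_from_a.items() if name != b}
    view[a] = -shift        # where A sits as seen from B
    return view


world = {"A": 2.0, "B": 5.0, "C": -1.0}             # one gauge representative
shifted = {k: v + 10.0 for k, v in world.items()}   # a gauge-equivalent description

print(perspective(world, "A"))                               # {'B': 3.0, 'C': -3.0}
print(perspective(world, "A") == perspective(shifted, "A"))  # True: gauge invariant
print(change_frame(perspective(world, "A"), "A", "B") == perspective(world, "B"))  # True
```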
§7 Map of the Cliff: Coordinates of Phenomenal Consciousness
Following Block, we distinguish A-consciousness (the state in which information is available for inference, reporting, and action control) from P-consciousness (the "something it is like" of inner qualitative feel). This theory describes the structural conditions of A-consciousness; regarding P-consciousness, it claims the following.
What this framework provides:
- Access boundary ($\mathcal{A}(R)$, Haag duality)
- Adaptive recovery (Petz map, RICD-sensitive updating)
- Self-model (fixed point $M^*$ of the action-conditioned quantum information bottleneck)
- Formation of perspective (gauge-fixed indexical self-location)
- Perspectival incompleteness (quantum self-measurement gap)
What this framework does not provide:
- Why "something it is like" inhabits this structure
Shape of the cliff:
Previously: "recoverability → qualia" (vague)
Now: "gauge-fixed F-coalgebraic fixed point → first-person phenomenology" (precise)
The cliff has not disappeared. But its location can now be written in algebraic terms.
From the standpoint of structural realism (Russellian monism), $\mathcal{A}(R)$ describes structure, and "what realizes that structure" lies outside physics. The hypothesis that the intrinsic nature of that realizer may be P-consciousness is retained, without any claim to have proven it.
Terminal: What Can Be Claimed Strongly · What Cannot Yet Be Claimed · Open Questions
What Can Be Claimed Strongly
- The observer can be defined as a von Neumann algebra $\mathcal{A}(R)$ rather than a "point" (consistent with AQFT)
- Controller and Observer-Agent can be operationally separated by identity condition $\partial M_{t+1}/\partial \Delta_{\mathrm{cf}} \neq 0$ (numerical evidence corr ≈ 0.987 vs 0.000)
- The Petz recovery map and active inference share a common objective function of relative entropy minimization (not metaphor, but structural correspondence)
- RICD can be defined as a reparameterization-invariant quantity independent of action labels, granularity, and policy
- Self-measurement back-action produces "complete externalization impossibility" as a structure (implementation of Level 3)
- Numerically confirmed in evolutionary simulation (classical surrogate environment) that RICD-sensitive updating is naturally selected
- L2 proposition: an asymptotic regret advantage in $\mathcal{E}''$ can be derived from existing filter/control theory
- The boundary reachable by the current framework can be located algebraically
What Cannot Yet Be Claimed
- Explanation of phenomenal consciousness (P-consciousness / qualia): "something it is like" lies outside this framework
- Rigorous proof of the quantum version of the Selection Principle (the current 4-qubit model has the problem that Δ_cf becomes nearly constant)
- Guarantee of finite-time regret (depends on mixing time; only asymptotic claims are possible)
- L3 (universality class): whether $\mathcal{E}''$ has positive measure in the natural environment distribution family is an empirical question
- Equality of the epistemic term of expected free energy (EFE) and RICD: the former is defined on the observation channel and the latter on the control channel, so they are generally not equal
- Modular flow being "the emergence of time" itself — it is state- and algebra-dependent, and generalization to universal time is unproven
- "Self-consciousness" (self knowing self) and "self-referential structure" are not identical
- Connection with IIT (φ value) is currently dangerous — the redundancy of QEC and the irreducibility of IIT point in opposite directions
Open Questions
- Proof of generic emergence: Define L3 as a "phase transition in the causal geometry of natural environments" and find the measure of the environment set where RICD tracks predictive-control relevance
- Implementation of quantum RICD: Design of a richer quantum operator set where Δ_cf genuinely fluctuates. Verification of whether the quantum version has an essential advantage over the classical version
- Non-circular definition of exploitability: Formulation of actionability based solely on the structure of the causal graph, without depending on control utility $U_{\mathrm{ctrl}}$
- Quantification of perspectival incompleteness: Concrete measurement method for the quantum self-measurement gap $\Delta_{\mathrm{self}} = 1 - \max_{M^* \subset \mathcal{A}(R)} F(M^*, \rho_{\mathrm{self}})$
- Phase II (compression): Can RICD_exp be further minimized and the definition of observer compressed into a single line: "a structure that continues to retain usable intervention divergence"?
- Finite-time regret: Conditions under which initial overshoot of adaptive gain and finite-time reversal with the Controller occur