States, scores, and the four tests
All four experiments share the same conditioning scheme and the same block
score, defined in collatz_block_anomaly_score.py and reused
throughout the subsequent analyses. Defining them once is what makes the tests comparable.
3.1 Conditioning cells (states)
Each word is assigned to a state
\[ \text{state} \;=\; \text{bridge\_cluster} \;\big|\; \text{x\_K\_window} \;\big|\; \text{parity}, \]built from three components:
- bridge_cluster — a path-shape label. Let \(x(u)\) be the
log-magnitude partial-sum path reparameterized to \(u \in [0,1]\); the feature
\(z_{25} = x(0.25) - 0.25\,x(1)\) measures how far the quarter-point sits above or
below the straight chord. Tertile cuts on \(z_{25}\) (at \(q_{\text{low}}=-1.5\),
\(q_{\text{high}}=-0.25\)) give
late_growth,balanced,early_growth. - x_K_window — a window of
\(x_K = K_\tau - (\text{power} - h)\), bucketed as
exhaustion_0_31,deep_32_63,tail_64_95; words outside these windows are dropped. - parity — the parity of
power(even/odd).
One state is used throughout as a stress test:
\[ \textbf{focus state} \;=\; \texttt{late\_growth | tail\_64\_95 | even}, \]a deep-tail, late-growth, even state where the actual–iid discrepancy is largest and sampling is sparsest.
3.2 Block log-ratio score
On the train split, for each \((L, \text{state}, u\text{-bin}, \text{block})\) we estimate the smoothed block probabilities for the actual and iid samples and compute and store a log-ratio,
\[ s_L(\text{state}, u, \text{block}) \;=\; \log_2 \frac{p^{\text{actual}}(\text{block})}{p^{\text{iid}}(\text{block})}, \]where \(u\) is a decile bin of the within-word position. Smoothing is add-\(\alpha\) (\(\alpha = 0.5\)) over the \(3^L\) blocks in the anomaly test, and a uniform mixture \((1-\lambda)\,\hat p + \lambda/3^L\) with \(\lambda = 0.02\) in the renormalization and projection tests. A state is considered stable only if its train iid mass exceeds \(10^{-7}\).
A word's block score at length \(L\) sums the stored log-ratios over its sliding windows on the test split:
\[ S_L(\mathbf{k}) \;=\; \sum_{i} s_L\!\big(\text{state}, \, u\text{-bin}(i,\tau), \, \text{block}_i\big). \]The train/test split is determined by sample-index parity, so no word is scored by probabilities estimated from itself.
3.3 The four tests
Two tests are diagnostic (does the score separate actual from iid?) and two are generative (can the score reshape the iid measure into the actual one?).
| # | Test | Type | Question |
|---|---|---|---|
| 1 | Block anomaly (B3/B4) | diagnostic | Does \(S_4\) improve separation over a baseline of \(x_K\), parity, bridge, path-shape? |
| 2 | Block-length renormalization | diagnostic | Does the separation grow with \(L=3,\dots,6\); does the residual shrink? |
| 3 | Finite-block reweighting | generative | Does reweighting iid by \(2^{\alpha S_L}\) reproduce the actual mass? |
| 4 | MaxEnt block projection | generative | Does a regularized IPF matching block marginals beat raw/damped reweighting? |
Test 1 — block anomaly score
Fit a weighted logistic classifier (actual vs iid) on baseline covariates, then add the block score and assess the increase in weighted AUC, while checking whether the bridge and parity coefficients are absorbed.
Test 2 — block-length renormalization
Repeat the scoring procedure for \(L=3,\dots,6\) and track the marginal-score AUC, the \(+\)score logistic AUC, and the bridge/parity residual coefficients as functions of \(L\). Also track per-decile survival ratios in the focus state.
Test 3 — finite-block reweighting
Reweight each iid test word by \(2^{\alpha S_L}\) for \(\alpha \in \{0, 0.25, 0.5, 0.75, 1\}\), rescale the predicted mass so that its total matches the actual total over stable states, and compare predicted vs actual state-mass distributions by RMSE and Jensen–Shannon divergence.
Test 4 — maximum-entropy block projection
Run an approximate regularized IPF (exponential-family) update on the iid test measure to match block marginals, for \(L=3,\dots,6\) and regularization \(\{0, 0.5, 0.75, 0.9\}\) (two iterations, evaluation capped at \(40{,}000\) iid words). Compare the projected state distribution to actual, and to the best raw/damped reweighting fit from Test 3.
3.4 Reported quantities
- weighted AUC — separation of actual from iid by score (diagnostic tests).
- RMSE / JS — predicted vs actual state-mass distribution (generative tests).
- survival — reported in two explicitly separated forms: a per-decile survival ratio \(S = (\text{actual mass})/(\text{iid mass})\) in Tests 1–2, and focus-state retained mass, modelled vs actual, in Tests 3–4.
- bridge / parity residual — magnitude of the bridge-cluster and parity coefficients that remain after the block score is included.
Auxiliary Δ. In §4.6 we additionally report, as a descriptive statistic, the state-mass difference
\[ \Delta(\text{state}) \;=\; \mu_{\text{actual}}(\text{state}) \;-\; \mu_{\text{iid}}(\text{state}), \]projected onto state, prefix cylinder, transition, and boundary / remaining_K
coordinates. Δ is a descriptive statistic for the actual–iid mass difference, not a
generative model; it is used only to identify where the discrepancy detected
in §4.1–4.4 is localized.