03 Method
Collatz finite-block diagnostics

States, scores, and the four tests

All four experiments share the same conditioning scheme and the same block score, defined in collatz_block_anomaly_score.py and reused throughout the subsequent analyses. Defining them once is what makes the tests comparable.

3.1 Conditioning cells (states)

Each word is assigned to a state

\[ \text{state} \;=\; \text{bridge\_cluster} \;\big|\; \text{x\_K\_window} \;\big|\; \text{parity}, \]

built from three components:

bridge_cluster — a path-shape label. Let \(x(u)\) be the log-magnitude partial-sum path reparameterized to \(u \in [0,1]\); the feature \(z_{25} = x(0.25) - 0.25\,x(1)\) measures how far the quarter-point sits above or below the straight chord. Tertile cuts on \(z_{25}\) (at \(q_{\text{low}}=-1.5\), \(q_{\text{high}}=-0.25\)) give late_growth, balanced, early_growth.
x_K_window — a window of \(x_K = K_\tau - (\text{power} - h)\), bucketed as exhaustion_0_31, deep_32_63, tail_64_95; words outside these windows are dropped.
parity — the parity of power (even/odd).
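The three components above can be combined as in the following minimal sketch. It is not taken from collatz_block_anomaly_score.py: the function names are hypothetical, the path is assumed to be sampled at equally spaced \(u\) with `path[0]` \(= x(0) = 0\), and the mapping of the lowest tertile to late_growth (quarter-point below the chord) as well as the inclusive boundary handling are assumptions.

```python
from typing import Optional, Sequence

# Tertile cuts on z_25 as quoted in the text.
Q_LOW, Q_HIGH = -1.5, -0.25

# Window names and inclusive bounds, read off the labels in the text.
X_K_WINDOWS = (
    ("exhaustion_0_31", 0, 31),
    ("deep_32_63", 32, 63),
    ("tail_64_95", 64, 95),
)


def z25(path: Sequence[float]) -> float:
    """z_25 = x(0.25) - 0.25 * x(1), with x(u) linearly interpolated.

    Assumes `path` holds x at equally spaced u in [0, 1], len(path) >= 2.
    """
    pos = 0.25 * (len(path) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(path) - 1)
    x_quarter = path[lo] + (pos - lo) * (path[hi] - path[lo])
    return x_quarter - 0.25 * path[-1]


def bridge_cluster(z: float) -> str:
    # Assumption: low z_25 (quarter-point below the chord) means late growth.
    if z < Q_LOW:
        return "late_growth"
    if z <= Q_HIGH:
        return "balanced"
    return "early_growth"


def x_k_window(x_k: int) -> Optional[str]:
    for name, lo, hi in X_K_WINDOWS:
        if lo <= x_k <= hi:
            return name
    return None  # outside all windows: the word is dropped


def assign_state(z: float, x_k: int, power: int) -> Optional[str]:
    window = x_k_window(x_k)
    if window is None:
        return None
    parity = "even" if power % 2 == 0 else "odd"
    return f"{bridge_cluster(z)} | {window} | {parity}"
```

Under these assumptions, `assign_state(-2.0, 70, 12)` yields the label `late_growth | tail_64_95 | even`, and any word with \(x_K\) outside 0–95 returns `None`.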

One state is used throughout as a stress test:

\[ \textbf{focus state} \;=\; \texttt{late\_growth | tail\_64\_95 | even}, \]

a deep-tail, late-growth, even state where the actual–iid discrepancy is largest and sampling is sparsest.

3.2 Block log-ratio score

On the train split, for each \((L, \text{state}, u\text{-bin}, \text{block})\) we estimate smoothed block probabilities under the actual and iid samples and store their log-ratio,

\[ s_L(\text{state}, u, \text{block}) \;=\; \log_2 \frac{p^{\text{actual}}(\text{block})}{p^{\text{iid}}(\text{block})}, \]

where \(u\) is a decile bin of the within-word position. Smoothing is add-\(\alpha\) (\(\alpha = 0.5\)) over the \(3^L\) blocks in the anomaly test, and a uniform mixture \((1-\lambda)\,\hat p + \lambda/3^L\) with \(\lambda = 0.02\) in the renormalization and projection tests. A state is considered stable only if its train iid mass exceeds \(10^{-7}\).
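As an illustration of the two smoothing rules and the resulting log-ratio, here is a minimal sketch for a single \((\text{state}, u\text{-bin})\) cell. The function names and the use of a `Counter` of block counts are assumptions made for the example, not the layout used in collatz_block_anomaly_score.py.

```python
import math
from collections import Counter
from typing import Hashable, Mapping


def add_alpha_prob(counts: Mapping[Hashable, float], block: Hashable,
                   L: int, alpha: float = 0.5) -> float:
    """Add-alpha estimate over the 3**L possible blocks of length L."""
    total = sum(counts.values())
    return (counts.get(block, 0) + alpha) / (total + alpha * 3 ** L)


def uniform_mixture_prob(counts: Mapping[Hashable, float], block: Hashable,
                         L: int, lam: float = 0.02) -> float:
    """(1 - lam) * p_hat + lam / 3**L, with p_hat the empirical frequency."""
    total = sum(counts.values())
    p_hat = counts.get(block, 0) / total if total else 0.0
    return (1.0 - lam) * p_hat + lam / 3 ** L


def log_ratio(actual_counts: Mapping[Hashable, float],
              iid_counts: Mapping[Hashable, float],
              block: Hashable, L: int, alpha: float = 0.5) -> float:
    """s_L = log2(p_actual(block) / p_iid(block)) with add-alpha smoothing."""
    p_actual = add_alpha_prob(actual_counts, block, L, alpha)
    p_iid = add_alpha_prob(iid_counts, block, L, alpha)
    return math.log2(p_actual / p_iid)


# Toy cell with L = 1 (three possible blocks).
actual = Counter({"0": 3})
iid = Counter({"0": 1, "1": 2})
s = log_ratio(actual, iid, "0", L=1)  # log2((3.5 / 4.5) / (1.5 / 4.5))
```

Both estimators assign strictly positive probability to unseen blocks, so the log-ratio is always finite.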

A word's block score at length \(L\) sums the stored log-ratios over its sliding windows on the test split:

\[ S_L(\mathbf{k}) \;=\; \sum_{i} s_L\!\big(\text{state}, \, u\text{-bin}(i,\tau), \, \text{block}_i\big). \]

The train/test split is determined by sample-index parity, so no word is scored by probabilities estimated from itself.
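The word score and the parity split can be sketched as follows. The lookup-table layout, the decile rule for \(u\text{-bin}(i,\tau)\) (window index relative to the number of windows), the zero contribution for unseen keys, and the choice of even indices for the train split are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical key layout for the stored log-ratios: (state, u-bin, block).
Key = Tuple[str, int, Tuple[int, ...]]


def u_bin(i: int, n_windows: int, n_bins: int = 10) -> int:
    """Decile bin of window i among n_windows windows (assumed rule)."""
    return min(n_bins * i // n_windows, n_bins - 1)


def block_score(word: Sequence[int], state: str, L: int,
                table: Dict[Key, float]) -> float:
    """S_L(k): sum of stored log-ratios over the sliding length-L windows."""
    n_windows = len(word) - L + 1
    if n_windows <= 0:
        return 0.0
    total = 0.0
    for i in range(n_windows):
        block = tuple(word[i:i + L])
        # Assumption: keys never seen on the train split contribute 0.
        total += table.get((state, u_bin(i, n_windows), block), 0.0)
    return total


def is_train(sample_index: int) -> bool:
    """Split by sample-index parity; which parity is train is assumed."""
    return sample_index % 2 == 0


# Toy usage: a 4-symbol word scored with L = 2 (three windows).
table = {
    ("s", 0, (0, 1)): 1.0,
    ("s", 3, (1, 2)): -0.5,
    ("s", 6, (2, 0)): 0.25,
}
score = block_score((0, 1, 2, 0), "s", 2, table)
```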

3.3 The four tests

Two tests are diagnostic (does the score separate actual from iid?) and two are generative (can the score reshape the iid measure into the actual one?).

Table 3.1 — The four experiments and what each asks.
#	Test	Type	Question
1	Block anomaly (B3/B4)	diagnostic	Does \(S_4\) improve separation over a baseline of \(x_K\), parity, bridge, path-shape?
2	Block-length renormalization	diagnostic	Does the separation grow with \(L=3,\dots,6\); does the residual shrink?
3	Finite-block reweighting	generative	Does reweighting iid by \(2^{\alpha S_L}\) reproduce the actual mass?
4	MaxEnt block projection	generative	Does a regularized IPF matching block marginals beat raw/damped reweighting?

Test 1 — block anomaly score

Fit a weighted logistic classifier (actual vs iid) on baseline covariates, then add the block score and assess the increase in weighted AUC, while checking whether the bridge and parity coefficients are absorbed.
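The weighted AUC used to compare the baseline and \(+\)score fits can be computed as the weighted probability that a randomly drawn actual word outranks a randomly drawn iid word. The sketch below is a plain quadratic-time version for illustration; the function name and the label convention (1 = actual, 0 = iid) are assumptions, and the logistic fit itself is not shown.

```python
from typing import Sequence


def weighted_auc(scores: Sequence[float], labels: Sequence[int],
                 weights: Sequence[float]) -> float:
    """Weighted AUC: P(score_pos > score_neg) + 0.5 * P(tie), pair-weighted."""
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    w_pos = sum(w for _, w in pos)
    w_neg = sum(w for _, w in neg)
    if w_pos == 0 or w_neg == 0:
        raise ValueError("both classes need positive total weight")
    concordant = 0.0
    for s_p, w_p in pos:
        for s_n, w_n in neg:
            if s_p > s_n:
                concordant += w_p * w_n
            elif s_p == s_n:
                concordant += 0.5 * w_p * w_n
    return concordant / (w_pos * w_neg)


# Toy usage: label 1 = actual, 0 = iid.
auc_unweighted = weighted_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0], [1, 1, 1, 1])
auc_weighted = weighted_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0], [2, 1, 1, 1])
```

A value of 0.5 means the score carries no separation; the increase from the baseline fit to the \(+\)score fit is the quantity of interest.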

Test 2 — block-length renormalization

Repeat the scoring procedure for \(L=3,\dots,6\) and track the marginal-score AUC, the \(+\)score logistic AUC, and the bridge/parity residual coefficients as functions of \(L\). Also track per-decile survival ratios in the focus state.

Test 3 — finite-block reweighting

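Reweight each iid test word by \(2^{\alpha S_L}\) for exponents \(\alpha \in \{0, 0.25, 0.5, 0.75, 1\}\) (\(\alpha = 0\) leaves the iid measure unchanged and \(\alpha = 1\) applies the full score; this \(\alpha\) is unrelated to the smoothing constant of §3.2), rescale the predicted mass so that its total matches the actual total over stable states, and compare predicted vs actual state-mass distributions by RMSE and Jensen–Shannon divergence.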
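A minimal sketch of this reweighting and of the two comparison metrics follows. The function names, the dict-based state masses, and the use of base-2 logarithms in the Jensen–Shannon divergence are assumptions for illustration; `actual_mass` is taken to contain only the stable states.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def reweighted_state_mass(states: Sequence[str], scores: Sequence[float],
                          base_weights: Sequence[float], alpha: float,
                          actual_mass: Dict[str, float]) -> Dict[str, float]:
    """Reweight iid words by 2**(alpha * S_L), aggregate per stable state,
    then rescale so the predicted total equals the actual total."""
    raw = defaultdict(float)
    for state, s, w in zip(states, scores, base_weights):
        if state in actual_mass:  # keep stable states only
            raw[state] += w * 2.0 ** (alpha * s)
    raw_total = sum(raw.values())
    if raw_total == 0:
        raise ValueError("no iid mass on stable states")
    scale = sum(actual_mass.values()) / raw_total
    return {state: raw[state] * scale for state in actual_mass}


def rmse(pred: Dict[str, float], actual: Dict[str, float]) -> float:
    return math.sqrt(sum((pred[k] - actual[k]) ** 2 for k in actual) / len(actual))


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits, after normalising both inputs."""
    zp, zq = sum(p.values()), sum(q.values())
    js = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0) / zp, q.get(k, 0.0) / zq
        mk = 0.5 * (pk + qk)
        if pk > 0:
            js += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            js += 0.5 * qk * math.log2(qk / mk)
    return js


# Toy usage: two states, one iid word each.
actual = {"a": 0.8, "b": 0.2}
pred_full = reweighted_state_mass(["a", "b"], [1.0, -1.0], [1.0, 1.0], 1.0, actual)
pred_none = reweighted_state_mass(["a", "b"], [1.0, -1.0], [1.0, 1.0], 0.0, actual)
```

In this toy case the full reweighting recovers the actual masses, while \(\alpha = 0\) leaves the two states at equal mass.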

Test 4 — maximum-entropy block projection

Run an approximate regularized IPF (exponential-family) update on the iid test measure to match block marginals, for \(L=3,\dots,6\) and regularization \(\{0, 0.5, 0.75, 0.9\}\) (two iterations, evaluation capped at \(40{,}000\) iid words). Compare the projected state distribution to actual, and to the best raw/damped reweighting fit from Test 3.
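One way to picture such an update is a damped iterative-scaling step on word weights, sketched below. This is an illustrative stand-in, not the procedure used in the experiments: the function names are hypothetical, the per-word exponent normalisation follows generalized iterative scaling, and mapping the regularization value `reg` to a damping exponent \(1 - \text{reg}\) is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def block_marginals(words_blocks: Sequence[Sequence[Hashable]],
                    weights: Sequence[float]) -> Dict[Hashable, float]:
    """Weighted block frequencies, normalised to sum to 1."""
    mass = Counter()
    for blocks, w in zip(words_blocks, weights):
        for b in blocks:
            mass[b] += w
    total = sum(mass.values())
    return {b: m / total for b, m in mass.items()}


def damped_scaling(words_blocks: Sequence[Sequence[Hashable]],
                   weights: Sequence[float],
                   target: Dict[Hashable, float],
                   reg: float, n_iter: int = 2) -> List[float]:
    """Multiplicative update moving block marginals toward `target`.

    Assumption: reg = 0 is the undamped step, reg close to 1 barely moves.
    """
    w = list(weights)
    for _ in range(n_iter):
        model = block_marginals(words_blocks, w)
        new_w = []
        for blocks, wi in zip(words_blocks, w):
            factor = 1.0
            for b in blocks:
                if model[b] <= 0:
                    continue  # no current mass on this block: skip it
                ratio = target.get(b, 0.0) / model[b]
                factor *= ratio ** ((1.0 - reg) / len(blocks))
            new_w.append(wi * factor)
        z = sum(new_w)
        w = [x / z for x in new_w]  # renormalise to a probability measure
    return w


# Toy usage: two one-block words, target marginals 0.8 / 0.2.
words = [["a"], ["b"]]
target = {"a": 0.8, "b": 0.2}
w_exact = damped_scaling(words, [0.5, 0.5], target, reg=0.0, n_iter=1)
w_damped = damped_scaling(words, [0.5, 0.5], target, reg=0.5, n_iter=1)
```

With one block per word the undamped step matches the target marginals in a single iteration, and the damped step lands between the starting weights and the target.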

Scope of the projection. Test 4 is an approximate projection, not an exact IPF solution over all words. It is built to distinguish an over-counted raw product from a regularized finite-block exponential family, rather than to certify a maximum-entropy fit.

3.4 Reported quantities

weighted AUC — separation of actual from iid by score (diagnostic tests).
RMSE / JS — predicted vs actual state-mass distribution (generative tests).
survival — reported in two explicitly separated forms: a per-decile survival ratio \(S = (\text{actual mass})/(\text{iid mass})\) in Tests 1–2, and focus-state retained mass, modelled vs actual, in Tests 3–4.
bridge / parity residual — magnitude of the bridge-cluster and parity coefficients that remain after the block score is included.

Auxiliary Δ. In §4.6 we additionally report, as a descriptive statistic, the state-mass difference

\[ \Delta(\text{state}) \;=\; \mu_{\text{actual}}(\text{state}) \;-\; \mu_{\text{iid}}(\text{state}), \]

projected onto state, prefix cylinder, transition, and boundary / remaining_K coordinates. Δ is not a generative model; it is used only to locate the discrepancy detected in §4.1–4.4.