03  Method

Collatz finite-block diagnostics

States, scores, and the four tests

All four experiments share the same conditioning scheme and the same block score, defined in collatz_block_anomaly_score.py and reused throughout the subsequent analyses. Defining them once is what makes the tests comparable.

3.1 Conditioning cells (states)

Each word is assigned to a state

\[ \text{state} \;=\; \text{bridge\_cluster} \;\big|\; \text{x\_K\_window} \;\big|\; \text{parity}, \]

built from three components: the bridge cluster, the \(x_K\) window, and the parity.

One state is used throughout as a stress test:

\[ \textbf{focus state} \;=\; \texttt{late\_growth | tail\_64\_95 | even}, \]

a deep-tail, late-growth, even state where the actual–iid discrepancy is largest and sampling is sparsest.

3.2 Block log-ratio score

On the train split, for each \((L, \text{state}, u\text{-bin}, \text{block})\) we estimate smoothed block probabilities separately for the actual and iid samples and store their log-ratio,

\[ s_L(\text{state}, u, \text{block}) \;=\; \log_2 \frac{p^{\text{actual}}_L(\text{block} \mid \text{state}, u)}{p^{\text{iid}}_L(\text{block} \mid \text{state}, u)}, \]

where \(u\) is the decile bin of the within-word position. The anomaly test uses add-\(0.5\) smoothing (a pseudocount of \(0.5\) on each of the \(3^L\) blocks); the renormalization and projection tests instead use the uniform mixture \((1-\lambda)\,\hat p + \lambda/3^L\) with \(\lambda = 0.02\). A state is considered stable only if its train iid mass exceeds \(10^{-7}\).
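A minimal Python sketch of the per-cell estimate follows. It assumes blocks are tuples over a three-letter alphabet {0, 1, 2} (the text fixes only the count \(3^L\)) and that the train counts for one (L, state, u-bin) cell arrive as a dict from block to (possibly weighted) count. The function names are hypothetical and are not taken from collatz_block_anomaly_score.py.

```python
import math
from itertools import product


def block_alphabet(L, symbols=(0, 1, 2)):
    """All 3**L blocks of length L as tuples; the symbol set is an assumption."""
    return list(product(symbols, repeat=L))


def add_pseudocount(counts, blocks, pseudocount=0.5):
    """Add-0.5 smoothing (anomaly test): (n_b + 0.5) / (N + 0.5 * 3**L)."""
    total = sum(counts.get(b, 0.0) for b in blocks) + pseudocount * len(blocks)
    return {b: (counts.get(b, 0.0) + pseudocount) / total for b in blocks}


def uniform_mixture(counts, blocks, lam=0.02):
    """(1 - lam) * p_hat + lam / 3**L (renormalization and projection tests)."""
    n = sum(counts.get(b, 0.0) for b in blocks)
    k = len(blocks)
    if n == 0:
        # Empty cell: fall back to uniform (an assumption, not from the source).
        return {b: 1.0 / k for b in blocks}
    return {b: (1 - lam) * counts.get(b, 0.0) / n + lam / k for b in blocks}


def log_ratio_table(actual_counts, iid_counts, L, smoother=add_pseudocount):
    """s_L(state, u, block) in bits for every block of one (L, state, u-bin) cell."""
    blocks = block_alphabet(L)
    p_act = smoother(actual_counts, blocks)
    p_iid = smoother(iid_counts, blocks)
    return {b: math.log2(p_act[b] / p_iid[b]) for b in blocks}
```

Both smoothers give every block a strictly positive probability, so each stored ratio is finite and a test block never seen in training still receives a score.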

A word's block score at length \(L\) sums the stored log-ratios over its sliding windows on the test split:

\[ S_L(\mathbf{k}) \;=\; \sum_{i} s_L\!\big(\text{state}, \, u\text{-bin}(i,\tau), \, \text{block}_i\big). \]

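Here \(i\) runs over the start positions of the word's length-\(L\) sliding windows, \(\text{block}_i\) is the block starting at \(i\), and \(u\text{-bin}(i,\tau)\) is the decile of \(i\) relative to the word length \(\tau\). The train/test split is determined by sample-index parity, so no word is scored with probabilities estimated from itself.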
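The scoring loop can be sketched as follows. This assumes a word is a sequence of symbols, the stored table is a dict keyed by (L, state, u_bin, block), the u-bin is the decile of i / τ with τ the word length, and even sample indices form the train split. All of these, and the helper names, are illustrative assumptions.

```python
def u_bin(i, tau, n_bins=10):
    """Decile of window start i in a word of length tau (assumed normalization i / tau)."""
    return min(int(n_bins * i / tau), n_bins - 1)


def block_score(word, state, L, table):
    """S_L: sum of stored log-ratios over every length-L sliding window of `word`.

    `table` maps (L, state, u_bin, block) -> log2 ratio estimated on the train split.
    Keys absent from the table contribute 0 (an assumption, not from the source).
    """
    tau = len(word)
    total = 0.0
    for i in range(tau - L + 1):
        block = tuple(word[i:i + L])
        total += table.get((L, state, u_bin(i, tau), block), 0.0)
    return total


def split_of(sample_index):
    """Train/test by sample-index parity; which parity trains is an assumption."""
    return "train" if sample_index % 2 == 0 else "test"
```

Because train and test are disjoint by construction, a word only ever meets log-ratios estimated from other words.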

3.3 The four tests

Two tests are diagnostic (does the score separate actual from iid?) and two are generative (can the score reshape the iid measure into the actual one?).

Table 3.1: The four experiments and what each asks.

| # | Test | Type | Question |
|---|------|------|----------|
| 1 | Block anomaly (B3/B4) | diagnostic | Does \(S_4\) improve separation over a baseline of \(x_K\), parity, bridge, and path shape? |
| 2 | Block-length renormalization | diagnostic | Does the separation grow with \(L=3,\dots,6\), and does the residual shrink? |
| 3 | Finite-block reweighting | generative | Does reweighting iid by \(2^{\alpha S_L}\) reproduce the actual mass? |
| 4 | MaxEnt block projection | generative | Does a regularized IPF matching block marginals beat raw/damped reweighting? |

Test 1 — block anomaly score

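Fit a weighted logistic classifier (actual vs iid) on the baseline covariates (\(x_K\), parity, bridge, path shape), then add the block score as a covariate and measure the resulting gain in weighted AUC. Also check whether the bridge and parity coefficients are absorbed, that is, whether they shrink toward zero once the score is included.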
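Test 1 reports a weighted AUC. The sketch below computes it directly as the weighted probability that an actual word outscores an iid word, counting ties as one half and weighting each pair by the product of the two sample weights; it assumes label 1 marks actual words. The classifiers themselves could come from any weighted logistic fit (scikit-learn's LogisticRegression.fit, for instance, accepts sample_weight). The function is a hypothetical stand-in, not the project's code.

```python
from itertools import groupby


def weighted_auc(scores, labels, weights):
    """Weighted ROC AUC: sum of w_pos * w_neg * ([s_pos > s_neg] + 0.5 * [tie]),
    divided by (total positive weight) * (total negative weight).

    labels: 1 = actual, 0 = iid.
    """
    items = sorted(zip(scores, labels, weights), key=lambda t: t[0])
    neg_below = 0.0  # negative weight with strictly lower score
    numer = 0.0
    w_pos = w_neg = 0.0
    for _, group in groupby(items, key=lambda t: t[0]):
        p_g = n_g = 0.0  # positive / negative weight tied at this score
        for _, y, w in group:
            if y == 1:
                p_g += w
            else:
                n_g += w
        numer += p_g * (neg_below + 0.5 * n_g)
        neg_below += n_g
        w_pos += p_g
        w_neg += n_g
    if w_pos == 0 or w_neg == 0:
        raise ValueError("both classes need positive total weight")
    return numer / (w_pos * w_neg)
```

The Test 1 gain is then weighted_auc(p_full, y, w) - weighted_auc(p_base, y, w), where p_base and p_full are the fitted probabilities of the baseline and baseline-plus-score models.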

Test 2 — block-length renormalization

Repeat the scoring procedure for \(L=3,\dots,6\) and track, as functions of \(L\), the AUC of the score alone (marginal-score AUC), the AUC of the logistic model with the score added, and the residual bridge and parity coefficients. In the focus state, also track the per-decile survival ratios.

Test 3 — finite-block reweighting

Reweight each iid test word by \(2^{\alpha S_L}\) for \(\alpha \in \{0, 0.25, 0.5, 0.75, 1\}\), rescale the predicted mass so that its total matches the actual total over stable states, and compare predicted vs actual state-mass distributions by RMSE and Jensen–Shannon divergence.
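Test 3 can be sketched as below, assuming each iid test word is given as a (state, S_L, weight) triple and the actual state masses as a dict. Whether RMSE is taken on the rescaled or the normalized masses, and the use of bits for the Jensen–Shannon divergence, are assumptions; the helper names are hypothetical.

```python
import math
from collections import defaultdict


def stable_states(train_iid_mass, threshold=1e-7):
    """States whose train iid mass exceeds the stability threshold."""
    return sorted(s for s, m in train_iid_mass.items() if m > threshold)


def reweighted_state_mass(iid_words, alpha):
    """Predicted mass per state: sum of weight * 2**(alpha * S_L) over iid test words."""
    mass = defaultdict(float)
    for state, score, weight in iid_words:
        mass[state] += weight * 2.0 ** (alpha * score)
    return dict(mass)


def rescale_to(pred, actual, stable):
    """Scale predicted mass so its total over stable states equals the actual total."""
    c = sum(actual.get(s, 0.0) for s in stable) / sum(pred.get(s, 0.0) for s in stable)
    return {s: c * pred.get(s, 0.0) for s in stable}


def rmse(pred, actual, states):
    """Root-mean-square difference of state masses."""
    sq = sum((pred.get(s, 0.0) - actual.get(s, 0.0)) ** 2 for s in states)
    return math.sqrt(sq / len(states))


def js_divergence(p, q, states):
    """Jensen-Shannon divergence (bits) between p and q after normalizing each."""
    zp = sum(p.get(s, 0.0) for s in states)
    zq = sum(q.get(s, 0.0) for s in states)
    P = [p.get(s, 0.0) / zp for s in states]
    Q = [q.get(s, 0.0) / zq for s in states]
    M = [(a + b) / 2 for a, b in zip(P, Q)]

    def kl_to_mixture(A):
        return sum(a * math.log2(a / m) for a, m in zip(A, M) if a > 0)

    return 0.5 * kl_to_mixture(P) + 0.5 * kl_to_mixture(Q)
```

Sweeping \(\alpha\) then means calling reweighted_state_mass and rescale_to once per value and scoring each result; \(\alpha = 0\) returns the unweighted iid state masses, which is the natural baseline.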

Test 4 — maximum-entropy block projection

Run an approximate regularized iterative proportional fitting (IPF) update, an exponential-family reweighting, on the iid test measure so that it matches the block marginals, for \(L=3,\dots,6\) and regularization strengths \(\{0, 0.5, 0.75, 0.9\}\), with two iterations and evaluation capped at \(40{,}000\) iid words. Compare the projected state distribution with the actual one and with the best raw or damped reweighting fit from Test 3.

Scope of the projection. Test 4 is an approximate projection, not an exact IPF solution over all words. It is built to distinguish an over-counted raw product (the product of per-window ratios in \(2^{S_L}\), where overlapping windows count each position up to \(L\) times) from a regularized finite-block exponential family, rather than to certify a maximum-entropy fit.
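To make the update concrete, here is one damped iterative-scaling step in the spirit of generalized iterative scaling. It treats the block marginal as the weighted frequency of each block over all sliding windows (pooling states and u-bins, unlike the full test), uses a step of 1/C with C the largest window count per word, and reads the regularization value, written rho here, as a damping that multiplies each log-update by 1 − rho. All three choices, and the function names, are assumptions for illustration; a fixed small number of such steps is an approximation, not an exact maximum-entropy solution.

```python
import math
from collections import Counter


def block_marginal(words, weights, L):
    """Weighted frequency of each length-L block over all sliding windows."""
    tot = Counter()
    for word, w in zip(words, weights):
        for i in range(len(word) - L + 1):
            tot[tuple(word[i:i + L])] += w
    z = sum(tot.values())
    return {b: c / z for b, c in tot.items()}


def damped_scaling_step(words, weights, target, L, rho, eps=1e-12):
    """One damped multiplicative update of word weights toward `target` block frequencies.

    log w_i += (1 - rho) / C * sum over windows of log(target_b / model_b),
    with C the largest window count of any word. rho = 0 is an undamped step and
    rho = 1 leaves the weights unchanged. Total mass is preserved.
    Assumes every word has at least L symbols.
    """
    model = block_marginal(words, weights, L)
    C = max(len(word) - L + 1 for word in words)
    new = []
    for word, w in zip(words, weights):
        log_gain = 0.0
        for i in range(len(word) - L + 1):
            b = tuple(word[i:i + L])
            log_gain += math.log((target.get(b, 0.0) + eps) / (model[b] + eps))
        new.append(w * math.exp((1.0 - rho) * log_gain / C))
    scale = sum(weights) / sum(new)
    return [w * scale for w in new]
```

Under these assumptions, Test 4 would apply this step twice for each (L, rho) pair to at most 40,000 iid words and then aggregate the resulting weights by state.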

3.4 Reported quantities

Auxiliary Δ. In §4.6 we additionally report, as a descriptive statistic, the state-mass difference

\[ \Delta(\text{state}) \;=\; \mu_{\text{actual}}(\text{state}) \;-\; \mu_{\text{iid}}(\text{state}), \]

projected onto state, prefix-cylinder, transition, and boundary / remaining_K coordinates. Δ is not a generative model; it is used only to localize the discrepancy detected in §4.1–4.4.
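Δ and its projections reduce to a signed mass difference aggregated by a coordinate map. In the sketch below the fine cells, their masses, and the key functions are hypothetical placeholders; the project's own prefix-cylinder, transition, and boundary / remaining_K coordinates would be supplied as key functions.

```python
from collections import defaultdict


def delta_by(actual, iid, key):
    """Delta(c) = mu_actual(c) - mu_iid(c), summed over fine cells with key(cell) == c.

    actual, iid: dicts mapping a fine cell to its (normalized) mass.
    key: function from a fine cell to a coordinate value, e.g. its state or prefix.
    """
    out = defaultdict(float)
    for cell, m in actual.items():
        out[key(cell)] += m
    for cell, m in iid.items():
        out[key(cell)] -= m
    return dict(out)


def top_cells(delta, k=5):
    """The k coordinate values with the largest |Delta|, i.e. where the discrepancy sits."""
    return sorted(delta.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
```

Calling delta_by once per coordinate map gives the state, prefix, transition, and boundary views of the same signed difference.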