02  BackgroundCollatz finite-block diagnostics

Background and notation

This section fixes the objects we compare: escape words, the iid 2-adic reference, and the coordinates we condition on. Nothing here is novel; it is recorded so the measurements in §4 are unambiguous.

2.1 Escape words

Work with the accelerated odd-to-odd map. For an odd \(n\), one step removes the factor of two from \(3n+1\):

\[ n \;\longmapsto\; \frac{3n+1}{2^{\,k}}, \qquad k = v_2(3n+1) \ge 1 . \]

Following a trajectory until it escapes the layer produces a finite sequence of valuations

\[ \mathbf{k} = (k_1, k_2, \dots, k_\tau), \qquad k_i \ge 1, \]

which we call the escape word; \(\tau\) is its length. The cumulative valuation is \(K_\tau = \sum_i k_i\). In log-magnitude terms each step contributes \(\log_2 3 - k_i\), so the partial sums of \(\log_2 3 - k_i\) describe the trajectory's descent; this is the path we summarize by shape features below.

2.2 The two measures being compared

We compare two distributions over escape words inside each conditioning cell:

Framing Throughout this paper, the object of study is the discrepancy between actual and iid inside a state, not either measure on its own. Every reported quantity (AUC, RMSE, survival, residual) is a comparison.

2.3 Valuation categories and blocks

Raw valuations are bucketed into three categories,

\[ \text{k\_cat}(k) = \begin{cases} \texttt{"1"} & k = 1 \\ \texttt{"2"} & k = 2 \\ \texttt{"3+"} & k \ge 3, \end{cases} \]

and a block of length \(L\) is a window of \(L\) consecutive categories. There are \(3^L\) possible blocks; we use \(L \in \{3,4,5,6\}\). The coarse bucketing keeps the per-block support small enough to estimate from samples while still resolving the short-range structure that the one-step view misses.

2.4 Why a finite-block view at all

Previous analyses suggest that the discrepancy is not adequately captured by any single low-order summary: not by \(K_\tau\), not by \(\tau\), not by mean valuation or cumulative drift, and that one-step transitions are close to iid. In the regression baseline used here those covariates together separate actual from iid only marginally (AUC \(\approx 0.50\); see §4). This naturally leads to the finite-block question: if the deviation is not one-step and is not adequately captured by these scalars, does it live in short multi-step patterns, and if so can those patterns be turned into a generator?

The remainder of the paper answers: the deviation does show up in short blocks (diagnostically), but turning the blocks into a generator fails in the two ways we tried.