1 Introduction

Finite-block Diagnostics for Collatz Escape Words Relative to an iid 2-adic Reference

This paper reports an experimental comparison of finite-integer Collatz escape-word statistics with those of an iid 2-adic word model. The aim is to characterize where the two differ and how much of that difference simple finite-block descriptions can reproduce; identifying the mechanism responsible for the difference is outside its scope.

1.1 The question

Accelerated Collatz (Syracuse) trajectories can be summarized by their escape word: the sequence of 2-adic valuations \(k_i = v_2(3n_i + 1)\) recorded until the trajectory escapes a layer. A common modeling idealization replaces these valuations with iid 2-adic draws. The valuation sequences produced by actual finite integers are not iid, however, so the practical question is structural:

Conditioned on coarse path descriptors, where does the finite-integer word measure depart from the iid reference, and is that departure captured — or generated — by finite-block statistics?

We answer the first question ("where") and obtain a consistently negative answer to the stronger second question ("generated"): finite blocks diagnose the departure, with discriminative power that increases with block length, but the finite-block generative models we tried do not reproduce the whole-word deficit.
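As a concrete illustration of the objects in this subsection, the following minimal sketch computes the valuation word of an odd starting integer under the accelerated map \(n_{i+1} = (3n_i + 1)/2^{k_i}\), together with the probability of that word under an iid reference. Two things here are assumptions rather than definitions from this paper: the stopping rule (the sketch stops at 1 or after `max_steps`, since the layer-escape criterion is fixed later), and the reference law \(P(k = j) = 2^{-j}\), which is the valuation distribution for a Haar-uniform odd 2-adic integer. The function names are hypothetical.

```python
def v2(m: int) -> int:
    """2-adic valuation of a positive integer m."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    # m & -m isolates the lowest set bit; its position is the valuation.
    return (m & -m).bit_length() - 1


def valuation_word(n: int, max_steps: int = 64) -> list[int]:
    """Valuations k_i = v2(3*n_i + 1) along the accelerated trajectory of odd n.

    ASSUMPTION: the stopping rule (reach 1, or max_steps) is a stand-in for
    the layer-escape criterion, which is not specified in the introduction.
    """
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    word = []
    while n != 1 and len(word) < max_steps:
        m = 3 * n + 1
        k = v2(m)
        word.append(k)
        n = m >> k  # next odd iterate
    return word


def iid_reference_probability(word: list[int]) -> float:
    """Probability of a word under iid draws with P(k = j) = 2**(-j).

    ASSUMPTION: this geometric law stands in for the iid 2-adic reference.
    """
    return 2.0 ** (-sum(word))


if __name__ == "__main__":
    w = valuation_word(7)
    print(w, iid_reference_probability(w))
```

Under the assumed reference, the probability of a word depends only on its cumulative valuation, which is one reason conditioning on cumulative valuation is a natural control when comparing the two measures.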

1.2 What we claim, and what we do not

We claim: Short blocks of valuation categories carry a measurable diagnostic signal whose strength increases with block length, separating finite-integer words from iid words inside fixed conditioning cells, after controlling for cumulative valuation, parity, and path shape.
We do not claim: That this signal constitutes a generative model of the finite-integer word measure; that the measured signal identifies any single mechanism (Doob h-transform, hidden semi-Markov process, Gibbs measure, first-passage conditioning, …); or anything about the truth of the Collatz conjecture. These model classes appear only as candidates in §6.

1.3 Contributions

  1. A shared conditioning scheme — states defined by bridge cluster, cumulative-valuation window, and parity — enabling direct comparison of four finite-block tests (§3).
  2. A diagnostic result: block log-ratio scores separate finite-integer words from iid words, and the separation grows monotonically as the block length increases from \(L=3\) to \(L=6\) (§4, Tests 1–2).
  3. Two negative generative results: finite-block reweighting overcorrects, and an approximate maximum-entropy block projection does no better than raw/damped reweighting (§4, Tests 3–4; consolidated in §5).
  4. An honest account of the coarse A/B/C/D self-classifications generated by the scripts, and of the residual structure — indexed by bridge shape and parity — that no test removed (§5).
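To give a rough sense of what a block log-ratio comparison involves, the sketch below tabulates empirical frequencies of length-\(L\) blocks in a collection of words and compares each to its probability under an iid reference. It is not the block score defined in §3: it works on raw valuations rather than valuation categories, ignores the conditioning cells, and assumes the reference law \(P(k = j) = 2^{-j}\). All function names are hypothetical.

```python
from collections import Counter
from math import log


def block_counts(words: list[list[int]], L: int) -> Counter:
    """Count every contiguous length-L block across all words."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - L + 1):
            counts[tuple(w[i:i + L])] += 1
    return counts


def block_log_ratios(words: list[list[int]], L: int) -> dict:
    """log(empirical block frequency / iid block probability) per observed block.

    ASSUMPTION: the iid reference assigns a block (k_1, ..., k_L) the
    probability 2**(-(k_1 + ... + k_L)).
    """
    counts = block_counts(words, L)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        block: log(c / total) + sum(block) * log(2)
        for block, c in counts.items()
    }


if __name__ == "__main__":
    sample = [[1, 1, 2, 3, 4], [1, 4], [1, 1, 2]]
    for block, score in sorted(block_log_ratios(sample, 2).items()):
        print(block, round(score, 3))
```

A positive value marks a block that is over-represented relative to the assumed reference, a negative value one that is under-represented; unobserved blocks receive no score in this sketch.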

1.4 Reading guide

§2 fixes notation and the reference model. §3 defines the states and the block score. §4 reports the four tests in turn. §5 is the central section: it consolidates the negative results and explains the classification letters. §6 lists candidate model classes, strictly as candidates, and flags the ways each could be over-read. §7 collects definitions, constants, and reproduction notes.