07 Appendix | Collatz finite-block diagnostics

Appendix: definitions, constants, reproduction

A. Constants

Table A.1 — Default constants (from collatz_block_anomaly_score.py).
| Name | Value | Role |
|---|---|---|
| POWERS | 24, 25, 26, 27, 28 | residue exponents enumerated |
| HS | 2, 3, 4, 5, 6 | layer offsets |
| IID_SAMPLES_PER_H | 160000 | iid words per layer |
| ACTUAL_SAMPLE_PER_PH | 20000 | actual words per (power, h) |
| SEED | 20260625 | RNG seed |
| SMOOTH_ALPHA | 0.5 | add-α smoothing (anomaly test) |
| SMOOTH_MIX / λ | 0.02 | uniform-mixture smoothing (renorm/projection) |
| MIN_STATE_IID_MASS | 1e-7 | stability threshold for keeping a state |
| bridge \(z_{25}\) cuts | q_low −1.5, q_high −0.25 | tertile cuts for bridge cluster |
| regularization sweep | 0.0, 0.5, 0.75, 0.9 | Test 4 (maxent) |
| α sweep | 0.0, 0.25, 0.5, 0.75, 1.0 | Test 3 (reweighting) |
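
The two smoothing constants act differently. As a rough illustration of the uniform-mixture variant, the sketch below blends a probability vector with the uniform distribution using λ = SMOOTH_MIX. The formula \((1-\lambda)\,p + \lambda/n\) is the usual form of uniform-mixture smoothing and is assumed here; the function name `mix_with_uniform` is hypothetical and the scripts may implement this differently.

```python
SMOOTH_MIX = 0.02  # lambda, value taken from Table A.1


def mix_with_uniform(probs, lam=SMOOTH_MIX):
    """Blend a probability vector with the uniform distribution.

    Assumed form: (1 - lam) * p + lam / n, where n is the vector length.
    """
    n = len(probs)
    return [(1.0 - lam) * p + lam / n for p in probs]


# Toy input: the third outcome has zero raw probability.
smoothed = mix_with_uniform([0.5, 0.5, 0.0])
```

After mixing, every entry is strictly positive and the vector still sums to one, so log-ratios against it stay finite.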

B. State coordinates

x_K window. \(x_K = K_\tau - (\text{power} - h)\), bucketed as exhaustion_0_31 \([0,32)\), deep_32_63 \([32,64)\), tail_64_95 \([64,96)\); words outside \([0,96)\) are dropped.
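
A minimal sketch of the x_K bucketing, with the bucket edges copied from the definition above. The function name `x_k_bucket` and the use of `None` to signal a dropped word are assumptions made for illustration, not taken from the scripts.

```python
from typing import Optional

# Half-open bucket ranges [lo, hi), as stated in the text.
X_K_BUCKETS = (
    ("exhaustion_0_31", 0, 32),
    ("deep_32_63", 32, 64),
    ("tail_64_95", 64, 96),
)


def x_k_bucket(k_tau: int, power: int, h: int) -> Optional[str]:
    """Return the bucket name for x_K = K_tau - (power - h).

    Returns None when x_K lies outside [0, 96), i.e. the word is dropped.
    """
    x_k = k_tau - (power - h)
    for name, lo, hi in X_K_BUCKETS:
        if lo <= x_k < hi:
            return name
    return None
```

For example, with power = 24 and h = 2, a word with \(K_\tau = 54\) has \(x_K = 32\) and falls in deep_32_63.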

bridge cluster. From the reparameterized log-magnitude path, \(z_{25} = x(0.25) - 0.25\,x(1)\). With cuts \(q_{\text{low}}=-1.5\), \(q_{\text{high}}=-0.25\): \(z_{25} \le q_{\text{low}} \Rightarrow\) late_growth; \(z_{25} \ge q_{\text{high}} \Rightarrow\) early_growth; otherwise balanced. The companion features \(z_{50}, z_{75}\) and the final drift enter the regression baseline.
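
The cluster assignment can be sketched as follows. The cut values are the defaults from Table A.1; the function names `z25` and `bridge_cluster` are hypothetical, and the path values `x(0.25)` and `x(1)` are assumed to be supplied by the caller.

```python
Q_LOW = -1.5    # lower cut from Table A.1
Q_HIGH = -0.25  # upper cut from Table A.1


def z25(x_quarter: float, x_one: float) -> float:
    """Bridge feature z_25 = x(0.25) - 0.25 * x(1)."""
    return x_quarter - 0.25 * x_one


def bridge_cluster(z: float, q_low: float = Q_LOW, q_high: float = Q_HIGH) -> str:
    """Map z_25 to a cluster label; both cuts are inclusive, as in the text."""
    if z <= q_low:
        return "late_growth"
    if z >= q_high:
        return "early_growth"
    return "balanced"
```

With the illustrative values x(0.25) = 1.0 and x(1) = 8.0, \(z_{25} = -1.0\), which lies strictly between the cuts and is labeled balanced.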

parity. even if power is even, else odd.

C. Block score

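The block score is a smoothed log-ratio, estimated on the train split per \((L,\text{state},u\text{-bin},\text{block})\) and summed over a test word's sliding windows of length \(L\). Position bins \(u\) are deciles of \((i+1)/\tau\), where \(i\) is the window's start index. A state is scored at length \(L\) only if it is stable at that length (see MIN_STATE_IID_MASS in Table A.1).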

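def block_score(word, tau, L, state, lookup, k_cat, u_bin):
    """Sum the train-split log-ratios over all length-L windows of one test word."""
    score = 0.0
    for i in range(tau - L + 1):
        block = ",".join(k_cat(k) for k in word[i:i+L])
        score += lookup[(L, state, u_bin(i, tau), block)]   # log2(actual_p / iid_p)
    return score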
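
The `lookup` entries and the `u_bin` helper used in the loop above are not spelled out in this appendix. The sketch below shows one plausible construction under stated assumptions: right-closed deciles for the position bin, and add-α smoothing applied to both the actual and iid block counts over the union of blocks seen in either sample. The name `smoothed_log_ratio` is hypothetical.

```python
import math
from collections import Counter

SMOOTH_ALPHA = 0.5  # add-alpha value from Table A.1


def u_bin(i: int, tau: int) -> int:
    """Decile index 0..9 of (i + 1) / tau.

    Assumption: right-closed bins (0, 0.1], ..., (0.9, 1.0],
    computed as ceil(10 * (i + 1) / tau) - 1 in integer arithmetic.
    """
    return (10 * (i + 1) + tau - 1) // tau - 1


def smoothed_log_ratio(actual: Counter, iid: Counter, alpha: float = SMOOTH_ALPHA) -> dict:
    """Return log2(actual_p / iid_p) per block with add-alpha smoothing.

    Assumption: the vocabulary is the union of blocks seen in either sample.
    """
    vocab = set(actual) | set(iid)
    n_actual = sum(actual.values()) + alpha * len(vocab)
    n_iid = sum(iid.values()) + alpha * len(vocab)
    return {
        b: math.log2((actual[b] + alpha) / n_actual)
        - math.log2((iid[b] + alpha) / n_iid)
        for b in vocab
    }


# Toy train-split counts for a single (L, state, u-bin) cell.
ratios = smoothed_log_ratio(Counter({"a,b": 3, "b,a": 1}), Counter({"a,b": 2, "b,a": 2}))
```

In this toy cell the block "a,b" is over-represented in the actual sample and receives a positive log-ratio, while "b,a" receives a negative one.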

D. File manifest

Table D.1 — Artifacts and their role.
| File | Role |
|---|---|
| collatz_block_anomaly_score.py | Test 1 + shared definitions |
| collatz_block_length_renormalization.py | Test 2 |
| collatz_block_maxent_projection.py | Test 4 |
| collatz_block_anomaly_report.md | Test 1 report (class B) |
| collatz_block_length_renormalization_report.md | Test 2 report (class B) |
| collatz_block_reweighting_report.md | Test 3 report (class C) |
| collatz_block_maxent_projection_report.md | Test 4 report (class C) |
| state_level_delta_report.md | Auxiliary delta report: state-level projection |
| prefix_cylinder_delta_report.md | Auxiliary delta report: prefix-cylinder projection |
| boundary_delta_report.md | Auxiliary delta report: boundary projection |
| remaining_K_chain_report.md | Auxiliary delta report: remaining_K chain |
| maxent_vs_raw_rmse.svg | Figure 1 |
| residuals_vs_regularization.svg | Figure 2 |
| focus_state_maxent_fit.svg | Figure 3 |

E. Reproduction

Requires Python 3.10+ with numpy. The scripts depend on an upstream collatz_escape_word_deficit.py and on the binary status caches odd_only_status_p{24..28}.bin. Edit SRC and CACHE_DIRS at the top of collatz_block_anomaly_score.py to point at local paths, then run the three scripts in order. Randomness is seeded, and the train/test split is deterministic (by sample-index parity), so reruns reproduce the reported numbers up to floating-point summation order.
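
To illustrate why reruns are repeatable, here is a minimal sketch of a seeded generator combined with a parity split. It uses the standard-library `random` module rather than numpy to stay self-contained. Which parity goes to train is not stated in this appendix, so sending even indices to train is an assumption, and `split_by_index_parity` is a hypothetical name.

```python
import random

SEED = 20260625  # RNG seed from Table A.1


def split_by_index_parity(samples):
    """Split by sample-index parity.

    Assumption: even indices form the train split, odd indices the test split.
    """
    return samples[0::2], samples[1::2]


def draw_samples(n, seed=SEED):
    """Draw n pseudo-random integers from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.randrange(1 << 24) for _ in range(n)]


# Two independent runs yield identical samples and identical splits.
run_a = split_by_index_parity(draw_samples(10))
run_b = split_by_index_parity(draw_samples(10))
```

Because the split depends only on the sample index, it does not consume any random draws and cannot drift between runs.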

Reading the classifications. The A/B/C/D verdicts are coarse self-diagnostics with author-chosen thresholds (§5.4), reported verbatim. They are not external benchmarks, and the sampled AUC/RMSE numbers carry unquantified sampling error.