07 Appendix · Collatz finite-block diagnostics

Appendix: definitions, constants, reproduction

A. Constants

Table A.1 — Default constants (from `collatz_block_anomaly_score.py`).
Name	Value	Role
`POWERS`	24, 25, 26, 27, 28	residue exponents enumerated
`HS`	2, 3, 4, 5, 6	layer offsets
`IID_SAMPLES_PER_H`	160000	iid words per layer
`ACTUAL_SAMPLE_PER_PH`	20000	actual words per (power, h)
`SEED`	20260625	RNG seed
`SMOOTH_ALPHA`	0.5	add-α smoothing (anomaly test)
`SMOOTH_MIX` / λ	0.02	uniform-mixture smoothing (renorm/projection)
`MIN_STATE_IID_MASS`	1e-7	stability threshold for keeping a state
bridge \(z_{25}\) cuts	q_low = −1.5, q_high = −0.25	tertile cuts for bridge cluster
regularization sweep	0.0, 0.5, 0.75, 0.9	Test 4 (maxent)
α sweep	0.0, 0.25, 0.5, 0.75, 1.0	Test 3 (reweighting)
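
The two smoothing constants in Table A.1 can be read as follows. This is a minimal sketch that assumes the textbook forms of add-α and uniform-mixture smoothing; the function names `add_alpha` and `uniform_mix` are hypothetical, and the exact formulas in the scripts may differ.

```python
SMOOTH_ALPHA = 0.5   # Table A.1
SMOOTH_MIX = 0.02    # Table A.1 (lambda)

def add_alpha(counts, alpha=SMOOTH_ALPHA):
    """Add-alpha smoothing: (c_i + alpha) / (N + alpha * K) over K categories."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def uniform_mix(probs, lam=SMOOTH_MIX):
    """Uniform-mixture smoothing: (1 - lam) * p_i + lam / K."""
    k = len(probs)
    return [(1.0 - lam) * p + lam / k for p in probs]

# Illustrative inputs only; these are not values from the reports.
p_add = add_alpha([3, 0, 1])
p_mix = uniform_mix([0.75, 0.0, 0.25])
```

Both forms keep every category strictly positive, so a log-ratio against the iid distribution stays finite even for blocks that never occur in the sample.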

B. State coordinates

x_K window. \(x_K = K_\tau - (\text{power} - h)\), bucketed as exhaustion_0_31 \([0,32)\), deep_32_63 \([32,64)\), tail_64_95 \([64,96)\); words outside \([0,96)\) are dropped.
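
The bucketing above can be sketched as a small function. The name `x_k_bucket` and its argument names are hypothetical; `k_tau` stands for \(K_\tau\), which is taken as given here.

```python
def x_k_bucket(k_tau, power, h):
    """Return the x_K bucket name, or None if the word falls outside [0, 96)."""
    x_k = k_tau - (power - h)
    if 0 <= x_k < 32:
        return "exhaustion_0_31"
    if 32 <= x_k < 64:
        return "deep_32_63"
    if 64 <= x_k < 96:
        return "tail_64_95"
    return None  # dropped word
```

For example, with power = 24 and h = 2, a word with \(K_\tau = 30\) has \(x_K = 8\) and lands in exhaustion_0_31.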

bridge cluster. From the reparameterized log-magnitude path, \(z_{25} = x(0.25) - 0.25\,x(1)\). With cuts \(q_{\text{low}}=-1.5\), \(q_{\text{high}}=-0.25\): \(z_{25} \le q_{\text{low}} \Rightarrow\) late_growth; \(z_{25} \ge q_{\text{high}} \Rightarrow\) early_growth; otherwise balanced. The companion features \(z_{50}, z_{75}\) and the final drift enter the regression baseline.
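
A minimal sketch of the cluster assignment, using the cuts from Table A.1. The function names `z25` and `bridge_cluster` are hypothetical, and the inputs `x_quarter` and `x_final` stand for \(x(0.25)\) and \(x(1)\), whose computation from the path is not shown.

```python
Q_LOW = -1.5    # Table A.1
Q_HIGH = -0.25  # Table A.1

def z25(x_quarter, x_final):
    """Bridge feature z_25 = x(0.25) - 0.25 * x(1)."""
    return x_quarter - 0.25 * x_final

def bridge_cluster(z, q_low=Q_LOW, q_high=Q_HIGH):
    """Map z_25 to a cluster label; both cut points are inclusive."""
    if z <= q_low:
        return "late_growth"
    if z >= q_high:
        return "early_growth"
    return "balanced"
```

Both cuts are inclusive, so a value exactly at \(q_{\text{low}}\) is late_growth and a value exactly at \(q_{\text{high}}\) is early_growth.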

parity. The parity of power: even when power is even, odd otherwise.

C. Block score

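The score of a test word is the train-split smoothed log-ratio \(\log_2(p_{\text{actual}}/p_{\text{iid}})\), tabulated per \((L,\text{state},u\text{-bin},\text{block})\) and summed over the word's sliding windows of length \(L\). Position bins \(u\) are deciles of \((i+1)/\tau\), where \(i\) is the window's start index. A state is scored at length \(L\) only if it is stable at that length.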

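# lookup holds the train-split smoothed log-ratios; word is one test word.
score = 0.0
for i in range(tau - L + 1):                  # every sliding window of length L
    block = ",".join(k_cat(k) for k in word[i:i + L])
    score += lookup[(L, state, u_bin(i, tau), block)]   # log2(actual_p / iid_p)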
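
The loop above relies on `k_cat`, `u_bin`, and `lookup` from the scripts. The following self-contained sketch fills them in under stated assumptions so that the scoring can be run in isolation. The `k_cat` categories are hypothetical, `u_bin` assumes that the top edge \((i+1)/\tau = 1\) is clamped into the last decile, `tau` is taken to be the word length, and missing table entries are assumed to contribute zero.

```python
def k_cat(k):
    # Hypothetical categorization; the real k_cat is defined in the scripts.
    return str(k) if k < 3 else "3+"

def u_bin(i, tau):
    # Decile of (i + 1) / tau, with the value 1 clamped into bin 9 (assumed convention).
    return min(9, (10 * (i + 1)) // tau)

def block_score(word, L, state, lookup):
    """Sum the tabulated log2 ratios over all sliding windows of length L."""
    tau = len(word)  # assumption: tau equals the word length
    score = 0.0
    for i in range(tau - L + 1):
        block = ",".join(k_cat(k) for k in word[i:i + L])
        # Assumption: a key missing from the table contributes 0.
        score += lookup.get((L, state, u_bin(i, tau), block), 0.0)
    return score

# Toy table with made-up values, for illustration only.
toy_lookup = {
    (2, "s", 2, "1,2"): 0.5,
    (2, "s", 5, "2,1"): -0.25,
    (2, "s", 7, "1,3+"): 1.0,
}
toy_score = block_score([1, 2, 1, 4], 2, "s", toy_lookup)
```

A positive total means the word's blocks are, on balance, more frequent in the actual sample than under the iid model.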

D. File manifest

Table D.1 — Artifacts and their role.
File	Role
`collatz_block_anomaly_score.py`	Test 1 + shared definitions
`collatz_block_length_renormalization.py`	Test 2
`collatz_block_maxent_projection.py`	Test 4
`collatz_block_anomaly_report.md`	Test 1 report (class B)
`collatz_block_length_renormalization_report.md`	Test 2 report (class B)
`collatz_block_reweighting_report.md`	Test 3 report (class C)
`collatz_block_maxent_projection_report.md`	Test 4 report (class C)
`state_level_delta_report.md`	Auxiliary delta report: state-level projection
`prefix_cylinder_delta_report.md`	Auxiliary delta report: prefix-cylinder projection
`boundary_delta_report.md`	Auxiliary delta report: boundary projection
`remaining_K_chain_report.md`	Auxiliary delta report: remaining_K chain
`maxent_vs_raw_rmse.svg`	Figure 1
`residuals_vs_regularization.svg`	Figure 2
`focus_state_maxent_fit.svg`	Figure 3

E. Reproduction

Requirements: Python 3.10+ with numpy. The scripts depend on an upstream `collatz_escape_word_deficit.py` and on the binary status caches `odd_only_status_p{24..28}.bin`. Edit `SRC` and `CACHE_DIRS` at the top of `collatz_block_anomaly_score.py` to point to local paths, then run the three scripts listed in Table D.1 in order. Randomness is seeded (`SEED`) and the train/test split is determined by sample-index parity, so reruns reproduce the results up to floating-point summation order.
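
The determinism claim can be illustrated with a short sketch. It uses the standard-library `random` module as a stand-in for whatever generator the scripts use, and it assumes that even sample indices form the train split; both choices, and the names `draw` and `split_by_parity`, are illustrative rather than taken from the scripts.

```python
import random

SEED = 20260625  # Table A.1

def draw(n, seed=SEED):
    """Draw n toy samples from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.randrange(1, 8) for _ in range(n)]

def split_by_parity(samples):
    """Even sample indices go to train, odd to test (assumed assignment)."""
    return samples[0::2], samples[1::2]

first = draw(10)
second = draw(10)          # same seed, so the same samples
train, test = split_by_parity(first)
```

Because the split depends only on the sample index, it needs no extra random state and is identical across reruns.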

Reading the classifications. The A/B/C/D verdicts are coarse self-diagnostics with author-chosen thresholds (§5.4), reported verbatim. They are not external benchmarks, and the sampled AUC/RMSE numbers carry unquantified sampling error.