Appendix: definitions, constants, reproduction
A. Constants
| Name | Value | Role |
|---|---|---|
| POWERS | 24, 25, 26, 27, 28 | residue exponents enumerated |
| HS | 2, 3, 4, 5, 6 | layer offsets |
| IID_SAMPLES_PER_H | 160000 | iid words per layer |
| ACTUAL_SAMPLE_PER_PH | 20000 | actual words per (power, h) |
| SEED | 20260625 | RNG seed |
| SMOOTH_ALPHA | 0.5 | add-α smoothing (anomaly test) |
| SMOOTH_MIX / λ | 0.02 | uniform-mixture smoothing (renorm/projection) |
| MIN_STATE_IID_MASS | 1e-7 | stability threshold for keeping a state |
| bridge \(z_{25}\) cuts | q_low −1.5, q_high −0.25 | tertile cuts for bridge cluster |
| regularization sweep | 0.0, 0.5, 0.75, 0.9 | Test 4 (maxent) |
| α sweep | 0.0, 0.25, 0.5, 0.75, 1.0 | Test 3 (reweighting) |
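
For orientation, the two smoothing constants can be read against the textbook forms of add-α and uniform-mixture smoothing. The formulas below are those standard definitions, not a transcription of the scripts; the symbols \(c_b\) (count of outcome \(b\)), \(N\) (total count) and \(V\) (number of distinct outcomes) are notation introduced here, and the scripts may normalize differently.

\[
\hat p_{\alpha}(b) = \frac{c_b + \alpha}{N + \alpha V}, \qquad \alpha = \text{SMOOTH\_ALPHA} = 0.5,
\]

\[
\hat p_{\lambda}(b) = (1 - \lambda)\,p(b) + \frac{\lambda}{V}, \qquad \lambda = \text{SMOOTH\_MIX} = 0.02.
\]

Both estimates remain strictly positive for every outcome, so log-ratios built from them are finite.
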
B. State coordinates
x_K window. \(x_K = K_\tau - (\text{power} - h)\), bucketed as
exhaustion_0_31 \([0,32)\), deep_32_63 \([32,64)\),
tail_64_95 \([64,96)\); words outside \([0,96)\) are dropped.
bridge cluster. From the reparameterized log-magnitude path,
\(z_{25} = x(0.25) - 0.25\,x(1)\). With cuts \(q_{\text{low}}=-1.5\),
\(q_{\text{high}}=-0.25\): \(z_{25} \le q_{\text{low}} \Rightarrow\)
late_growth; \(z_{25} \ge q_{\text{high}} \Rightarrow\)
early_growth; otherwise balanced. The companion features
\(z_{50}, z_{75}\) and the final drift enter the regression baseline.
parity. even if power is even, else
odd.
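
The three coordinates above can be sketched as small helpers. This is a minimal illustration under stated assumptions: the function names are hypothetical, \(K_\tau\), \(x(0.25)\) and \(x(1)\) are assumed to be already computed upstream, and combining the coordinates into a tuple is an illustrative choice rather than the scripts' confirmed representation.

```python
from typing import Optional, Tuple

# Bucket edges and cut values are copied from the text above.
X_K_BUCKETS = [
    ("exhaustion_0_31", 0, 32),
    ("deep_32_63", 32, 64),
    ("tail_64_95", 64, 96),
]
Q_LOW = -1.5
Q_HIGH = -0.25


def x_k_bucket(k_tau: float, power: int, h: int) -> Optional[str]:
    """Label for x_K = K_tau - (power - h); None means the word is dropped."""
    x_k = k_tau - (power - h)
    for label, lo, hi in X_K_BUCKETS:
        if lo <= x_k < hi:
            return label
    return None  # outside [0, 96)


def bridge_cluster(x_quarter: float, x_end: float) -> str:
    """Classify z_25 = x(0.25) - 0.25 * x(1) against the two cuts."""
    z25 = x_quarter - 0.25 * x_end
    if z25 <= Q_LOW:
        return "late_growth"
    if z25 >= Q_HIGH:
        return "early_growth"
    return "balanced"


def parity(power: int) -> str:
    return "even" if power % 2 == 0 else "odd"


def state_coordinates(
    k_tau: float, power: int, h: int, x_quarter: float, x_end: float
) -> Optional[Tuple[str, str, str]]:
    """Hypothetical combined state; None if the x_K window drops the word."""
    bucket = x_k_bucket(k_tau, power, h)
    if bucket is None:
        return None
    return (bucket, bridge_cluster(x_quarter, x_end), parity(power))
```

For example, `state_coordinates(30, 24, 2, 1.0, 12.0)` gives \(x_K = 8\) and \(z_{25} = -2\), so the word lands in the exhaustion window, the late-growth cluster, and the even parity class.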
C. Block score
The block score is a smoothed log-ratio \(\log_2(p_{\text{actual}}/p_{\text{iid}})\), estimated on the train split per \((L,\text{state},u\text{-bin},\text{block})\) and summed over a test word's sliding windows of length \(L\). Position bins \(u\) are deciles of \((i+1)/\tau\). A state is scored at length \(L\) only if it is stable at that length.

    score = 0
    for i in range(tau - L + 1):
        block = ",".join(k_cat(k) for k in word[i:i+L])
        score += lookup[(L, state, u_bin(i, tau), block)]  # log2(actual_p / iid_p)

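
A self-contained sketch of how such a lookup might be built and applied is given here. Several pieces are assumptions, not the scripts' confirmed behaviour: `k_cat` is a hypothetical placeholder categorization, `u_bin` uses equal-width tenths of \((i+1)/\tau\), add-α smoothing is normalized within each \((L,\text{state},u\text{-bin})\) cell over the blocks seen in either sample, \(\tau\) is taken to be the word length, and keys unseen in training contribute zero.

```python
import math
from collections import Counter

SMOOTH_ALPHA = 0.5  # add-alpha constant from the constants table


def k_cat(k):
    # Hypothetical categorization: cap large values in a single "3+" class.
    return str(k) if k < 3 else "3+"


def u_bin(i, tau):
    # Assumed equal-width tenths of (i + 1) / tau, clamped to bins 0..9.
    return min(9, (10 * (i + 1)) // tau)


def window_keys(word, L, state):
    """Yield one lookup key per sliding window of length L."""
    tau = len(word)
    for i in range(tau - L + 1):
        block = ",".join(k_cat(k) for k in word[i:i + L])
        yield (L, state, u_bin(i, tau), block)


def build_lookup(actual_words, iid_words, L, state, alpha=SMOOTH_ALPHA):
    """log2(actual_p / iid_p) per key, with add-alpha smoothing per cell."""
    actual = Counter(k for w in actual_words for k in window_keys(w, L, state))
    iid = Counter(k for w in iid_words for k in window_keys(w, L, state))
    cells = {}
    for key in set(actual) | set(iid):
        cells.setdefault(key[:3], set()).add(key[3])
    lookup = {}
    for cell, blocks in cells.items():
        v = len(blocks)
        n_actual = sum(actual[cell + (b,)] for b in blocks)
        n_iid = sum(iid[cell + (b,)] for b in blocks)
        for b in blocks:
            p_actual = (actual[cell + (b,)] + alpha) / (n_actual + alpha * v)
            p_iid = (iid[cell + (b,)] + alpha) / (n_iid + alpha * v)
            lookup[cell + (b,)] = math.log2(p_actual / p_iid)
    return lookup


def block_score(word, L, state, lookup):
    """Sum the log-ratios over the word's windows; unseen keys add 0."""
    return sum(lookup.get(key, 0.0) for key in window_keys(word, L, state))
```

Positive scores indicate blocks that are over-represented in the actual sample relative to the iid sample; identical samples give a score of zero for every word.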
D. File manifest
| File | Role |
|---|---|
| collatz_block_anomaly_score.py | Test 1 + shared definitions |
| collatz_block_length_renormalization.py | Test 2 |
| collatz_block_maxent_projection.py | Test 4 |
| collatz_block_anomaly_report.md | Test 1 report (class B) |
| collatz_block_length_renormalization_report.md | Test 2 report (class B) |
| collatz_block_reweighting_report.md | Test 3 report (class C) |
| collatz_block_maxent_projection_report.md | Test 4 report (class C) |
| state_level_delta_report.md | Auxiliary delta report: state-level projection |
| prefix_cylinder_delta_report.md | Auxiliary delta report: prefix-cylinder projection |
| boundary_delta_report.md | Auxiliary delta report: boundary projection |
| remaining_K_chain_report.md | Auxiliary delta report: remaining_K chain |
| maxent_vs_raw_rmse.svg | Figure 1 |
| residuals_vs_regularization.svg | Figure 2 |
| focus_state_maxent_fit.svg | Figure 3 |
E. Reproduction
Python 3.10+ with numpy. The scripts require an upstream
collatz_escape_word_deficit.py and binary status caches
odd_only_status_p{24..28}.bin; edit SRC and
CACHE_DIRS at the top of collatz_block_anomaly_score.py
to local paths, then run the three scripts in order. Randomness is seeded and the
train/test split is by deterministic sample-index parity; reruns reproduce up to
floating-point summation order.
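
The seeding and split conventions can be illustrated with a short sketch. It assumes NumPy's `default_rng` as the generator and that even sample indices form the train split; neither detail is confirmed by the text, which states only that the RNG is seeded and that the split is by sample-index parity. The helper names are hypothetical.

```python
import numpy as np

SEED = 20260625  # RNG seed from the constants table


def draw_sample_indices(n_population, n_samples, seed=SEED):
    """Seeded draw: the same seed always yields the same indices."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_population, size=n_samples)


def split_by_index_parity(samples):
    """Assumed convention: even sample index -> train, odd -> test."""
    return samples[0::2], samples[1::2]
```

Because the split depends only on a sample's position in the seeded draw, rerunning with the same seed reproduces both the sample and its train/test assignment.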
A/B/C/D verdicts are coarse self-diagnostics with author-chosen
thresholds (§5.4), reported verbatim. They are not external benchmarks, and the
sampled AUC/RMSE numbers carry unquantified sampling error.