Results, by test
Each test is reported in its own section, with the numbers taken verbatim from the corresponding report. Diagnostic results come first (Tests 1–2), then the generative attempts (Tests 3–4). The consolidated reading is deferred to §5.
4.1 Test 1 — block anomaly score (B3/B4) class B
In the focus state late_growth | tail_64_95 | even, the \(B_4\)
score's actual median (\(-0.047\)) lies above the iid median (\(-0.060\)), and the
weighted AUC separating high-scoring actual words from iid is \(0.719\). Within that
state the lowest \(B_4\)-score decile has survival ratio \(S = 0.035\) and the
highest decile \(S = 1.806\) — a clear monotonic trend in this sparsely sampled state.
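As an illustration of the weighted AUC quoted above, the following minimal sketch computes it as the weight-averaged probability that an actual word outscores an iid word, with ties counted as one half. The function name `weighted_auc`, the toy scores, and the weights are hypothetical, and the report may define its weighting differently.

```python
def weighted_auc(scores, labels, weights):
    """Weighted probability that a positive item outscores a negative one.

    Ties count one half. The pairwise loop is O(n_pos * n_neg), which is
    fine for a small illustration but not for large samples.
    """
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    num = 0.0
    den = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            pair_weight = wp * wn
            den += pair_weight
            if sp > sn:
                num += pair_weight
            elif sp == sn:
                num += 0.5 * pair_weight
    return num / den


# Hypothetical toy data: label 1 = actual word, label 0 = iid word.
scores = [-0.02, -0.05, -0.04, -0.07, -0.06, -0.03]
labels = [1, 1, 1, 0, 0, 0]
weights = [1.0, 2.0, 1.0, 1.0, 1.0, 2.0]
auc = weighted_auc(scores, labels, weights)
```

A value of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is the scale on which the AUC figures in this section are read.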
Against the full logistic baseline, the block score adds a small but real amount of separation, and crucially does not explain away the structural covariates:
| model | weighted AUC | Δ vs base | B4 coef | bridge abs. coef | parity coef |
|---|---|---|---|---|---|
| x_K + parity + bridge + z | 0.5023 | 0.0000 | 0.0000 | 1.1312 | 0.3682 |
| + B4 | 0.5260 | 0.0236 | 0.2330 | 1.1151 | 0.4268 |
| + B3 + B4 | 0.5246 | 0.0223 | 0.1695 | 1.1132 | 0.4423 |
Adding \(B_4\) raises the AUC by \(0.0236\). The bridge coefficient falls by only \(0.0162\), and the parity coefficient rises by \(0.0586\) rather than shrinking. Test 1 is therefore classified as B, "B4 score helps but bridge/parity remain strong", with the explicit caveat that this is a split-sample, sampled diagnostic, not an exact aggregation proof.
Takeaway. The block score improves discrimination, but the bridge/parity effects remain.
4.2 Test 2 — block-length renormalization (L = 3…6) class B
The discriminative signal increases monotonically with block length, by both the marginal-score AUC and the \(+\)score logistic AUC, while the logistic base AUC remains at chance level (\(0.5025\)):
| L | marginal score AUC | logistic base AUC | logistic + score AUC | Δ |
|---|---|---|---|---|
| 3 | 0.5611 | 0.5025 | 0.5363 | 0.0337 |
| 4 | 0.5768 | 0.5025 | 0.5436 | 0.0411 |
| 5 | 0.5966 | 0.5025 | 0.5536 | 0.0511 |
| 6 | 0.6198 | 0.5025 | 0.5643 | 0.0618 |
The best \(+\)score logistic AUC is \(0.5643\) at \(L=6\); the gain from \(L=3\) to \(L=6\) is \(0.0280\). But the residual gap does not close: at \(L=6\) the bridge absolute residual is \(1.1011\) and the parity residual is \(0.3749\). In the focus state at \(L=6\), the lowest score decile has \(S = 0.117\), the highest decile with a finite survival ratio has \(S = 38.091\), and the top decile has zero iid mass, so its ratio is not finite. The strongest effect is thus also observed in the sparsest region.
Overall, Test 2 is therefore classified as B — "AUC grows but residual remains": longer blocks discriminate better but do not remove the bridge/parity structure.
Takeaway. Longer blocks consistently improve discrimination, but leave substantial bridge/parity structure unexplained.
4.3 Test 3 — finite-block reweighting class C
Reweighting iid words by \(2^{\alpha S_L}\) is the first generative attempt. The best full-state fit is obtained with mildly damped short blocks — \(L=3\), \(\alpha=0.25\), RMSE \(0.000440978\), JS \(0.000921815\). Longer or stronger reweighting is worse, not better: at \(L=6\) the best choice is \(\alpha = 0.0\) (no reweighting at all), RMSE \(0.000642649\).
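The reweighting rule \(2^{\alpha S_L}\) can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name `reweight`, the toy probabilities, and the block scores are hypothetical, and renormalizing the tilted weights to sum to one is an assumption.

```python
def reweight(iid_probs, block_scores, alpha):
    """Tilt iid word probabilities by 2**(alpha * S_L), then renormalize.

    Renormalization is an assumption made for this sketch.
    """
    tilted = [p * 2.0 ** (alpha * s) for p, s in zip(iid_probs, block_scores)]
    total = sum(tilted)
    return [t / total for t in tilted]


# Hypothetical iid probabilities and block scores S_L for four words.
iid_probs = [0.4, 0.3, 0.2, 0.1]
block_scores = [-1.0, 0.0, 1.0, 2.0]

unchanged = reweight(iid_probs, block_scores, alpha=0.0)  # no reweighting
damped = reweight(iid_probs, block_scores, alpha=0.25)    # mild tilt
raw = reweight(iid_probs, block_scores, alpha=1.0)        # full tilt
```

With \(\alpha = 0\) the iid distribution is returned unchanged, while larger \(\alpha\) shifts progressively more mass onto high-scoring words, which is the mechanism behind the over-sharpening discussed in this test.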
The focus state shows the overcorrection directly. Its best reweighting (\(L=6\), \(\alpha=0.5\)) still under-predicts the mass — actual \(0.00092052\) vs predicted \(0.00029821\) — and increasing \(\alpha\) to \(1\) drives the modelled focus-state survival down to \(0.00416667\), against an actual survival of \(0.472461\). At the global best fit the bridge RMSE is \(0.000427051\) and the parity RMSE \(0.00262763\).
Taken together, these results correspond to C — "reweighting overcorrects or is not generative": the small improvement is entirely from damped short-block reweighting; longer raw reweighting is over-sharp — useful diagnostically, weak as a generative model.
Takeaway. Short, mildly damped reweighting can improve the aggregate fit, but stronger block reweighting overcorrects rather than generating the actual distribution.
4.4 Test 4 — maximum-entropy block projection class C
The second generative attempt replaces heuristic reweighting with an approximate regularized IPF that matches block marginals. It does not outperform the simpler baseline on RMSE: the best maximum-entropy fit (\(L=3\), regularization \(0.75\)) has RMSE \(0.000493147\), worse than the best raw/damped reweighting RMSE of \(0.000440978\), although its JS of \(0.000868073\) is lower than the \(0.000921815\) of that reweighting fit.
Regularization does reduce the parity residual for the longer blocks — at regularization \(0\) the \(L=5\) and \(L=6\) parity residuals begin at high values and decrease toward the \(L=3/L=4\) level as regularization increases — but this reflects regularization pulling the projection toward the damped baseline, not a new structure being captured.
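To make the idea of a regularized projection concrete, here is a minimal sketch of iterative proportional fitting on a two-way table. The report does not specify its form of regularization, so the choice here, shrinking the target marginals toward the baseline's own marginals by a factor `reg`, is purely an assumption, as are the function name `regularized_ipf` and the toy table.

```python
def regularized_ipf(table, row_targets, col_targets, reg=0.0, sweeps=100):
    """IPF on a 2-D table with targets shrunk toward the baseline marginals.

    reg = 0 fits the targets fully; reg = 1 keeps the baseline unchanged.
    This shrinkage is an illustrative assumption, not the report's method.
    """
    t = [row[:] for row in table]
    base_rows = [sum(row) for row in t]
    base_cols = [sum(row[j] for row in t) for j in range(len(t[0]))]
    rows = [(1 - reg) * a + reg * b for a, b in zip(row_targets, base_rows)]
    cols = [(1 - reg) * a + reg * b for a, b in zip(col_targets, base_cols)]
    for _ in range(sweeps):
        # Rescale each row to its (shrunk) target, then each column.
        for i, target in enumerate(rows):
            factor = target / sum(t[i])
            t[i] = [v * factor for v in t[i]]
        for j, target in enumerate(cols):
            factor = target / sum(row[j] for row in t)
            for row in t:
                row[j] *= factor
    return t


# Hypothetical baseline table and target marginals.
baseline = [[0.3, 0.2], [0.1, 0.4]]
row_targets = [0.6, 0.4]
col_targets = [0.7, 0.3]

fitted = regularized_ipf(baseline, row_targets, col_targets, reg=0.0)
frozen = regularized_ipf(baseline, row_targets, col_targets, reg=1.0)
```

Under this assumed form, heavier regularization moves the fitted table back toward the baseline, mirroring the observation that the improvement at high regularization reflects the baseline rather than newly captured structure.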
On the focus state the projection's best survival is \(0.444001\) (\(L=5\), regularization \(0.9\)) against an actual survival of \(0.472461\) — close in this one state, but obtained at heavy regularization and not accompanied by a better global fit.
Accordingly, Test 4 also falls into C — "maxent no better than raw/damped", with the explicit reminder that this is an approximate projection, not a full, exact IPF solution over all words.
Takeaway. The projection can fit the focus state under heavy regularization, but it does not improve the global state-mass fit.
4.5 Summary of the four verdicts
| Test | Type | Headline | Self-class |
|---|---|---|---|
| 1 · anomaly B3/B4 | diagnostic | +0.0236 AUC; focus AUC 0.719 | B |
| 2 · length renorm | diagnostic | +score AUC 0.5363→0.5643 | B |
| 3 · reweighting | generative | best RMSE 0.000440978 (L3, α0.25) | C |
| 4 · maxent projection | generative | best RMSE 0.000493147 (worse) | C |
Diagnostic tests are classified as B, whereas the generative tests are classified as C. The next section reads these four verdicts together and explains the meaning of the letter grades.
4.6 Auxiliary analysis: the actual−iid Δ-map
As a descriptive statistic, this section reports the actual–iid mass difference
\[ \Delta(\cdot) \;=\; \mu_{\text{actual}}(\cdot) \;-\; \mu_{\text{iid}}(\cdot) \]
projected onto several coordinates. This is not a new generative model; it is a
diagnostic quantity that shows where the discrepancy detected by
Tests 1–4 is localized. We project Δ onto state, prefix cylinder, transition, and
boundary / remaining_K coordinates.
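The projection of Δ onto a coordinate can be sketched as a grouped sum of the mass difference. The function `project_delta`, the cell labels, and the masses below are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict


def project_delta(actual, iid, coordinate):
    """Sum actual-minus-iid mass over all cells sharing a coordinate value."""
    delta = defaultdict(float)
    for cell, mass in actual.items():
        delta[coordinate(cell)] += mass
    for cell, mass in iid.items():
        delta[coordinate(cell)] -= mass
    return dict(delta)


# Hypothetical cells keyed by (state label, remaining_K band).
actual = {("s0", "32-63"): 0.30, ("s1", "32-63"): 0.20, ("s0", "64-95"): 0.50}
iid = {("s0", "32-63"): 0.35, ("s1", "32-63"): 0.25, ("s0", "64-95"): 0.40}

by_band = project_delta(actual, iid, coordinate=lambda cell: cell[1])
by_state = project_delta(actual, iid, coordinate=lambda cell: cell[0])
```

The same difference can look sharp in one coordinate and diffuse in another, which is why several projections are compared rather than one.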
In state coordinates, the combination
bridge_cluster + x_K_window + parity provides a sharp localization of Δ. In prefix
cylinders the difference is visible from the early prefix on, but lengthening the
window does not concentrate it in a single prefix. Viewed in terms of transition structure and prefix growth, Δ likewise does not collapse onto any single edge or branch.
In boundary coordinates, remaining_K provides the sharpest localization of Δ, and
the largest \(|\Delta|\) appears at remaining_K = 32–63. The deficit of actual mass
relative to iid extends into remaining_K = 64–95 and 96–127, but the
largest absolute mass difference remains at 32–63 (96–127 has the lowest ratio,
but its mass and L1 share are small).
| remaining_K | actual | iid | \(\Delta\) | ratio | L1 share |
|---|---|---|---|---|---|
| 32–63 | 1.959532 | 2.139743 | −0.180211 | 0.916 | 38.57% |
| 64–95 | 0.266435 | 0.341662 | −0.075227 | 0.780 | 16.10% |
| 96–127 | 0.018644 | 0.031704 | −0.013059 | 0.588 | 2.80% |
In the leading bands (32–63, 64–95, 96–127) the mass delta is negative, while the
conditional downstream transition delta can be positive. For instance
64-95 -> 32-63 has mass delta \(-0.006059\) but conditional delta
\(+0.007861\), and 32-63 -> 16-31 has mass delta \(-0.009262\) but
conditional delta \(+0.004618\). This separates two statements: actual carries little mass in the band, yet conditioned on being in the band its share moving downstream is
not necessarily weak. It is therefore consistent with a difference in how mass is placed along the
remaining_K chain between actual and iid, rather than with a simple
local-transition malfunction.
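A small worked example shows how a negative mass delta and a positive conditional delta can coexist. The definitions used here (conditional share = transition mass divided by source-band mass) and all numbers are assumptions chosen for illustration; they are not taken from the report.

```python
def mass_and_conditional_delta(actual_band, iid_band, actual_edge, iid_edge):
    """Return (mass delta, conditional delta) for one band-to-band transition.

    mass delta        = actual edge mass minus iid edge mass
    conditional delta = actual edge share of its band minus the iid share
    """
    mass_delta = actual_edge - iid_edge
    conditional_delta = actual_edge / actual_band - iid_edge / iid_band
    return mass_delta, conditional_delta


# Hypothetical masses: actual has less mass in the band and on the edge,
# but a larger fraction of its band mass moves along the edge.
mass_delta, conditional_delta = mass_and_conditional_delta(
    actual_band=0.8, iid_band=1.0, actual_edge=0.40, iid_edge=0.45
)
```

Here the edge carries less mass under actual (0.40 against 0.45), while its share of the band is larger (0.50 against 0.45), the same sign pattern as in the transitions quoted above.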
| coordinate | sharpness | main finding |
|---|---|---|
| block score | — | diagnostic signal present, generation not reproduced (§4.1–4.4) |
| state | medium–high | localized by bridge_cluster + x_K_window + parity |
| prefix | low | visible early but not concentrated in a single prefix |
| transition | low | not collapsed onto a single edge / branch |
| boundary remaining_K | high | largest \(|\Delta|\) at 32–63 (the band with the largest observed discrepancy) |
This does not mean that remaining_K is causal.
remaining_K = 32–63 should be read as the band where the difference
between finite-integer escape words and the iid reference is largest in observation,
not as a generating source of the discrepancy.
Takeaway. The residual is not concentrated in a single prefix or
transition cylinder; it localizes more sharply in state coordinates and in the
remaining_K boundary distance, with the band showing the largest observed discrepancy at
remaining_K = 32–63.