05 Negative Results | Collatz finite-block diagnostics

Negative results: a hierarchy of failures

This is the central chapter. The contribution of the study is not a model but a structured map of where an iid 2-adic approximation breaks: a list of descriptions that do not account for the finite-vs-iid discrepancy, ordered from the simplest to progressively more expressive descriptions, together with the residual that survives all of them.

Throughout this chapter, the discrepancy means the difference between the observed finite-integer escape-word measure and the iid 2-adic reference measure.
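In symbols, and only as a notational sketch: writing \(\mu_{\mathrm{fin}}\) for the observed finite-integer escape-word measure and \(\mu_{\mathrm{iid}}\) for the iid 2-adic reference (both names are assumed here, not taken from the earlier chapters), the discrepancy on an escape word \(w\) is

\[
D(w) \;=\; \mu_{\mathrm{fin}}(w) \;-\; \mu_{\mathrm{iid}}(w).
\]

If both measures are normalized over the same set of words, then \(\sum_w D(w) = 0\), so the discrepancy is a redistribution of mass across words rather than a net gain or loss.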

5.1 The ladder of eliminations

Read top to bottom, each rung is a more capable description than the one above, and each is ruled out as providing a complete explanation of the discrepancy. The first four rungs come from previous analyses and from the regression baseline used here; the last three are the present tests.

Table 5.1 — What does not account for the discrepancy, and the evidence in hand.
Rung	Rejected explanation	Evidence here
1	cumulative valuation \(x_K\) alone	in the baseline, \(x_K\)+parity+bridge+\(z\) give AUC ≈ 0.50
2	escape-word length \(\tau\) alone	prior step; \(\tau\) is informative but not sufficient
3	mean valuation / cumulative drift alone	prior step; correlated but not the main effect
4	a one-step (near-iid local) picture	prior step; local transitions close to iid
5	finite blocks as a generative model (reweighting)	Test 3: overcorrects; best fit is the damped baseline (verdict C)
6	finite-block maximum entropy	Test 4: no better than the raw/damped baselines (verdict C)
7	accumulation of block anomalies → whole-word deficit	Tests 1–2: AUC grows, bridge/parity residual structure persists (verdict B)

The shape of the finding. The discrepancy is diagnosable at every level (short blocks detect it, and longer blocks detect it more clearly), yet it is not generated by any finite-block construction we tried, and a residual structure indexed by bridge shape and parity is left untouched. Diagnostic and generative power come apart: the block statistics are good instruments for finding the discrepancy, but weak ingredients for reconstructing the finite-integer word measure.

5.2 What the diagnostic tests do show

These results should not be read as implying that finite blocks reveal nothing. The finite-block diagnostics reveal a real, monotonic signal: the \(+\)score AUC rises from \(0.5363\) at \(L=3\) to \(0.5643\) at \(L=6\) (Table 4.2), and the focus-state \(B_4\) separation reaches AUC \(0.719\) (§4.1). The negative conclusion is more specific:

the rising AUC does not drive the bridge coefficient to zero (it falls by only \(0.0162\), against a coefficient of about \(1.1\), when \(B_4\) is added; §4.1), so the block signal and the bridge structure capture largely independent aspects of the discrepancy;
the remaining parity effect is not reduced at all by the block score (§4.1, §4.2);
no block length tested closes the gap, and whether any finite \(L\) would is left open.
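For readers who want to check comparisons of this kind, AUC can be computed directly from scores and binary labels with the rank (Mann-Whitney) formula. The sketch below is illustrative only: the function name and the toy call are hypothetical, and which two sample classes the study separates is defined in §4 and not restated here.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# AUC gain between the two block lengths quoted in the text (Table 4.2).
auc_gain = round(0.5643 - 0.5363, 4)
```

The pairwise loop is quadratic in the sample size, which is fine for a check on small data; a sort-based version would be the usual choice at scale.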

5.3 Why the generative tests fail differently

The two generative tests fail in two distinct ways, which is itself informative:

Reweighting (Test 3) overcorrects. The only setting that improves the fit uses heavily damped short blocks (\(L=3\), \(\alpha=0.25\)); as soon as the blocks are long or the exponent is full, the reweighted measure becomes too sharp — the focus survival collapses from an actual \(0.472461\) to \(0.00416667\) at \(L=6,\ \alpha=1\). A raw product of block ratios over-counts.
Maximum entropy (Test 4) does not over-count, and still does not win. Replacing the raw product with a regularized projection removes the oversharpening but yields an RMSE of \(0.000493147\), worse than the damped baseline's \(0.000440978\) (Figure 1). Matching block marginals is not enough to recover the state distribution.
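The over-counting described for Test 3 can be seen on toy data. The sketch below assumes one particular form of reweighting: the reference probability of a word is multiplied by the ratio of every overlapping length-\(L\) block, each raised to the damping exponent \(\alpha\), and the result is renormalized. The chapter does not spell out the study's formula, so the function `reweight`, the binary alphabet, and the ratio table are all hypothetical.

```python
from itertools import product
from math import prod


def reweight(reference, block_ratio, L, alpha):
    """Tilt `reference` (word -> prob) by damped sliding-window block ratios.

    Assumed form: weight(w) = reference(w) * prod_b ratio(b) ** alpha over all
    overlapping length-L blocks b of w, renormalized to sum to 1.
    Blocks missing from `block_ratio` are treated as ratio 1 (no correction).
    """
    weights = {}
    for word, q in reference.items():
        blocks = [word[i:i + L] for i in range(len(word) - L + 1)]
        tilt = prod(block_ratio.get(b, 1.0) ** alpha for b in blocks)
        weights[word] = q * tilt
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


# Toy data (hypothetical): uniform reference on binary words of length 6,
# and a single block that the "observed" data favours by a factor of 2.
words = ["".join(bits) for bits in product("01", repeat=6)]
reference = {w: 1.0 / len(words) for w in words}
ratios = {"11": 2.0}

damped = reweight(reference, ratios, L=2, alpha=0.25)
full = reweight(reference, ratios, L=2, alpha=1.0)
```

In this toy setting the word `111111` contains five overlapping `11` blocks, so at \(\alpha=1\) its weight is multiplied by \(2^5\) before normalization, against \(2^{1.25}\) at \(\alpha=0.25\). The same local ratio is counted once per window, which is one way a raw product can become too sharp.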

Taken together, these results show that the failure is not merely "we reweighted too hard". A principled finite-block exponential family that matches block marginals also fails to recover the observed distribution. That is the stronger negative statement.

5.4 Reading the classification letters

Each script emits a coarse self-classification. These letters are not a shared scale and not an external benchmark. They are per-test verdicts with thresholds chosen by the author, reported verbatim. Their meanings:

Table 5.2 — The self-classification rubric, per test, in plain terms.
Letter	In the diagnostic tests (1–2)	In the generative tests (3–4)
A	finite blocks reconstruct the whole-word deficit (AUC gain large and residual structure small)	—
B	AUC grows with the block score, but the bridge/parity structure remains	—
C	improvement saturates near \(B_4\)	the generator overcorrects, or is no better than the simple/damped baseline
D	signal too sparse or noisy to read	—

Concretely, in the renormalization test the thresholds are: A requires AUC gain \(> 0.03\) and bridge residual \(< 0.3\) and parity residual \(< 0.1\); B requires AUC gain \(> 0.005\); C if the \(L=6\) AUC barely exceeds \(L=3\); else D. The observed AUC gain (\(0.0280\)) clears the B threshold but falls just short of the A threshold of \(0.03\), and the residuals (bridge \(1.1011\), parity \(0.3749\)) are far above the A limits; hence B, not A.
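The rubric can be written as a small decision function. This is a sketch, not the author's script: the function and argument names are hypothetical, and because the text does not quantify "barely exceeds" for C, the sketch assumes C means any positive gain that fails the B threshold.

```python
def classify(auc_gain, bridge_residual, parity_residual):
    """Return the self-classification letter for the renormalization test.

    Thresholds for A and B are the ones quoted in the text. The C rule is an
    assumption: any positive AUC gain that does not reach the B threshold.
    """
    if auc_gain > 0.03 and bridge_residual < 0.3 and parity_residual < 0.1:
        return "A"
    if auc_gain > 0.005:
        return "B"
    if auc_gain > 0.0:  # assumed reading of "barely exceeds"
        return "C"
    return "D"


# Values reported in the text for the renormalization test.
verdict = classify(auc_gain=0.0280, bridge_residual=1.1011, parity_residual=0.3749)
```

With the reported values the first branch fails on all three conditions (gain below \(0.03\), both residuals above their limits), and the second branch returns B.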

How to read B and C together. The diagnostic tests reach B ("the signal is real but the residual structure remains") and the generative tests reach C ("the generator does not beat the baseline"). No test reached A. The honest summary is that finite-block features are informative diagnostics and weak generators; the verdicts are consistent with that and do not support more.

5.5 The residual that survives everything

The common conclusion of all four tests is the persistence of a single residual structure. What remains unexplained is specific and reproducible: a state-distribution component indexed by bridge shape and by parity. The corresponding coefficients remain large (bridge \(\approx 1.10\)–\(1.13\), parity \(\approx 0.37\)–\(0.44\)); the pattern is not explained away by adding block scores, and it is also flagged by the bridge RMSE and parity RMSE figures in Tests 3–4. We record this as the principal unresolved structure; §6 lists candidate frameworks for thinking about it, strictly as candidates.

The auxiliary Δ analysis (§4.6) adds detail on where this residual is visible. Rather than remaining merely unexplained, the residual localizes in state coordinates and in the remaining_K boundary distance, while it does not collapse onto any single prefix or transition cell; the largest \(|\Delta|\) is observed at remaining_K = 32–63. This reinforces the negative conclusion of this chapter, namely that finite-block features diagnose the discrepancy but do not generate the whole-word measure difference. It identifies remaining_K = 32–63 as the band with the largest observed discrepancy, not as an identified source of the discrepancy.

5.6 What is closed here, and what is not

The closed claim of this paper is deliberately narrow: within the tested escape-word coordinates, the finite-vs-iid discrepancy is not reproduced by the finite-block approximations tried here. Short and medium blocks detect the discrepancy, but finite-block reweighting and finite-block maximum entropy do not generate the observed state distribution.

This is a boundary statement rather than an explanation of the discrepancy or a proposal of a new Collatz mechanism: along this iid approximation route, local block corrections extend only this far. Locating the remaining structure in residue classes, inverse trees, prefix cylinders, stopping boundaries, or hidden states is left to later work. These directions are possible continuations rather than consequences of the present results.