Versions benchmark rework 2#972

Merged
alexey-milovidov merged 10 commits into main from versions-benchmark-rework-2
Jul 3, 2026
Conversation

@alexey-milovidov

Member

No description provided.

alexey-milovidov and others added 10 commits July 3, 2026 01:44
The 2012-04 and 2012-05 servers boot and speak the native protocol, but their client has
no --query option, the interactive-over-pipe path hits a 'wrong query id' protocol bug,
and the HTTP handler returns a Null-pointer error -- so no query can be scripted and the
version can't be benchmarked (start_server's --query readiness probe never succeeds,
failing the launch). 2012-06 is the first month with a --query-capable client, so the
runnable prehistoric range is 2012-06 .. 2016-02. The 2012-04 and 2012-05 builds remain
reconstructed and recorded in monthly-built.tsv.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The date-labeled monthly reconstructions can't meaningfully run the big/complex datasets
(the large multi-way joins job/tpcds/tpch, the 600M-row SSB lineorder_flat, the 500M-row
coffeeshop fact) -- loading them only wastes time or crashes. Drop those from LOAD_DATASETS
for prehistoric (YYYY-MM-DD) versions; their queries record null as usual. Published/tag
versions are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
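
The dataset skip described in the commit above could be sketched roughly as follows. This is an illustration only, not the code in run-version.sh: the helper names `is_date_labeled` and `datasets_to_load` and the short dataset identifiers are hypothetical, and the only facts taken from the commit message are the YYYY-MM-DD label shape and the list of skipped dataset families.

```python
import re

# Assumption: prehistoric reconstructions are labeled YYYY-MM-DD,
# as the commit message states.
DATE_LABEL = re.compile(r"\d{4}-\d{2}-\d{2}")

# Dataset families the commit message calls too big or complex for these builds.
# The short identifiers are hypothetical; the real LOAD_DATASETS entries may differ.
HEAVY_DATASETS = {"job", "tpcds", "tpch", "ssb", "coffeeshop"}


def is_date_labeled(version):
    """Return True if the version label looks like a monthly reconstruction."""
    return DATE_LABEL.fullmatch(version) is not None


def datasets_to_load(version, load_datasets):
    """Drop the heavy datasets for date-labeled versions; keep the list otherwise."""
    if not is_date_labeled(version):
        return list(load_datasets)
    return [d for d in load_datasets if d not in HEAVY_DATASETS]
```

With these assumptions, `datasets_to_load("2013-04-01", ["hits", "tpch", "ssb"])` keeps only `hits`, while a published version label leaves the list untouched.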
…kip their download

Extend the ssb/tpch/tpcds/coffeeshop/job skip beyond the date-labeled reconstructions to
every version from before the first Docker release (revision < 53991). run-benchmark.sh
now drops those datasets from the list it passes to cloud-init, so the VM neither
DOWNLOADS nor loads them; run-version.sh applies the same skip in load_data for the local
path and as a safety net. Published releases (1.1.53991+, calendar versions) are
unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
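
A minimal sketch of the revision check behind this skip, assuming legacy versions are written as `0.0.<revision>` or `1.1.<revision>` (the shapes that appear in this PR) and that anything else is a calendar version. The function name `predates_docker` is hypothetical, and date-labeled versions are not handled here.

```python
# Revision of the first Docker release, as stated in the commit message.
FIRST_DOCKER_REVISION = 53991


def predates_docker(version):
    """Return True for legacy versions whose revision is below 53991.

    Assumption: legacy versions have exactly three numeric components with a
    major of 0 or 1, and the third component is the revision. Any other shape
    is treated as a calendar version and is never skipped.
    """
    parts = version.split(".")
    if len(parts) == 3 and parts[0] in ("0", "1") and all(p.isdigit() for p in parts):
        return int(parts[2]) < FIRST_DOCKER_REVISION
    return False
```

Under these assumptions the bare tag `1.1.53988` is skipped, while `1.1.53991` and calendar versions are not.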
Latest per-version runs from the sink: 158 versions re-benchmarked and 11 new prehistoric
monthly reconstructions now present, named by their interpolated actual_version
(0.0.10563 = 2012-06 .. 0.0.30388 = 2013-04, plus 0.0.51888/0.0.52186). 170 versions total.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Minimum version 1.1.53991 for ssb, uk, ontime, tpch, tpcds, job, coffeeshop (the first
Docker release), and for hits Q18/Q19/Q33; 0.0.18847 for hits Q28/Q36/Q37/Q39/Q40/Q42.
Remove the now-below-minimum results from the committed data: below-minimum queries are
nulled, and for the whole-dataset minimums the dataset's load time and data size are
dropped too (11 prehistoric 0.0.x versions affected). Regenerated data.generated.js.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… one

The previous commit overwrote per-query minimums, lowering the higher calver minimums
already set for tpch/tpcds/job/coffeeshop (20.5, 21.4, 25.1, 25.8, 26.x, ...). Recompute
each as max(original, requested): the requested 1.1.53991 / 0.0.18847 only fills in queries
that had no minimum (0); any existing higher minimum is kept. Result data is unaffected
(the below-1.1.53991 nulling is a subset, and the higher per-query minimums were already
honored when those versions ran).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
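
The max(original, requested) rule can be illustrated as below. This is a sketch under assumptions: minimums are represented as dotted version strings with "0" meaning "no minimum", and versions are compared as tuples of integers. The helper names are hypothetical and the actual .minver file format is not shown in this PR.

```python
def version_key(version):
    """Turn '1.1.53991' or '25.8' into a tuple of ints for ordering."""
    return tuple(int(part) for part in version.split("."))


def merge_minimum(original, requested):
    """Keep the higher of the existing and the requested minimum version.

    Assumption: "0" stands for "no minimum set", so the requested value
    simply fills it in.
    """
    if original == "0":
        return requested
    return max(original, requested, key=version_key)
```

For example, `merge_minimum("25.8", "1.1.53991")` keeps `"25.8"`, while `merge_minimum("0", "1.1.53991")` fills in `"1.1.53991"`.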
…to current minvers

Adds the newly-run 2014-2015 monthly reconstructions (0.0.43454 .. 0.0.53066) and the bare
tags 1.1.53973..53988 -- 200 versions total. fetch-results.sh pulls RAW sink data (which
reflects the minvers in effect when each version ran), so it now finishes by running the
new apply-minvers.py: below-minimum query results are nulled, and datasets whose minimum for
every query lies above a version get their load_time/data_size dropped, making every result
consistent with the current queries/*.minver regardless of when it was benchmarked (the
script is idempotent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
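
The normalisation step described above might look roughly like this. It is not the contents of apply-minvers.py: the result layout (a `version` string plus per-dataset `load_time`, `data_size` and `queries`) and the `minvers` mapping are hypothetical shapes chosen for illustration, with "0" assumed to mean "no minimum".

```python
import copy


def version_key(version):
    """Turn a dotted version string into a tuple of ints for ordering."""
    return tuple(int(part) for part in version.split("."))


def apply_minvers(result, minvers):
    """Return a copy of `result` made consistent with per-query minimum versions.

    result:  {"version": str, "datasets": {name: {"load_time", "data_size", "queries"}}}
    minvers: {name: [minimum version per query]}, same query order (assumed).
    """
    out = copy.deepcopy(result)
    ver = version_key(out["version"])
    for name, data in out["datasets"].items():
        mins = minvers.get(name)
        if not mins:
            continue
        below = [version_key(m) > ver for m in mins]
        # Null every query whose minimum is above this version.
        data["queries"] = [None if b else t for t, b in zip(data["queries"], below)]
        # If the whole dataset is above this version, drop its load metrics too.
        if all(below):
            data.pop("load_time", None)
            data.pop("data_size", None)
    return out
```

Because already-nulled queries stay null and already-dropped keys stay absent, running the function on its own output returns the same data, which is the idempotence the commit message refers to.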
Taxi had no minimum; set all four taxi queries to 0.0.29410 (2013-03). Re-normalised the
results (apply-minvers.py): taxi query results below 0.0.29410 are nulled and taxi
load_time/data_size dropped for those versions. Versions at/above the minimum are
unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…slowdown)

The old rule filled an absent query with 2x the worst time for that query across the shown
versions. When a very old version is missing most results, that flattens them to ~2x the
(fast) baseline and hides its true slowness (e.g. 0.0.10563 vs 26.6 showed 2.33x while its
present queries ran 50-200x slower). Now also estimate 2x the version's own geometric-mean
slowdown (over the queries it ran) applied to each query's baseline, and take the maximum
of the two. The illustrated comparison now shows ~56x; modern comparisons are unaffected
(their slowdown estimate is ~1, so the worst-based term dominates as before).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
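
A sketch of the new fill rule, assuming per-query lists of the version's own times (None where absent), the baseline version's times, and the worst time per query across the shown versions. The function and parameter names are hypothetical, and the real implementation may define the baseline differently.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fill_missing(times, baseline, worst):
    """Fill absent query times with the larger of two estimates.

    times:    this version's per-query times, None where the query did not run
    baseline: per-query times of the baseline being compared against (assumed)
    worst:    per-query worst time across the shown versions
    """
    ratios = [t / b for t, b in zip(times, baseline) if t is not None]
    # The version's own slowdown over the queries it ran; 1.0 if it ran none.
    slowdown = geometric_mean(ratios) if ratios else 1.0
    filled = []
    for t, b, w in zip(times, baseline, worst):
        if t is None:
            # Old rule: 2x worst. New rule adds 2x slowdown x baseline; take the max.
            t = max(2 * w, 2 * slowdown * b)
        filled.append(t)
    return filled
```

For a version that runs its present queries about 100x slower than the baseline, the slowdown term dominates; for a version with a slowdown near 1, the 2x-worst term dominates as before.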
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov self-assigned this Jul 3, 2026
@alexey-milovidov alexey-milovidov merged commit 01d9e95 into main Jul 3, 2026
1 check passed