Versions benchmark rework 2 by alexey-milovidov · Pull Request #972 · ClickHouse/ClickBench

alexey-milovidov · 2026-07-03T01:49:18Z

No description provided.

The 2012-04 and 2012-05 servers boot and speak the native protocol, but no query can be scripted against them: their client has no --query option, the interactive-over-pipe path hits a 'wrong query id' protocol bug, and the HTTP handler returns a null-pointer error. These versions therefore can't be benchmarked (start_server's --query readiness probe never succeeds, so the launch fails). 2012-06 is the first month with a --query-capable client, so the runnable prehistoric range is 2012-06 .. 2016-02. The 2012-04 and 2012-05 builds are still reconstructed and recorded in monthly-built.tsv. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
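A readiness probe of this kind can be sketched in Python. This is a minimal, hypothetical sketch (wait_until_ready and its parameters are not taken from the repository, and the real start_server may work differently): it re-runs a command such as clickhouse-client --query "SELECT 1" until it exits with status 0, and gives up after a timeout. With a client that has no --query option, every attempt fails, so the probe returns False and the launch counts as failed.

```python
import subprocess
import time


def wait_until_ready(cmd: list[str], timeout: float = 30.0,
                     interval: float = 0.5, attempt_timeout: float = 5.0) -> bool:
    """Re-run `cmd` until it exits with status 0; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # stdin is closed so a client that falls back to interactive mode cannot block.
            done = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL, timeout=attempt_timeout)
            if done.returncode == 0:
                return True  # the server answered the scripted query
        except (OSError, subprocess.TimeoutExpired):
            pass  # missing binary or hung client: treat as "not ready yet"
        if time.monotonic() >= deadline:
            return False  # e.g. a client without --query never succeeds
        time.sleep(interval)


# Hypothetical usage: probe a locally started server.
# ready = wait_until_ready(["clickhouse-client", "--query", "SELECT 1"], timeout=60)
```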

The date-labeled monthly reconstructions can't meaningfully run the big or complex datasets (the large multi-way-join workloads job/tpcds/tpch, the 600M-row SSB lineorder_flat, the 500M-row coffeeshop fact table); loading them only wastes time or ends in a crash. Drop those datasets from LOAD_DATASETS for prehistoric (YYYY-MM-DD) versions; their queries record null as usual. Published/tag versions are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…kip their download Extend the ssb/tpch/tpcds/coffeeshop/job skip beyond the date-labeled reconstructions to every version older than the first Docker release (revision < 53991). run-benchmark.sh now drops those datasets from the list it passes to cloud-init, so the VM neither downloads nor loads them; run-version.sh applies the same skip in load_data, both for the local path and as a safety net. Published releases (1.1.53991+, calendar versions) are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
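The skip rule from these two commits can be sketched in Python. The dataset names and the 53991 threshold come from the commit messages; the function names, the constant names, and the way version labels are parsed are hypothetical and are not the logic in run-benchmark.sh or run-version.sh.

```python
import re

FIRST_DOCKER_REVISION = 53991  # 1.1.53991, the first Docker release
HEAVY_DATASETS = ("ssb", "tpch", "tpcds", "coffeeshop", "job")
DATE_LABEL = re.compile(r"\d{4}-\d{2}(-\d{2})?")  # e.g. 2012-06 or 2012-06-15


def is_pre_docker(version: str) -> bool:
    if DATE_LABEL.fullmatch(version):
        return True  # date-labeled monthly reconstruction
    major, *rest = version.split(".")
    if major in ("0", "1") and len(rest) == 2:
        return int(rest[1]) < FIRST_DOCKER_REVISION  # 0.0.N / 1.1.N revision number
    return False  # calendar versions (20.5, 26.6, ...) are all newer


def datasets_to_load(version: str, datasets: list[str]) -> list[str]:
    """Drop the heavy datasets for pre-Docker versions so they are never downloaded."""
    if not is_pre_docker(version):
        return list(datasets)
    return [d for d in datasets if d not in HEAVY_DATASETS]
```

Filtering the list before it reaches the VM, rather than only inside the loader, is what saves the download as well as the load; the loader-side check then only matters as a safety net.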

Latest per-version runs from the sink: 158 versions re-benchmarked, plus 11 new prehistoric monthly reconstructions, each named by its interpolated actual_version (0.0.10563 = 2012-06 .. 0.0.30388 = 2013-04, plus 0.0.51888 and 0.0.52186). 170 versions in total. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Set minimum version 1.1.53991 (the first Docker release) for ssb, uk, ontime, tpch, tpcds, job and coffeeshop, and for hits Q18/Q19/Q33; set 0.0.18847 for hits Q28/Q36/Q37/Q39/Q40/Q42. Remove the now-below-minimum results from the committed data: below-minimum query results are nulled, and where the minimum covers a whole dataset, that dataset's load time and data size are dropped too (11 prehistoric 0.0.x versions affected). Regenerated data.generated.js. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
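Minimums like these only work if 0.0.x reconstructions, 1.1.x tags and calendar versions sort on one scale. A minimal sketch, assuming every label is dot-separated integers (the date labels having been renamed to 0.0.x) and that a minimum of "0" means no minimum; version_key and meets_minimum are hypothetical names.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Sort key for dot-separated numeric labels: 0.0.N < 1.1.N < calver (20.5, 26.6, ...)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: str) -> bool:
    # A minimum of "0" means "no minimum": (0,) sorts below every real version.
    return version_key(version) >= version_key(minimum)
```

Plain tuple comparison is enough here because the 0.0 and 1.1 prefixes are smaller than any calendar major version.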

… one The previous commit overwrote per-query minimums, lowering the higher calver minimums already set for tpch/tpcds/job/coffeeshop (20.5, 21.4, 25.1, 25.8, 26.x, ...). Recompute each as max(original, requested): the requested 1.1.53991 / 0.0.18847 only fills in queries that had no minimum (0), and any existing higher minimum is kept. Result data is unaffected: every version below 1.1.53991 is also below the higher minimums, so the below-1.1.53991 nulling is a subset of what those minimums already null, and they were already honored when the affected versions ran. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
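The max(original, requested) rule can be sketched as follows. It assumes minimums are dot-separated numeric version strings with "0" meaning "none"; merge_minver and the sample minimums are hypothetical.

```python
def version_key(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def merge_minver(original: str, requested: str) -> str:
    """Keep the stricter (higher) minimum; "0" means the query had none."""
    return max(original, requested, key=version_key)


# Hypothetical per-query minimums for one dataset before the bulk update.
original = ["0", "21.4", "0", "25.8"]
merged = [merge_minver(m, "1.1.53991") for m in original]
# merged == ["1.1.53991", "21.4", "1.1.53991", "25.8"]
```

Taking the maximum makes the bulk update safe to re-run: it can only raise a minimum, never lower one.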

…to current minvers Adds the newly run 2014-2015 monthly reconstructions (0.0.43454 .. 0.0.53066) and the bare tags 1.1.53973 .. 1.1.53988, for 200 versions in total. fetch-results.sh pulls RAW sink data (which reflects the minvers in effect when each version ran), so it now finishes by running the new apply-minvers.py: below-minimum query results are nulled, and a dataset whose every query requires a newer version has its load_time/data_size dropped. This makes every result consistent with the current queries/*.minver regardless of when it was benchmarked, and the step is idempotent. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
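The normalization step can be sketched in Python. This is not apply-minvers.py itself: the result layout (a "result" list of per-query runs next to "load_time" and "data_size"), the use of None for a nulled query, and the function name are assumptions made for illustration.

```python
from typing import Any


def version_key(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def apply_minvers(version: str, datasets: dict[str, dict[str, Any]],
                  minvers: dict[str, list[str]]) -> None:
    """Null below-minimum query results in place; drop load stats for a dataset
    whose every query needs a newer version. Running it twice changes nothing."""
    v = version_key(version)
    for name, ds in datasets.items():
        mins = minvers.get(name)
        if not mins:
            continue  # no minimums recorded for this dataset
        below = [v < version_key(m) for m in mins]
        results = ds.get("result", [])
        for i, is_below in enumerate(below):
            if is_below and i < len(results):
                results[i] = None  # query not supported by this version
        if all(below):
            ds.pop("load_time", None)
            ds.pop("data_size", None)
```

The function only ever removes data, and it derives everything from the version and the current minimums, so re-applying it to already-normalized results leaves them unchanged.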

Taxi had no minimum; set all four taxi queries to 0.0.29410 (2013-03). Re-normalised the results (apply-minvers.py): taxi query results below 0.0.29410 are nulled and taxi load_time/data_size dropped for those versions. Versions at/above the minimum are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…slowdown) The old rule filled an absent query with 2x the worst time for that query across the shown versions. When a very old version is missing most of its results, that fills them with ~2x the (fast) baseline and hides its true slowness (e.g. 0.0.10563 vs 26.6 showed 2.33x while the queries it did run were 50-200x slower). The rule now also estimates 2x the version's own geometric-mean slowdown (over the queries it ran) applied to each query's baseline, and takes the maximum of the two estimates. The illustrated comparison now shows ~56x; modern comparisons are unaffected (their slowdown estimate is ~1, so the worst-based term dominates as before). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
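The revised fill rule can be sketched in Python. This is a hedged sketch rather than the repository's code: it assumes a query's baseline is the fastest time among the shown versions, and the function and variable names are hypothetical.

```python
import math
from typing import Optional

Times = dict[str, list[Optional[float]]]  # version -> per-query time, None if absent


def fill_missing(times: Times) -> Times:
    """Fill absent results with max(2 * worst, 2 * geomean slowdown * baseline)."""
    n = len(next(iter(times.values())))
    baseline: list[Optional[float]] = []  # assumed: fastest shown time per query
    worst: list[Optional[float]] = []     # slowest shown time per query
    for q in range(n):
        present = [row[q] for row in times.values() if row[q] is not None]
        baseline.append(min(present) if present else None)
        worst.append(max(present) if present else None)

    filled: Times = {}
    for version, row in times.items():
        # Geometric-mean slowdown over the queries this version actually ran.
        logs = [math.log(row[q] / baseline[q]) for q in range(n)
                if row[q] is not None and baseline[q]]
        slowdown = math.exp(sum(logs) / len(logs)) if logs else 1.0
        new_row = []
        for q in range(n):
            t = row[q]
            if t is None and worst[q] is not None:
                t = max(2 * worst[q], 2 * slowdown * baseline[q])
            new_row.append(t)
        filled[version] = new_row
    return filled
```

In this sketch, a version that ran only a few queries, each 50-200x slower than the baseline, gets its gaps filled at roughly twice that slowdown instead of twice the worst modern time; for a modern version the slowdown is near 1, so 2 × worst usually wins.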

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov and others added 10 commits July 3, 2026 01:44

versions: horizon-chart scale gaps 2 -> 4 (1x/4x/16x/64x)

f936d94

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov self-assigned this Jul 3, 2026

alexey-milovidov merged commit 01d9e95 into main Jul 3, 2026
1 check passed

alexey-milovidov merged 10 commits into main from versions-benchmark-rework-2

alexey-milovidov commented Jul 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

alexey-milovidov commented Jul 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant