Versions benchmark rework 2 #972
Merged
The 2012-04 and 2012-05 servers boot and speak the native protocol, but their client has no --query option, the interactive-over-pipe path hits a 'wrong query id' protocol bug, and the HTTP handler returns a null-pointer error -- so no query can be scripted and these versions can't be benchmarked (start_server's --query readiness probe never succeeds, so the launch fails). 2012-06 is the first month with a --query-capable client, so the runnable prehistoric range is 2012-06 .. 2016-02. The 2012-04 and 2012-05 builds are still reconstructed and recorded in monthly-built.tsv. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
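To illustrate why a client without --query fails the launch, here is a minimal sketch of a readiness probe, assuming the probe simply re-runs a client command until it exits with status 0. The function name `wait_until_ready` and its parameters are hypothetical; this is not the actual start_server code.

```python
import subprocess
import time


def wait_until_ready(cmd, attempts=30, delay=1.0):
    """Hypothetical readiness probe: re-run `cmd` until it exits with status 0.

    Assumes the real probe is a client call along the lines of
    "<client> --query 'SELECT 1'". A client with no --query option exits
    non-zero on every attempt, so this returns False and the launch fails.
    """
    for attempt in range(attempts):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=10)
        except (OSError, subprocess.TimeoutExpired):
            result = None  # missing binary or hung client counts as "not ready"
        if result is not None and result.returncode == 0:
            return True
        if attempt + 1 < attempts:
            time.sleep(delay)
    return False
```

With a 2012-04/2012-05 client every attempt exits non-zero, so the probe returns False no matter how long the server has been up.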
The date-labeled monthly reconstructions can't meaningfully run the big/complex datasets (the large multi-way joins job/tpcds/tpch, the 600M-row SSB lineorder_flat, the 500M-row coffeeshop fact table) -- loading them only wastes time or crashes. Drop those from LOAD_DATASETS for prehistoric (YYYY-MM-DD) versions; their queries record null as usual. Published/tag versions are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
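A sketch of the filter, assuming LOAD_DATASETS is a list of dataset names and that the heavy sets are named job, tpcds, tpch, ssb and coffeeshop (the names used by the follow-up commit). `HEAVY_DATASETS` and `datasets_to_load` are hypothetical names, not the script's own.

```python
import re

# Assumed names for the heavy datasets listed above: the job/tpcds/tpch
# multi-way joins, SSB lineorder_flat and the coffeeshop fact table.
HEAVY_DATASETS = {"job", "tpcds", "tpch", "ssb", "coffeeshop"}


def datasets_to_load(version, load_datasets):
    """Filter LOAD_DATASETS for one version (hypothetical helper).

    Date-labeled prehistoric versions ("YYYY-MM-DD") skip the heavy datasets;
    published/tag versions keep the full list.
    """
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        return [d for d in load_datasets if d not in HEAVY_DATASETS]
    return list(load_datasets)
```

Queries of the skipped datasets then simply have nothing to run against and record null.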
…kip their download
Extend the ssb/tpch/tpcds/coffeeshop/job skip beyond the date-labeled reconstructions to every version older than the first Docker release (revision < 53991). run-benchmark.sh now drops those datasets from the list it passes to cloud-init, so the VM neither DOWNLOADS nor loads them; run-version.sh applies the same skip in load_data, both for the local path and as a safety net. Published releases (1.1.53991+, calendar versions) are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
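The "revision < 53991" test could look like the sketch below. It assumes that date labels are always pre-Docker, that "0.0.N" and "1.1.N" carry the revision in the third field, and that any other shape (calendar versions such as 26.6) is newer. `is_pre_docker` is a hypothetical helper; the real check lives in the shell scripts.

```python
FIRST_DOCKER_REVISION = 53991  # 1.1.53991, the first Docker release


def is_pre_docker(version):
    """True if `version` predates the first Docker release (sketch).

    Assumptions: "YYYY-MM-DD" labels are prehistoric reconstructions;
    "0.0.N" / "1.1.N" store the revision N in the third field; everything
    else (calendar versions like "26.6") is treated as a newer release.
    """
    parts = version.split(".")
    if len(parts) == 1 and version.count("-") == 2:
        return True  # date-labeled reconstruction
    if len(parts) == 3 and parts[0] in ("0", "1"):
        return int(parts[2]) < FIRST_DOCKER_REVISION
    return False
```

A version for which this returns True would have ssb/tpch/tpcds/coffeeshop/job removed before the dataset list is handed to cloud-init, so nothing is downloaded for them.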
Latest per-version runs from the sink: 158 versions re-benchmarked and 11 new prehistoric monthly reconstructions now present, named by their interpolated actual_version (0.0.10563 = 2012-06 .. 0.0.30388 = 2013-04, plus 0.0.51888/0.0.52186). 170 versions total. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Set minimum version 1.1.53991 (the first Docker release) for ssb, uk, ontime, tpch, tpcds, job and coffeeshop, and for hits Q18/Q19/Q33; set 0.0.18847 for hits Q28/Q36/Q37/Q39/Q40/Q42. Remove the now-below-minimum results from the committed data: below-minimum queries are nulled, and where the minimum covers a whole dataset, that dataset's load time and data size are dropped too (11 prehistoric 0.0.x versions affected). Regenerated data.generated.js. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
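The minimums mix 0.0.x/1.1.x revisions with calendar versions, so "below minimum" needs a numeric comparison rather than a string one. A sketch, assuming versions compare as tuples of integers (`version_key` and `meets_minimum` are hypothetical names):

```python
def version_key(version):
    """Sort key for dotted versions, e.g. "0.0.18847" < "1.1.53991" < "20.5" < "26.6"."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version, minver):
    """True if `version` is at or above a query's minimum version."""
    return version_key(version) >= version_key(minver)
```

Comparing as strings would get this wrong (for example "26.6" < "9.0" as strings), which is why the sketch converts each field to an integer.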
… one
The previous commit overwrote per-query minimums, lowering the higher calver minimums already set for tpch/tpcds/job/coffeeshop (20.5, 21.4, 25.1, 25.8, 26.x, ...). Recompute each as max(original, requested): the requested 1.1.53991 / 0.0.18847 only fills in queries that had no minimum (0); any existing higher minimum is kept. Result data is unaffected (the below-1.1.53991 nulling is a subset of what the higher minimums already null, and those higher per-query minimums were already honored when the affected versions ran). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
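The max(original, requested) rule can be sketched as follows, assuming each query's minimum is stored as a version string with "0" meaning "no minimum" (`merged_minimum` is a hypothetical name):

```python
def version_key(version):
    """Numeric sort key for dotted versions ("0" sorts below every real version)."""
    return tuple(int(part) for part in version.split("."))


def merged_minimum(original, requested):
    """Keep the higher of the existing and requested minimum (sketch).

    "0" stands for "no minimum", so the requested value only fills gaps;
    an existing higher calendar-version minimum such as "25.8" is kept.
    """
    return max(original, requested, key=version_key)
```

Because the merge only ever raises a minimum, re-applying it with the same requested value changes nothing.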
…to current minvers
Adds the newly run 2014-2015 monthly reconstructions (0.0.43454 .. 0.0.53066) and the bare tags 1.1.53973..53988 -- 200 versions total. fetch-results.sh pulls RAW sink data (which reflects the minvers in effect when each version ran), so it now finishes by running the new apply-minvers.py: below-minimum query results are nulled, and for datasets whose queries all have a minimum above the version, load_time/data_size are dropped. This makes every result consistent with the current queries/*.minver regardless of when it was benchmarked; the script is idempotent. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
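The normalisation step can be sketched as below. The result and minver layouts are assumptions for illustration (the actual sink/JSON schema is not shown in this PR), and `apply_minvers` is a hypothetical function, not the contents of apply-minvers.py.

```python
import copy


def version_key(version):
    """Numeric sort key for dotted versions."""
    return tuple(int(part) for part in version.split("."))


def apply_minvers(version, result, minvers):
    """Null below-minimum queries and drop load stats (hypothetical schema).

    result:  {dataset: {"queries": {qid: seconds or None},
                        "load_time": ..., "data_size": ...}}
    minvers: {dataset: {qid: "minimum version"}}
    Running it twice gives the same output as running it once.
    """
    out = copy.deepcopy(result)
    for dataset, data in out.items():
        mins = minvers.get(dataset, {})
        too_old = {q for q, m in mins.items()
                   if version_key(version) < version_key(m)}
        for qid in data["queries"]:
            if qid in too_old:
                data["queries"][qid] = None
        # Every query of the dataset is above this version: drop load stats too.
        if data["queries"] and all(q in too_old for q in data["queries"]):
            data.pop("load_time", None)
            data.pop("data_size", None)
    return out
```

Idempotence falls out of the design: nulling an already-null query and popping an already-absent key are both no-ops.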
Taxi had no minimum; set all four taxi queries to 0.0.29410 (2013-03). Re-normalised the results (apply-minvers.py): taxi query results below 0.0.29410 are nulled and taxi load_time/data_size dropped for those versions. Versions at/above the minimum are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…slowdown)
The old rule filled an absent query with 2x the worst time for that query across the shown versions. When a very old version is missing most results, that fills them at ~2x the (fast) baseline and hides the version's true slowness (e.g. 0.0.10563 vs 26.6 showed 2.33x while the queries it did run were 50-200x slower). Now also estimate 2x the version's own geometric-mean slowdown (over the queries it ran) applied to each query's baseline time, and take the maximum of the two. The illustrated comparison now shows ~56x; modern comparisons are unaffected (their slowdown estimate is ~1, so the worst-based term dominates as before). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
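A sketch of the new fill rule, assuming per-query times are available as dicts keyed by query id. The names are hypothetical, and the fallback to a slowdown of 1.0 when the version ran no comparable queries is an assumption the commit does not specify.

```python
import math


def fill_missing(times, baseline, worst):
    """Estimate absent query times for one version (sketch of the rule).

    times:    {qid: seconds or None} for the version being shown
    baseline: {qid: seconds} for the baseline version
    worst:    {qid: seconds}, worst time per query across shown versions
    An absent query becomes max(2 * worst, 2 * gm * baseline), where gm is
    the geometric-mean slowdown vs. baseline over the queries the version ran.
    """
    ratios = [times[q] / baseline[q] for q in times
              if times[q] and baseline.get(q)]  # skip None/zero to keep log() valid
    # Assumption: with no comparable queries, fall back to a slowdown of 1.
    gm = math.exp(sum(math.log(r) for r in ratios) / len(ratios)) if ratios else 1.0
    return {q: t if t is not None else max(2 * worst[q], 2 * gm * baseline[q])
            for q, t in times.items()}
```

For a modern version gm is close to 1, so 2 * gm * baseline stays below 2 * worst (the worst time is at least the baseline) and the old behaviour is preserved; for a version whose surviving queries are ~100x slower, the second term dominates.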
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>