Versions benchmark rework by alexey-milovidov · Pull Request #968 · ClickHouse/ClickBench

alexey-milovidov · 2026-07-01T06:39:07Z

No description provided.

Rebuild the ClickHouse Versions Benchmark infrastructure from scratch around Docker images so every historical and current version can be run identically, replacing the old apt-based scripts.

- list-versions.sh: select versions from the authoritative version_date.tsv (all 1.1.x + latest patch per YY.MM, 151 versions), resolve each to a yandex/clickhouse image, package, or unavailable (image-aware, handles 3- vs 4-component tag mismatches).
- prepare-data/: build canonical Native data files for hits, SSB (SF100), mgbench (logs1/2/3) and NYC taxi using only oldest-compatible types (Nullable kept only where the queries need IS NULL); stored zstd-6 and streamed via `zstd -dc | clickhouse-client` at load time.
- create/: per-version DDL (legacy MergeTree(date,(key),8192) for the earliest 1.1.x, modern PARTITION BY/ORDER BY otherwise) with column schemas under create/schema/.
- run-version.sh / run-all.sh: provider abstraction (image + package-in-ubuntu fallback), IPv4 listen override and a matching-version sidecar client image to repair the oldest server images (back to 1.1.54019), plain INSERT ... FORMAT Native loading, 75-query set timed per dataset.

Validated full-scale on 1.1.54019 (oldest) and 1.1.54378; data files are gitignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
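The selection rule for list-versions.sh (all 1.1.x, plus the latest patch per YY.MM) can be sketched as follows. This is a minimal illustration in Python, not the script itself; the function names `version_key` and `select_versions` are hypothetical, and the input is assumed to be a plain list of dotted version strings rather than the real version_date.tsv.

```python
def version_key(version: str) -> tuple:
    """Turn '24.12.2.5' into (24, 12, 2, 5) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in version.split("."))


def select_versions(all_versions: list) -> list:
    """Keep every 1.1.x release plus the newest patch of each YY.MM series."""
    keep = []
    newest_per_month = {}
    for version in all_versions:
        key = version_key(version)
        if key[:2] == (1, 1):
            # The 1.1.x era is kept in full.
            keep.append(version)
            continue
        month = key[:2]  # (YY, MM)
        current = newest_per_month.get(month)
        if current is None or key > version_key(current):
            newest_per_month[month] = version
    keep.extend(newest_per_month.values())
    return sorted(keep, key=version_key)


if __name__ == "__main__":
    sample = ["1.1.54019", "1.1.54378", "24.3.1.1", "24.3.10.2", "24.12.2.5"]
    print(select_versions(sample))
```

Comparing integer tuples rather than strings matters here: as strings, "24.3.10.2" would sort before "24.3.2.1".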

Add build-from-source/ to compile and run the ClickHouse versions that were never published as a Docker image or package: the bare-number early tags 53973..54011 and the 1.1.x releases 54165/54318/54335/54336/54358/54362/54370.

- Dockerfile.ubuntu1604: build a tag in its contemporary environment (Ubuntu 16.04) into a runnable clickhouse-built:<v> image. Handles the era quirks: compiler escalates with date (gcc-5 -> 6 -> 7, the later two from the ubuntu-toolchain-r PPA via ARG GCC), strip the hardcoded -Werror, tolerant submodule init (contrib/zookeeper's upstream is gone -> cmake falls back to system libzookeeper-mt-dev), IPv4 listen, a clickhouse multi-call shim and the pre-created data dirs the 2016 server needs.
- build.sh / build-all.sh: build one or many (JOBS concurrent; a single make -j$(nproc) doesn't saturate the cores on these small codebases).
- versions.txt: the build list with tag, date and required GCC per version.
- list-versions.sh: route these versions to their clickhouse-built:<v> image and order all 189 versions chronologically; nothing is "unavailable" anymore.
- run-all.sh: load PARALLEL versions concurrently, then benchmark sequentially.
- run-version.sh: LOAD_DATASETS lets a run skip a dataset's load (e.g. the huge taxi table) while its queries still run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

create.sh parsed bare build-number versions (e.g. "53982", the pre-1.1 early-release tags) as major=53982 >= 18 and emitted modern PARTITION BY / ORDER BY syntax, which the 2016 servers reject — so every table create failed and those versions produced all-null results. Treat a bare numeric version as an early build (custom partitioning landed at build 54310; all bare tags predate it), so they correctly get the legacy MergeTree(date,(key),8192) engine. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
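The fixed classification can be sketched as below. This is a hedged Python illustration, not create.sh itself: the function name `uses_legacy_engine` is hypothetical, and comparing the last component of a 1.1.x version against build 54310 is an assumption drawn from the statement that custom partitioning landed at that build.

```python
CUSTOM_PARTITIONING_BUILD = 54310  # build number quoted in the commit message


def uses_legacy_engine(version: str) -> bool:
    """Return True when the legacy MergeTree(date,(key),8192) DDL should be used."""
    parts = version.split(".")
    if len(parts) == 1 and parts[0].isdigit():
        # Bare build number such as "53982": every such tag predates build
        # 54310, so it must never be read as "major version 53982".
        return True
    major = int(parts[0])
    if major >= 18:
        return False
    # Assumption: for 1.1.<build>, the build number decides.
    return int(parts[-1]) < CUSTOM_PARTITIONING_BUILD


if __name__ == "__main__":
    for v in ("53982", "1.1.54019", "1.1.54378", "24.8.1.1"):
        print(v, "legacy" if uses_legacy_engine(v) else "modern")
```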

Mirror the main ClickBench cloud flow for the Versions Benchmark: benchmark one version per fresh VM and send the result to the sink.

- cloud-init.sh.in: install Docker, download the prepared Native files from s3://clickhouse-public-datasets/versions-benchmark/, build the image from source when the version has none (clickhouse-built:* via build-from-source), run run-version.sh, POST the result JSON (enriched with machine + kind) and the log to sink.data on play.clickhouse.com, then terminate.
- run-benchmark.sh: resolve a version's image/tag/gcc and launch a VM (terminate-on-shutdown, capacity-retry), as the main launcher does.
- run-all-benchmarks.sh: one VM per runnable version.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Match the main ClickBench download style (resumable, giga-scale progress). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

If clickhouse dies during a query (e.g. OOM-killed), the container exits but its data layer survives. Detect the dead server (SELECT 1 fails), revive it with docker start (relaunch the daemon for the package provider), and retry the query up to CRASH_RETRIES (default 2). This keeps one heavy query from nulling out every subsequent query for that version. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
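The detect / revive / retry loop described here can be sketched as follows. This is an illustrative Python model under stated assumptions, not the runner's shell code: the three callables stand in for the real query client, the `SELECT 1` liveness probe and `docker start`, and the name `run_with_revive` is hypothetical.

```python
from typing import Callable, Optional


def run_with_revive(run_query: Callable[[], Optional[float]],
                    server_alive: Callable[[], bool],
                    revive: Callable[[], None],
                    crash_retries: int = 2) -> Optional[float]:
    """Run a query; if it fails because the server died, revive it and retry.

    Returns the elapsed time, or None when the query could not be completed.
    """
    for _ in range(crash_retries + 1):
        elapsed = run_query()
        if elapsed is not None:
            return elapsed
        if server_alive():
            # The query failed for another reason; retrying would not help.
            return None
        # The server is dead: bring it back so this query can be retried
        # and so that later queries still have a server to talk to.
        revive()
    return None
```

Reviving even after the last failed attempt is a design choice of this sketch: it keeps one heavy query from nulling out the queries that follow it.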

e.g. '26.6' resolves to '26.6.1.1193' (one patch per YY.MM is kept), and the launcher canonicalises to the full version. Exact versions and bare tags still match directly; an ambiguous prefix lists the candidates. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A prefix now picks the newest matching version instead of erroring on ambiguity: 24 -> 24.12.x, 1.1 -> the latest 1.1.x, 26.6 -> 26.6.1.1193. Exact versions and bare tags still match directly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
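A minimal sketch of this prefix resolution, in Python and under the assumption that the known versions are available as a list of strings. The names `version_key` and `resolve_version` are hypothetical, and the version numbers other than those quoted above are made up for the example.

```python
def version_key(version: str) -> tuple:
    """Numeric-aware sort key: '24.12.2.5' -> (24, 12, 2, 5)."""
    return tuple(int(part) for part in version.split("."))


def resolve_version(query: str, known: list) -> str:
    """Resolve an exact version, a bare tag, or a prefix to one known version."""
    if query in known:
        # Exact versions and bare tags match directly.
        return query
    # Match on a component boundary so that '2' does not match '24.x'.
    matches = [v for v in known if v.startswith(query + ".")]
    if not matches:
        raise ValueError(f"no version matches {query!r}")
    return max(matches, key=version_key)


if __name__ == "__main__":
    known = ["53982", "1.1.54019", "1.1.54378", "24.3.10.2", "24.12.2.5", "26.6.1.1193"]
    for q in ("24", "1.1", "26.6", "53982"):
        print(q, "->", resolve_version(q, known))
```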

Builds an EBS volume holding the prepared Native files (sized just-enough), snapshots it (labelled versions-data, tagged Name=clickbench-versions-data) and deletes the working volume. Standalone and not wired into the launcher: a snapshot-backed volume lazy-loads from S3, so for one-shot VMs it is not faster than the plain S3 download unless Fast Snapshot Restore or volume reuse is used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A snapshot-backed volume lazy-loads from S3, so it isn't faster than the plain S3 download for one-shot VMs; the snapshot approach is not used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…loop stdin

- load_data now echoes each CREATE (with DDL), the INSERT ... FORMAT Native and source file, and "loaded <table>: N rows in Ns", so the cloud-init log shows what's happening during ingest.
- Client invocations set HOME=/tmp (old images' clickhouse user has HOME /nonexistent -> history-file error) and TZ=UTC, and the sidecar client mounts the host /usr/share/zoneinfo (some old client images ship no tzdata and fail at startup with "Could not determine local time zone").
- Fix a stdin-drain regression from the crash-retry: the per-query `docker exec/run -i` client (and the SELECT 1 liveness probe) consumed the query file the benchmark loop reads on stdin, truncating each version to ~60/75 queries. Read queries on FD 3 and give the probe </dev/null.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Echo 'qN [dataset]: [t1, t2, t3]' to the log as each query finishes, and cat results/<version>.json at the end so the full result is visible in the run output / cloud-init log (and thus received via the sink), not just written to a file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pipe the compressed file through pv before zstd -> INSERT, so loads report a periodic progress bar (percentage, rate, ETA) based on the known file size. Falls back to cat when pv is absent; pv added to the cloud-init apt install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fetch the actual git commit date for every from-source version (the bare tags 53973..54011 had bogus 2016-01-01 fallbacks from an earlier rate-limited fetch) and record it in versions.txt; list-versions.sh now reports that commit date for built versions instead of the version_date.tsv release date. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Print 'table <TAB> bytes' from system.parts (database 'default'), falling back for old versions without it to du -sLb on the data dir (/var/lib/clickhouse or /opt/clickhouse), following symlinks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Cold-cache query reads and ingest are disk-bound, so use gp3 (default 1000 MB/s / 16000 IOPS, both overridable via throughput=/iops=) instead of gp2 whose throughput is tied to size. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Build the list of dataset files and fetch them concurrently (xargs -P 8 wget --continue --progress=dot:giga) instead of one at a time. Missing files (e.g. ssb/taxi not yet uploaded) fail their own wget and are skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Use aria2c to split each file into parallel byte-range segments (-x16 -s16) and run several files at once (-j4), so the huge taxi file isn't one slow stream and small files don't wait behind it. Falls back to parallel wget if aria2c is absent; aria2 added to the cloud-init install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

aria2 -j (multiple files at once) cross-contaminated per-file sizes (it used hits's length for ssb's byte ranges), so only hits downloaded and ssb/mgbench were aborted -> skipped at load time. Run one aria2c per file via xargs -P instead: each still uses 16 parallel segments, files download concurrently, and --allow-overwrite re-fetches a non-resumable pre-existing file rather than 416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Expand the Versions Benchmark to 9 datasets / 344 queries, each loaded into its own database so same-named tables never collide (e.g. TPC-H and TPC-DS both have `customer`):

- TPC-H (SF40) and TPC-DS (SF32): official schemas/queries from the ClickHouse repo, Decimal->Float64, NULL->type defaults, synth_date for the legacy engine.
- Coffee Shop (fact_sales_500m, minus the unused order_line_id column) from the published Iceberg tables.
- ontime (12 used columns) and UK price-paid, from the docs' saved copies.
- Join Order Benchmark (21 IMDB tables, 113 queries) with a CSV re-encoder.
- Narrow taxi to the 5 columns its queries use.

Also: per-dataset databases in run-version.sh/create.sh; default 6 tries (1 cold + 5 hot); dataset-qualified column schemas.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… timeout

- run-version.sh builds clickhouse-built:* images on demand (ensure_built_image) when absent, using the recipe from versions.txt (build.sh) or monthly.tsv (Dockerfile.reconstruct); nothing is pulled from a registry.
- Load all datasets in parallel (one background job per dataset / database).
- Per-query timeout (QUERY_TIMEOUT, default 100s): a query that exceeds it, or crashes the server, records null and skips its remaining tries (the server is revived after a crash so later queries still run).
- ontime: sort the dump (Year, Month, FlightDate, ...) so INSERT blocks are date-contiguous and don't exceed max_partitions_per_insert_block (~450 months).
- Reconstruct the build system for pre-2016-03 snapshots that lack one: Dockerfile.reconstruct + reconstruct.sh transplant the 2016-03 donor's build system + contrib, glob renamed sources, stub QuickLZ/MongoDB, generate re2_st, add an isnan shim; build-monthly.sh sweeps monthly.tsv. Strip the never-public add_subdirectory(private) from the 2016-06..08 tags in Dockerfile.ubuntu1604.
- cloud-init: install docker-buildx; drop the now-redundant explicit build step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…2 builds

- run-benchmark.sh: default volume 500 -> 1000 GB; parallel loading of all datasets peaks disk usage above the on-disk total (NOT_ENOUGH_SPACE otherwise).
- reconstruct.sh: build the pre-2016-02 era with the old libstdc++ ABI (_GLIBCXX_USE_CXX11_ABI=0); that era used the refcounted (COW) std::string, which the struct sizing assumes (Field's DBMS_TOTAL_FIELD_SIZE=32). Also: generic prune of donor-listed sources absent in an older target (encoding-safe), disable utils/, strip add_subdirectory(private), vendor Poco/Ext/ScopedTry.h.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

An aborted/incomplete INSERT (crash, OOM, disk full, interrupted stream) can leave a partially-loaded table. Drop it so the dataset's queries report null instead of timing against incomplete data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add two subobjects to results/<version>.json:

- "load_time": {dataset: sum of its tables' load times in seconds}, accumulated per table during the (parallel, possibly separate) load phase into a stats file and summed per dataset at bench time.
- "data_size": {dataset: on-disk bytes}, the per-database sum(bytes_on_disk) from system.parts (each dataset is its own database), with a data-directory du fallback for old versions lacking that column.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2016-01 and older predate a big refactor and were built on trusty/gcc-5 with the old libstdc++ ABI (refcounted std::string). reconstruct.sh + Dockerfile.reconstruct now build them end-to-end (verified: 2016-01 builds server+client and boots, SELECT version() -> 0.0.53400):

- Base is now ubuntu:14.04 so gcc-5 defaults to the old ABI and the system boost is old-ABI too (16.04's new-ABI boost broke the client's boost::program_options).
- Vendor the never-public Yandex libs from the donor: statdaemons embedded dictionaries (via DB/Dictionaries/Embedded) and the daemon base (a thin BaseDaemon compat carrying the used API + --config-file handling, avoiding the donor's newer zkutil/graphite deps); stub statdaemons/Interests.h.
- Glob the whole dbms library (excluding the Server/Client executables + ODBC driver) so renamed/moved sources compile regardless of the donor's file lists.
- Force-include <numeric>/<random> (not transitively available on this toolchain); build re2 then the server and client with single-target makes (avoid a recursive-make race on shared static libs); os.walk for Python 3.4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…s, robust revive

Improvements to the versions-benchmark runner driven by inspecting the live results in sink.data:

- Log the reason for every null: on a query failure run-version.sh now emits the server's error text (unsupported syntax/function), a timeout notice, or a crash notice, once per query, tagged with its "qN [dataset]" label. Errors were previously discarded silently.
- Per-query minimum supported version (queries/<ds>.minver, aligned to <ds>.sql): "0" runs everywhere, a version is the first release known to run the query, "26.7" (future) means never seen to succeed. Below a query's minimum the runner records null without running it, saving time and avoiding crashes that would null later queries. Annotations computed from the 32 full runs.
- Reorder QUERY_ORDER so the heavy, crash-/timeout-prone datasets (tpch, tpcds, job) run last: a late crash can no longer null earlier datasets as collateral (as happened to taxi on 20.4).
- Best-effort revive_server: also relaunch the daemon in-place when the server process dies but the container stays up (previously a no-op for image providers), fall back to docker restart, use a longer timeout, and log container state/logs so a persistent failure is diagnosable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…brary wall)

Extend the from-source reconstruction below 2015-12. At the 2015-11 -> 2015-12 boundary a coordinated refactor inlined several never-public external Yandex libraries and reshaped core containers; reproduce that so 2015-11 builds+boots (0.0.53350):

- stats/*: append the templated intHash32<salt>+IntHash32 to the era's Hash.h (where 2015-12 moved it) and forward <stats/IntHash.h> there; overlay the UniquesHashSet / ReservoirSampler{,Deterministic} algorithms from a "last known-good month" PATCH_REF and forward the old <stats/...> paths to them.
- ReservoirSampler: retarget its PODArray<Allocator<...>> buffer to std::vector to avoid back-porting the templated allocator through every container.
- Strip illegal virt-specifiers from member-function templates (a modern gcc-5 error the era's compiler tolerated).
- Extend the statdaemons compat auto-map to DB/Core (Exception.h etc. lived there in the oldest trees), make the MongoDB stub era-agnostic.
- Dockerfile.reconstruct: PATCH_REF/PATCH_FILES overlay mechanism for the few gcc-hostile files fixed by 2015-12 (SummingSortedBlockInputStream, the stats algorithms).

monthly-built.tsv records 2015-11 alongside 2015-12/2016-01/2016-02.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…add earlyoom

Reconstruction: break the 2015-10 -> 2015-11 external-statdaemons boundary so 2015-10 builds and boots (0.0.53340). Before 2015-11 the codebase pulled the bulk of its infrastructure from the never-public external "statdaemons"/"stats" Yandex libraries; they were inlined in-repo at that boundary. Reproduce that:

- Vendor DB::Exception + StackTrace + ErrnoException (the class itself was external here; DB/Core/Exception.h only #included it and added free functions).
- Back-port the statdaemons infra from PATCH_REF (Stopwatch, ConfigProcessor, Pool{,WithFailover}Base, OptimizedRegularExpression(+.inl), HTMLForm, AIO, the HyperLogLog trio, NetException, Increment+CounterInFile, SimpleCache, threadpool, and the uniq/quantile algorithms), forwarding the old <statdaemons/*> (+ ext/*.hpp) include paths to them.
- strconvert compat (escape.h escaped_for_like + hash64.h; MySQL dicts/OLAP are never exercised), Yandex/Revision.h stub, DB/Common/Exception.h and DB/Core/FieldVisitors.h forwards, ConnectionPoolWithFailover::getMany adapted to the newer 2-arg base (distributed plumbing, compile-only), BaseDaemon::sleep.
- Overlay is now split so it can't downgrade newer, self-contained months: PATCH_FILES overwrites (only the gcc-hostile SummingSorted), PATCH_FILL adds the infra only when absent.

monthly-built.tsv records 2015-10.

Runner: skip loading a dataset when the version supports none of its queries. dataset_supported() checks the per-query .minver annotations; if every query is below the version's minimum (e.g. coffeeshop on 20.4), the dataset isn't loaded at all (its queries are recorded null anyway), saving load time and disk.

Cloud-init: install and enable earlyoom (as the main ClickBench cloud-init does). Loading all datasets in parallel (and heavy joins on old, less memory-efficient versions) can exhaust RAM; without earlyoom the kernel thrashes and the VM gets stuck. earlyoom kills the offender early; the server crash is then recovered by revive_server (or the query is recorded null).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
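The minver check behind the runner change can be sketched like this. It is a Python illustration under assumptions, not the shell implementation: the helper names are hypothetical, only dotted versions are handled (bare tags and date-labeled monthly builds would need their own mapping first), and the per-dataset annotations are passed in as a list rather than read from a queries/<ds>.minver file.

```python
def version_key(version: str) -> tuple:
    """Numeric-aware key: '20.4.9.110' -> (20, 4, 9, 110); '0' -> (0,)."""
    return tuple(int(part) for part in version.split("."))


def query_supported(version: str, minver: str) -> bool:
    """A query runs when the server version is at or above its minimum.

    '0' is below every version; a future value such as '26.7' is above
    every version benchmarked so far, so the query is never run.
    """
    return version_key(version) >= version_key(minver)


def dataset_supported(version: str, minvers: list) -> bool:
    """Load a dataset only if at least one of its queries can run."""
    return any(query_supported(version, m) for m in minvers)


if __name__ == "__main__":
    print(dataset_supported("20.4.9.110", ["21.3", "26.7"]))  # nothing to run
    print(dataset_supported("20.4.9.110", ["0", "26.7"]))     # one query runs
```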

… when launching

run-benchmark.sh already retried run-instances on capacity/quota errors (as the main ClickBench launcher does), but a version sweep fires dozens of launches back-to-back and hits EC2 API throttling (RequestLimitExceeded / Throttling) far more than the single-shot main launcher. A throttled call was treated as a hard error and skipped that version, a likely cause of the large contiguous gaps in the runs (e.g. whole 23.x / 24.x missing). Add RequestLimitExceeded / Throttling to the retry set and factor the backoff into an aws_retry helper used for run-instances AND the pre-launch describe-instance-types / describe-images calls (a throttled describe would otherwise blank arch/ami and make the launch fail non-retryably). Genuine config errors still fail fast.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
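The retry-on-throttling idea can be modelled as below. This is a hedged Python sketch, not the aws_retry shell helper: `AwsError` is a hypothetical stand-in for however the error code is obtained, the exponential backoff schedule is an assumption (the commit only says "backoff"), and the retry set lists just the two codes named above, whereas the real helper also retries capacity/quota errors.

```python
import time
from typing import Callable

RETRYABLE = ("RequestLimitExceeded", "Throttling")  # codes named in the commit


class AwsError(Exception):
    """Hypothetical error type carrying the AWS error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def aws_retry(call: Callable, attempts: int = 5, base_delay: float = 1.0,
              sleep: Callable[[float], None] = time.sleep):
    """Call `call()`; retry throttling errors with backoff, fail fast otherwise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except AwsError as err:
            is_last = attempt == attempts - 1
            if err.code not in RETRYABLE or is_last:
                # Genuine config errors (and exhausted retries) propagate.
                raise
            sleep(base_delay * 2 ** attempt)  # assumed schedule: 1s, 2s, 4s, ...
```

Passing `sleep` as a parameter keeps the helper testable without real delays.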

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Rename the ancient ToMondayImpl helper date_lut.toWeek(...) to the donor's toFirstDayOfWeek in the DateLUT migration (a no-op on newer eras, which don't call it). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Files without a dotted version prefix (bare revision-number tag builds such as 53996) are renamed after the server-reported actual_version (1.1.53996), so every committed result file is named by a real version. Consecutive tag builds that report the same version are deduped, keeping the highest tag (the one matching the reported revision) and setting its version field to match. fetch-results.sh now does this normalization too. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
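The dedupe step can be sketched as follows, assuming the results are already ordered by tag and represented as dictionaries. The function name and the record layout (`tag`, `actual_version`, `version`) are illustrative, and the sample tag numbers are made up.

```python
from itertools import groupby


def dedupe_tag_builds(results: list) -> list:
    """Collapse consecutive tag builds that report the same actual_version.

    Within each run the highest tag is kept and its 'version' field is set
    to the server-reported version.
    """
    kept = []
    for actual, group in groupby(results, key=lambda r: r["actual_version"]):
        best = max(group, key=lambda r: r["tag"])
        kept.append({**best, "version": actual})
    return kept


if __name__ == "__main__":
    runs = [
        {"tag": 53995, "actual_version": "1.1.53996"},
        {"tag": 53996, "actual_version": "1.1.53996"},
        {"tag": 54011, "actual_version": "1.1.54011"},
    ]
    for r in dedupe_tag_builds(runs):
        print(r["tag"], r["version"])
```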

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add narrow readVarUInt(UInt32&/UInt16&) overloads to VarInt.h when absent (pre-2012-07 declared only the UInt64 form, but the back-ported UniquesHashSet reads UInt32 fields directly). Read side only -- a narrow write overload would make int/enum callers ambiguous. A no-op on 2012-07+. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Display displayVersion(elem) on each version toggle, matching the chart and table; the full version is kept on the element's dataset for the selection lookup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Order the selector by the same key as the chart and table (actual_version, numeric-aware, label tie-break), latest first, so the three views stay in lockstep. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Create the known-good early (2012-06) DB/Core/StringRef.h when absent: pre-2012-06 predates StringRef entirely, but the overlaid donor libcommon JSON.h includes it. A no-op once the file exists (2012-06+). The 2012-05 client has no --query yet, so boot is verified via the native protocol handshake (server reports 0.0.28558). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Previously a missing result was filled with 2x the version's OWN worst query time, which differs between versions -- so a query neither of two versions ran (but a third selected version did) charged them unequal penalties and could flip their order (e.g. 19.17 vs 20.1 depending on whether 26.6 was shown). Use 2x the worst selected-run time for that particular query across the shown versions instead: the fill-in is then identical for every version, contributes equally, and cannot reorder others. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
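The new fill-in rule can be sketched as follows. This is a Python illustration (the page itself is JavaScript); the function name and data layout are hypothetical, and leaving a query untouched when no shown version ran it is an assumption, since the commit does not say what happens in that case.

```python
from typing import Dict, List, Optional


def fill_missing(selected: Dict[str, List[Optional[float]]]) -> Dict[str, List[Optional[float]]]:
    """Replace missing results with 2x the worst time for that query
    across the shown versions, so every version pays the same penalty."""
    if not selected:
        return {}
    filled = {version: list(times) for version, times in selected.items()}
    query_count = len(next(iter(selected.values())))
    for q in range(query_count):
        ran = [times[q] for times in selected.values() if times[q] is not None]
        if not ran:
            continue  # assumption: nothing to base a penalty on
        penalty = 2 * max(ran)
        for times in filled.values():
            if times[q] is None:
                times[q] = penalty
    return filled


if __name__ == "__main__":
    shown = {"19.17": [1.0, None], "20.1": [2.0, None], "26.6": [0.5, 3.0]}
    print(fill_missing(shown))
```

In the example, both 19.17 and 20.1 are charged 6.0 for the second query, so their relative order depends only on the queries they actually ran.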

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2012-01..2012-03 have no dbms/src/Server -- ClickHouse had no standalone server binary before 2012-04 (it was a library + test tools). There is nothing to run the benchmark against for those months, so the from-source reconstruction bottoms out at 2012-04. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…lection When a version has no successful (non-null) query run among the enabled queries, the chart now shows a skull (☠️) instead of a meaningless all-penalty ratio, paints its bar solid black, and excludes it from the bar scaling so it no longer inflates max_ratio and squashes the other bars. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…t /// The //-style comment before the '.summary-name a' rule is invalid inside a <style> block; CSS error recovery swallowed the following rule, so the monospace font never applied. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…x/1000x) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Emit the from-source monthly reconstructions (2012-04 .. 2016-02) recorded in build-from-source/monthly-built.tsv, labeled by month with image clickhouse-built:<month> and dated from monthly.tsv. The pre-server months (2012-01..03) are skipped. These predate version_date.tsv, so they sort to the top of the chronological list. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Empirically probed across the reconstructed images, MergeTree support splits into tiers:

    2012-04..2012-12   MergeTree non-functional                         -> ENGINE = Log
    2013-01..2014-03   MergeTree needs a mandatory sampling expression  -> MergeTree(date, <sample>, (key,<sample>), 8192)
    2014-04 and later  plain positional MergeTree works                 -> MergeTree(date,(key),8192)

Date-labeled monthly builds are classified by their YYYY-MM; dotted/tag versions keep the existing modern-vs-legacy logic. The sampling expression is a universal date-based one (intHash32 of the day number) since queries never actually SAMPLE.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
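The tiering can be sketched as a small lookup. This is a Python illustration, not the DDL generator itself: the function name is hypothetical, the label is assumed to start with YYYY-MM, and the sampling expression is left as the `<sample>` placeholder because the exact SQL is not quoted in the commit.

```python
def engine_for_month(label: str, key: str = "key", sample: str = "<sample>") -> str:
    """Pick the engine clause for a date-labeled monthly build (YYYY-MM...)."""
    year_month = (int(label[:4]), int(label[5:7]))
    if year_month <= (2012, 12):
        # MergeTree is non-functional in these builds.
        return "ENGINE = Log"
    if year_month <= (2014, 3):
        # A sampling expression is mandatory and must also appear in the key.
        return f"ENGINE = MergeTree(date, {sample}, ({key}, {sample}), 8192)"
    return f"ENGINE = MergeTree(date, ({key}), 8192)"


if __name__ == "__main__":
    for month in ("2012-06-01", "2013-06-01", "2014-04-01"):
        print(month, engine_for_month(month))
```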

* Send DDL one statement at a time (run_ddl) instead of --multiquery, which the prehistoric clients do not support (they threw UnknownOptionException).
* Replace EXISTS TABLE probes with a universal 'SELECT 1 FROM t LIMIT 0' (old versions lack EXISTS TABLE / SHOW TABLES / DESCRIBE / system.tables).
* Do not benchmark a dataset, and report neither its load time nor its data size, unless every one of its tables loaded fully (dataset_fully_loaded): its queries record null.
* data_size falls back to measuring the on-disk data directory when system.parts is unavailable (added ~2014-08; Log has no parts) -- verified the prehistoric data path is /opt/clickhouse/data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The oldest reconstructed clients support only basic options (no --time / --format / --max_memory_usage). Detect this once (via 'SELECT 1 --time') and, for such clients, have run_query time the whole invocation end to end (date +%s.%N around the call), discard output to /dev/null, and omit --max_memory_usage. Modern clients keep the --time/--format=Null/--max_memory_usage path unchanged. Validated on 2013-06 (sampling MergeTree) and 2012-06 (Log): tables create, uk loads, load_time/data_size report correctly, and queries reach the server (nulling only on genuinely-absent functions like round(), as intended). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Describe the benchmark environment (c7a.4xlarge, 16 vCPU / 32 GB, 1000 GB gp3 EBS at 16000 IOPS / 1000 MB/s -- the run-benchmark.sh defaults) and each dataset with its scale (hits 100M, SSB SF100, TPC-H SF40, TPC-DS SF32, taxi ~1.3B, coffeeshop 500M fact, etc.). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A 100GB per-query limit on modern versions (whose default is 0/unlimited) defeats their disk spill and can skew or OOM the result. Query the server's effective default (system.settings) at bench time and pass --max_memory_usage only when it is exactly 10000000000 -- the value the old default-profile users.xml ships. Verified: 24.8 reports 0 (no override), 20.3 reports 10000000000 (override). Prehistoric basic clients never get the flag (unsupported) as before. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
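The decision can be sketched as below. This is a Python illustration with hypothetical names; the override value of 100 GB (written as 100000000000 bytes) is an assumption taken from the "100GB per-query limit" wording, and the server default is passed in rather than queried from system.settings.

```python
OLD_DEFAULT = 10_000_000_000   # value shipped by the old default-profile users.xml
OVERRIDE = 100_000_000_000     # assumption: the 100 GB limit mentioned above


def memory_args(server_default: int, basic_client: bool) -> list:
    """Client flags for the memory limit, given the server's effective default."""
    if basic_client:
        # Prehistoric clients do not accept the flag at all.
        return []
    if server_default == OLD_DEFAULT:
        # Old versions: raise the 10 GB default so heavy queries can run.
        return [f"--max_memory_usage={OVERRIDE}"]
    # Modern versions (default 0 = unlimited): leave disk spill untouched.
    return []


if __name__ == "__main__":
    print(memory_args(0, False))               # e.g. 24.8
    print(memory_args(10_000_000_000, False))  # e.g. 20.3
```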

Updated runs for 18.1/4/5/6/12/14/16.x, 19.1/3/4.x and 25.8/26.3/4/5.x (latest per version), regenerated data.generated.js. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

….3/4/5)

Two bugs hid recently-run versions from the page:

* generate-results.sh: jq '.release_date // "9999-99-99"' does not substitute an *empty* string (only null), so the line started with a tab; 'read -r _rd file' stripped that leading tab as IFS whitespace, shifting the filename into _rd and leaving file empty, so '[ -z file ] && continue' skipped the entry. Guard against an empty date explicitly.
* fetch-results.sh: '.release_date // $rd' likewise kept an empty payload release_date instead of the resolved one. Prefer a non-empty payload date, else the looked-up date.

Refreshed version_date.tsv and re-fetched: 25.8.24.21, 26.3.13.31, 26.4.3.37, 26.5.3.52 now carry real dates and appear on the page.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…tions run-benchmark.sh looked up the build recipe only in versions.txt and exited 'no build recipe' for the date-labeled monthly builds (e.g. 2013-03-01), which live in monthly.tsv -- so those versions failed before the AWS launch. Accept a monthly.tsv recipe too; run-version.sh's ensure_built_image reconstructs the image from the commit + revision on the VM (tag/gcc are unused for those). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The months before the first protocol-define anchor (2012-01 .. 2013-02) were all flat at the 28558 floor, so they tied on the page (all 0.0.28558) instead of ordering by date. Those sources have no DBMS_MIN_REVISION_WITH_* define, so their revision is unconstrained; interpolate it linearly by commit date from 0 at the earliest snapshot up to the first real anchor (29410 at 2013-03), giving distinct, monotonically-increasing revisions. The Dockerfile still clamps up to any real protocol floor, so protocol correctness is kept. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
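The interpolation can be sketched as follows. This is a Python illustration with a hypothetical function name; representing the earliest snapshot and the anchor as first-of-month dates is an assumption made for the example, and the optional `protocol_floor` models the clamp that the Dockerfile is said to apply.

```python
from datetime import date

FIRST_ANCHOR_REVISION = 29410  # first real anchor, at 2013-03


def interpolated_revision(commit: date, earliest: date, anchor: date,
                          anchor_revision: int = FIRST_ANCHOR_REVISION,
                          protocol_floor: int = 0) -> int:
    """Linearly interpolate a revision by commit date.

    Gives 0 at the earliest snapshot and `anchor_revision` at the anchor,
    then clamps up to any real protocol floor.
    """
    if not earliest < anchor:
        raise ValueError("earliest must be before anchor")
    if not earliest <= commit <= anchor:
        raise ValueError("commit date outside the interpolation range")
    fraction = (commit - earliest).days / (anchor - earliest).days
    return max(round(anchor_revision * fraction), protocol_floor)


if __name__ == "__main__":
    start, anchor = date(2012, 1, 1), date(2013, 3, 1)
    for d in (date(2012, 1, 1), date(2012, 6, 1), date(2013, 3, 1)):
        print(d, interpolated_revision(d, start, anchor))
```

Because the mapping is strictly increasing in the commit date, monthly snapshots get distinct revisions and therefore sort by date on the page.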

…chmark 'read -r tag gcc < <(awk ... versions.txt)' returns non-zero on EOF when the version is absent from versions.txt (every monthly reconstruction), and under set -e that aborted the script before the monthly.tsv fallback ran -- so old versions failed silently right after the versions.txt lookup. Resolve the recipe via command substitution and only read from a non-empty result; fall back to monthly.tsv otherwise. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov and others added 30 commits June 29, 2026 15:29

versions: default the benchmark VM to c7a.4xlarge

3f2121d

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: download data with wget --continue --progress=dot:giga

b9ba0e5

Match the main ClickBench download style (resumable, giga-scale progress). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: drop prepare-ebs-snapshot.sh

e5b4637

A snapshot-backed volume lazy-loads from S3, so it isn't faster than the plain S3 download for one-shot VMs; the snapshot approach is not used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov and others added 28 commits July 2, 2026 16:43

versions: use a monospace font for version names in the chart

466610f

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: reconstruct 2012-08 from source

bf5b4c0

Rename the ancient ToMondayImpl helper date_lut.toWeek(...) to the donor's toFirstDayOfWeek in the DateLUT migration (a no-op on newer eras, which don't call it). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: reconstruct 2012-07 from source (no new shims)

e9009ea

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: show short cal.ver (no patch) in the version selector too

4bd1ea2

Display displayVersion(elem) on each version toggle, matching the chart and table; the full version is kept on the element's dataset for the selection lookup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: sort the version selector by version number desc too

9b10960

Order the selector by the same key as the chart and table (actual_version, numeric-aware, label tie-break), latest first, so the three views stay in lockstep. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: reconstruct 2012-04 from source (no new shims)

04a4a82

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: lower horizon-chart scales to 1x/4x/16x/64x (was 1x/10x/100…

f7f6cee

…x/1000x) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: set monospace for the whole tr.summary-row

d079c31

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: use x2 between horizon-chart scales (1x/2x/4x/8x)

1357e29

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: let the annotation span the full page width (drop max-width)

9c0a9c3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: refresh results from the sink

4ed203b

Updated runs for 18.1/4/5/6/12/14/16.x, 19.1/3/4.x and 25.8/26.3/4/5.x (latest per version), regenerated data.generated.js. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov self-assigned this Jul 2, 2026

alexey-milovidov merged commit 2d27c8b into main Jul 2, 2026
1 check passed

alexey-milovidov merged 120 commits into main from versions-benchmark-rework