Versions benchmark rework #968
Merged
Rebuild the ClickHouse Versions Benchmark infrastructure from scratch around Docker images so every historical and current version can be run identically, replacing the old apt-based scripts.
- list-versions.sh: select versions from the authoritative version_date.tsv (all 1.1.x + latest patch per YY.MM, 151 versions), resolve each to a yandex/clickhouse image, package, or unavailable (image-aware, handles 3- vs 4-component tag mismatches).
- prepare-data/: build canonical Native data files for hits, SSB (SF100), mgbench (logs1/2/3) and NYC taxi using only oldest-compatible types (Nullable kept only where the queries need IS NULL); stored zstd-6 and streamed via `zstd -dc | clickhouse-client` at load time.
- create/: per-version DDL (legacy MergeTree(date,(key),8192) for the earliest 1.1.x, modern PARTITION BY/ORDER BY otherwise) with column schemas under create/schema/.
- run-version.sh / run-all.sh: provider abstraction (image + package-in-ubuntu fallback), IPv4 listen override and a matching-version sidecar client image to repair the oldest server images (back to 1.1.54019), plain INSERT ... FORMAT Native loading, 75-query set timed per dataset.
Validated full-scale on 1.1.54019 (oldest) and 1.1.54378; data files are gitignored.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
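The selection rule described for list-versions.sh (every 1.1.x release plus the newest patch of each YY.MM series) can be sketched as below. This is a minimal Python illustration rather than the script itself; `select_versions`, `version_key` and the sample list are hypothetical names and data.

```python
from collections import defaultdict


def version_key(version):
    """Turn '24.12.3.47' into (24, 12, 3, 47) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def select_versions(all_versions):
    """Keep every 1.1.x release plus the newest patch of each other YY.MM series."""
    keep = [v for v in all_versions if v.startswith("1.1.")]
    by_series = defaultdict(list)
    for v in all_versions:
        if not v.startswith("1.1."):
            by_series[version_key(v)[:2]].append(v)
    keep += [max(group, key=version_key) for group in by_series.values()]
    return sorted(keep, key=version_key)


# Hypothetical input; the real list comes from version_date.tsv.
sample = ["1.1.54019", "1.1.54378", "18.1.0", "18.1.5", "24.12.1.10", "24.12.3.47"]
selected = select_versions(sample)
print(selected)
```

Comparing integer tuples instead of strings is what keeps 24.12 ahead of 24.3.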
Add build-from-source/ to compile and run the ClickHouse versions that were never published as a Docker image or package: the bare-number early tags 53973..54011 and the 1.1.x releases 54165/54318/54335/54336/54358/54362/54370.
- Dockerfile.ubuntu1604: build a tag in its contemporary environment (Ubuntu 16.04) into a runnable clickhouse-built:<v> image. Handles the era quirks: compiler escalates with date (gcc-5 -> 6 -> 7, the later two from the ubuntu-toolchain-r PPA via ARG GCC), strip the hardcoded -Werror, tolerant submodule init (contrib/zookeeper's upstream is gone -> cmake falls back to system libzookeeper-mt-dev), IPv4 listen, a clickhouse multi-call shim and the pre-created data dirs the 2016 server needs.
- build.sh / build-all.sh: build one or many (JOBS concurrent, because a single make -j$(nproc) doesn't saturate the cores on these small codebases).
- versions.txt: the build list with tag, date and required GCC per version.
- list-versions.sh: route these versions to their clickhouse-built:<v> image and order all 189 versions chronologically; nothing is "unavailable" anymore.
- run-all.sh: load PARALLEL versions concurrently, then benchmark sequentially.
- run-version.sh: LOAD_DATASETS lets a run skip a dataset's load (e.g. the huge taxi table) while its queries still run.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
create.sh parsed bare build-number versions (e.g. "53982", the pre-1.1 early-release tags) as major=53982 >= 18 and emitted modern PARTITION BY / ORDER BY syntax, which the 2016 servers reject — so every table create failed and those versions produced all-null results. Treat a bare numeric version as an early build (custom partitioning landed at build 54310; all bare tags predate it), so they correctly get the legacy MergeTree(date,(key),8192) engine. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
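A minimal sketch of the corrected decision, in Python rather than the shell of create.sh. The `major >= 18` check and the build-54310 boundary come from the message above; applying that boundary to the third component of a 1.1.x version is an assumption, and `uses_legacy_engine` is a hypothetical name.

```python
CUSTOM_PARTITIONING_BUILD = 54310  # build at which custom partitioning landed


def uses_legacy_engine(version):
    """Return True when the version needs MergeTree(date,(key),8192)."""
    parts = version.split(".")
    if len(parts) == 1 and parts[0].isdigit():
        # Bare build number such as "53982": all of these predate 54310.
        return True
    major = int(parts[0])
    if major == 1:
        # Assumption: 1.1.<build> switches to modern syntax at build 54310.
        return int(parts[2]) < CUSTOM_PARTITIONING_BUILD
    return major < 18


for v in ["53982", "1.1.54019", "1.1.54378", "24.8.1.1"]:
    print(v, "legacy" if uses_legacy_engine(v) else "modern")
```

The bug amounted to skipping the first branch, so "53982" was read as major version 53982 and took the modern path.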
Mirror the main ClickBench cloud flow for the Versions Benchmark: benchmark one version per fresh VM and send the result to the sink.
- cloud-init.sh.in: install Docker, download the prepared Native files from s3://clickhouse-public-datasets/versions-benchmark/, build the image from source when the version has none (clickhouse-built:* via build-from-source), run run-version.sh, POST the result JSON (enriched with machine + kind) and the log to sink.data on play.clickhouse.com, then terminate.
- run-benchmark.sh: resolve a version's image/tag/gcc and launch a VM (terminate-on-shutdown, capacity-retry), as the main launcher does.
- run-all-benchmarks.sh: one VM per runnable version.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Match the main ClickBench download style (resumable, giga-scale progress). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
If clickhouse dies during a query (e.g. OOM-killed), the container exits but its data layer survives. Detect the dead server (SELECT 1 fails), revive it with docker start (relaunch the daemon for the package provider), and retry the query up to CRASH_RETRIES (default 2). This keeps one heavy query from nulling out every subsequent query for that version. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
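The crash-retry loop can be sketched as follows. This is a hedged Python illustration in which the three callables stand in for the real steps (running the query, the SELECT 1 probe, and docker start); treating CRASH_RETRIES as the number of retries after the first attempt is an assumption.

```python
def run_with_revive(run_query, server_alive, revive, crash_retries=2):
    """Return the query time, or None if it failed or kept crashing the server."""
    for _ in range(crash_retries + 1):
        elapsed = run_query()          # None signals a failed run
        if elapsed is not None:
            return elapsed
        if server_alive():
            return None                # ordinary query error, nothing to revive
        revive()                       # server died: bring it back before retrying
    return None                        # still crashing; the server was revived for later queries


# Simulated server that dies on the first two attempts.
state = {"attempts": 0, "revived": 0}


def fake_query():
    state["attempts"] += 1
    return 1.5 if state["attempts"] == 3 else None


def fake_revive():
    state["revived"] += 1


result = run_with_revive(fake_query, lambda: False, fake_revive)
print(result, state)
```

Reviving after the final failed attempt as well is what keeps one heavy query from nulling every query that follows.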
e.g. '26.6' resolves to '26.6.1.1193' (one patch per YY.MM is kept), and the launcher canonicalises to the full version. Exact versions and bare tags still match directly; an ambiguous prefix lists the candidates. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A prefix now picks the newest matching version instead of erroring on ambiguity: 24 -> 24.12.x, 1.1 -> the latest 1.1.x, 26.6 -> 26.6.1.1193. Exact versions and bare tags still match directly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
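A sketch of the prefix rule, assuming the launcher has the full version list in memory. `resolve_version` is a hypothetical name, and every version in the sample except 26.6.1.1193 is made up for illustration.

```python
def version_key(version):
    """Numeric sort key: '24.12.2.29' -> (24, 12, 2, 29)."""
    return tuple(int(part) for part in version.split("."))


def resolve_version(query, versions):
    """Exact versions and bare tags match directly; a prefix picks the newest match."""
    if query in versions:
        return query
    dotted = [v for v in versions if "." in v and v.startswith(query + ".")]
    if not dotted:
        raise LookupError("no version matches " + query)
    return max(dotted, key=version_key)


versions = ["53996", "1.1.54019", "1.1.54378", "24.3.1.5", "24.12.2.29", "26.6.1.1193"]
print(resolve_version("24", versions))
print(resolve_version("1.1", versions))
print(resolve_version("26.6", versions))
```

Matching on `query + "."` stops the prefix "2" from matching 24.x, and the numeric key makes 24.12 beat 24.3.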
Builds an EBS volume holding the prepared Native files (sized just-enough), snapshots it (labelled versions-data, tagged Name=clickbench-versions-data) and deletes the working volume. Standalone and not wired into the launcher: a snapshot-backed volume lazy-loads from S3, so for one-shot VMs it is not faster than the plain S3 download unless Fast Snapshot Restore or volume reuse is used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A snapshot-backed volume lazy-loads from S3, so it isn't faster than the plain S3 download for one-shot VMs; the snapshot approach is not used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…loop stdin
- load_data now echoes each CREATE (with DDL), the INSERT ... FORMAT Native and source file, and "loaded <table>: N rows in Ns", so the cloud-init log shows what's happening during ingest.
- Client invocations set HOME=/tmp (old images' clickhouse user has HOME /nonexistent -> history-file error) and TZ=UTC, and the sidecar client mounts the host /usr/share/zoneinfo (some old client images ship no tzdata and fail at startup with "Could not determine local time zone").
- Fix a stdin-drain regression from the crash-retry: the per-query `docker exec/run -i` client (and the SELECT 1 liveness probe) consumed the query file the benchmark loop reads on stdin, truncating each version to ~60/75 queries. Read queries on FD 3 and give the probe </dev/null.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Echo 'qN [dataset]: [t1, t2, t3]' to the log as each query finishes, and cat results/<version>.json at the end so the full result is visible in the run output / cloud-init log (and thus received via the sink), not just written to a file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pipe the compressed file through pv before zstd -> INSERT, so loads report a periodic progress bar (percentage, rate, ETA) based on the known file size. Falls back to cat when pv is absent; pv added to the cloud-init apt install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fetch the actual git commit date for every from-source version (the bare tags 53973..54011 had bogus 2016-01-01 fallbacks from an earlier rate-limited fetch) and record it in versions.txt; list-versions.sh now reports that commit date for built versions instead of the version_date.tsv release date. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Print 'table <TAB> bytes' from system.parts (database 'default'), falling back for old versions without it to du -sLb on the data dir (/var/lib/clickhouse or /opt/clickhouse), following symlinks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cold-cache query reads and ingest are disk-bound, so use gp3 (default 1000 MB/s / 16000 IOPS, both overridable via throughput=/iops=) instead of gp2 whose throughput is tied to size. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Build the list of dataset files and fetch them concurrently (xargs -P 8 wget --continue --progress=dot:giga) instead of one at a time. Missing files (e.g. ssb/taxi not yet uploaded) fail their own wget and are skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Use aria2c to split each file into parallel byte-range segments (-x16 -s16) and run several files at once (-j4), so the huge taxi file isn't one slow stream and small files don't wait behind it. Falls back to parallel wget if aria2c is absent; aria2 added to the cloud-init install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
aria2 -j (multiple files at once) cross-contaminated per-file sizes (it used hits's length for ssb's byte ranges), so only hits downloaded and ssb/mgbench were aborted -> skipped at load time. Run one aria2c per file via xargs -P instead: each still uses 16 parallel segments, files download concurrently, and --allow-overwrite re-fetches a non-resumable pre-existing file rather than 416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expand the Versions Benchmark to 9 datasets / 344 queries, each loaded into its own database so same-named tables never collide (e.g. TPC-H and TPC-DS both have `customer`):
- TPC-H (SF40) and TPC-DS (SF32): official schemas/queries from the ClickHouse repo, Decimal->Float64, NULL->type defaults, synth_date for the legacy engine.
- Coffee Shop (fact_sales_500m, minus the unused order_line_id column) from the published Iceberg tables.
- ontime (12 used columns) and UK price-paid, from the docs' saved copies.
- Join Order Benchmark (21 IMDB tables, 113 queries) with a CSV re-encoder.
- Narrow taxi to the 5 columns its queries use.
Also: per-dataset databases in run-version.sh/create.sh; default 6 tries (1 cold + 5 hot); dataset-qualified column schemas.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… timeout
- run-version.sh builds clickhouse-built:* images on demand (ensure_built_image) when absent, using the recipe from versions.txt (build.sh) or monthly.tsv (Dockerfile.reconstruct); nothing is pulled from a registry.
- Load all datasets in parallel (one background job per dataset / database).
- Per-query timeout (QUERY_TIMEOUT, default 100s): a query that exceeds it, or crashes the server, records null and skips its remaining tries (the server is revived after a crash so later queries still run).
- ontime: sort the dump (Year, Month, FlightDate, ...) so INSERT blocks are date-contiguous and don't exceed max_partitions_per_insert_block (~450 months).
- Reconstruct the build system for pre-2016-03 snapshots that lack one: Dockerfile.reconstruct + reconstruct.sh transplant the 2016-03 donor's build system + contrib, glob renamed sources, stub QuickLZ/MongoDB, generate re2_st, add an isnan shim; build-monthly.sh sweeps monthly.tsv. Strip the never-public add_subdirectory(private) from the 2016-06..08 tags in Dockerfile.ubuntu1604.
- cloud-init: install docker-buildx; drop the now-redundant explicit build step.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…2 builds
- run-benchmark.sh: default volume 500 -> 1000 GB; parallel loading of all datasets peaks disk usage above the on-disk total (NOT_ENOUGH_SPACE otherwise).
- reconstruct.sh: build the pre-2016-02 era with the old libstdc++ ABI (_GLIBCXX_USE_CXX11_ABI=0). That era used the refcounted (COW) std::string, which the struct sizing assumes (Field's DBMS_TOTAL_FIELD_SIZE=32). Also: generic prune of donor-listed sources absent in an older target (encoding-safe), disable utils/, strip add_subdirectory(private), vendor Poco/Ext/ScopedTry.h.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
An aborted/incomplete INSERT (crash, OOM, disk full, interrupted stream) can leave a partially-loaded table. Drop it so the dataset's queries report null instead of timing against incomplete data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add two subobjects to results/<version>.json:
- "load_time": {dataset: sum of its tables' load times in seconds} — accumulated
per table during the (parallel, possibly separate) load phase into a stats
file and summed per dataset at bench time.
- "data_size": {dataset: on-disk bytes} — per-database sum(bytes_on_disk) from
system.parts (each dataset is its own database), with a data-directory du
fallback for old versions lacking that column.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2016-01 and older predate a big refactor and were built on trusty/gcc-5 with the old libstdc++ ABI (refcounted std::string). reconstruct.sh + Dockerfile.reconstruct now build them end-to-end (verified: 2016-01 builds server+client and boots, SELECT version() -> 0.0.53400):
- Base is now ubuntu:14.04 so gcc-5 defaults to the old ABI and the system boost is old-ABI too (16.04's new-ABI boost broke the client's boost::program_options).
- Vendor the never-public Yandex libs from the donor: statdaemons embedded dictionaries (via DB/Dictionaries/Embedded) and the daemon base (a thin BaseDaemon compat carrying the used API + --config-file handling, avoiding the donor's newer zkutil/graphite deps); stub statdaemons/Interests.h.
- Glob the whole dbms library (excluding the Server/Client executables + ODBC driver) so renamed/moved sources compile regardless of the donor's file lists.
- Force-include <numeric>/<random> (not transitively available on this toolchain); build re2 then the server and client with single-target makes (avoid a recursive-make race on shared static libs); os.walk for Python 3.4.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s, robust revive
Improvements to the versions-benchmark runner driven by inspecting the live results in sink.data:
- Log the reason for every null: on a query failure run-version.sh now emits the server's error text (unsupported syntax/function), a timeout notice, or a crash notice, once per query, tagged with its "qN [dataset]" label. Errors were previously discarded silently.
- Per-query minimum supported version (queries/<ds>.minver, aligned to <ds>.sql): "0" runs everywhere, a version is the first release known to run the query, "26.7" (future) means never seen to succeed. Below a query's minimum the runner records null without running it, saving time and avoiding crashes that would null later queries. Annotations computed from the 32 full runs.
- Reorder QUERY_ORDER so the heavy, crash-/timeout-prone datasets (tpch, tpcds, job) run last: a late crash can no longer null earlier datasets as collateral (as happened to taxi on 20.4).
- Best-effort revive_server: also relaunch the daemon in-place when the server process dies but the container stays up (previously a no-op for image providers), fall back to docker restart, use a longer timeout, and log container state/logs so a persistent failure is diagnosable.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…brary wall)
Extend the from-source reconstruction below 2015-12. At the 2015-11 -> 2015-12
boundary a coordinated refactor inlined several never-public external Yandex
libraries and reshaped core containers; reproduce that so 2015-11 builds+boots
(0.0.53350):
- stats/*: append the templated intHash32<salt>+IntHash32 to the era's Hash.h
(where 2015-12 moved it) and forward <stats/IntHash.h> there; overlay the
UniquesHashSet / ReservoirSampler{,Deterministic} algorithms from a "last
known-good month" PATCH_REF and forward the old <stats/...> paths to them.
- ReservoirSampler: retarget its PODArray<Allocator<...>> buffer to std::vector
to avoid back-porting the templated allocator through every container.
- Strip illegal virt-specifiers from member-function templates (a modern gcc-5
error the era's compiler tolerated).
- Extend the statdaemons compat auto-map to DB/Core (Exception.h etc. lived
there in the oldest trees), make the MongoDB stub era-agnostic.
- Dockerfile.reconstruct: PATCH_REF/PATCH_FILES overlay mechanism for the few
gcc-hostile files fixed by 2015-12 (SummingSortedBlockInputStream, the stats
algorithms).
monthly-built.tsv records 2015-11 alongside 2015-12/2016-01/2016-02.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…add earlyoom
Reconstruction — break the 2015-10 -> 2015-11 external-statdaemons boundary so
2015-10 builds and boots (0.0.53340). Before 2015-11 the codebase pulled the bulk
of its infrastructure from the never-public external "statdaemons"/"stats" Yandex
libraries; they were inlined in-repo at that boundary. Reproduce that:
- Vendor DB::Exception + StackTrace + ErrnoException (the class itself was
external here; DB/Core/Exception.h only #included it and added free functions).
- Back-port the statdaemons infra from PATCH_REF (Stopwatch, ConfigProcessor,
Pool{,WithFailover}Base, OptimizedRegularExpression(+.inl), HTMLForm, AIO, the
HyperLogLog trio, NetException, Increment+CounterInFile, SimpleCache,
threadpool, and the uniq/quantile algorithms), forwarding the old
<statdaemons/*> (+ ext/*.hpp) include paths to them.
- strconvert compat (escape.h escaped_for_like + hash64.h; MySQL dicts/OLAP are
never exercised), Yandex/Revision.h stub, DB/Common/Exception.h and
DB/Core/FieldVisitors.h forwards, ConnectionPoolWithFailover::getMany adapted
to the newer 2-arg base (distributed plumbing, compile-only), BaseDaemon::sleep.
- Overlay is now split so it can't downgrade newer, self-contained months:
PATCH_FILES overwrites (only the gcc-hostile SummingSorted), PATCH_FILL adds
the infra only when absent. monthly-built.tsv records 2015-10.
Runner -- skip loading a dataset when the version supports none of its queries:
dataset_supported() checks the per-query .minver annotations; if the version is
below the minimum of every query (e.g. coffeeshop on 20.4), the dataset isn't loaded at
all (its queries are recorded null anyway), saving load time and disk.
Cloud-init — install and enable earlyoom (as the main ClickBench cloud-init does):
loading all datasets in parallel (and heavy joins on old, less memory-efficient
versions) can exhaust RAM; without earlyoom the kernel thrashes and the VM gets
stuck. earlyoom kills the offender early; the server crash is then recovered by
revive_server (or the query is recorded null).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
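The dataset_supported() check can be sketched as below. It is a hedged Python illustration that handles dotted versions only (not the date-labeled monthly builds); the minver values in the usage lines are hypothetical, apart from the "0" and "26.7" conventions described in the .minver annotations.

```python
def version_key(version):
    """Numeric sort key: '20.4.9.110' -> (20, 4, 9, 110)."""
    return tuple(int(part) for part in version.split("."))


def query_supported(version, minver):
    """'0' runs everywhere; otherwise the version must reach the query's minimum."""
    return minver == "0" or version_key(version) >= version_key(minver)


def dataset_supported(version, minvers):
    """Load the dataset only if at least one of its queries can run."""
    return any(query_supported(version, m) for m in minvers)


print(dataset_supported("20.4.9.110", ["21.8", "26.7"]))  # no query can run
print(dataset_supported("20.4.9.110", ["0", "26.7"]))     # one query runs everywhere
```

Tuple comparison treats (20, 4, 9, 110) as at least (20, 4), so a two-component minimum such as "20.4" admits every 20.4.x patch.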
… when launching run-benchmark.sh already retried run-instances on capacity/quota errors (as the main ClickBench launcher does), but a version sweep fires dozens of launches back-to-back and hits EC2 API throttling (RequestLimitExceeded / Throttling) far more than the single-shot main launcher. A throttled call was treated as a hard error and skipped that version — a likely cause of the large contiguous gaps in the runs (e.g. whole 23.x / 24.x missing). Add RequestLimitExceeded / Throttling to the retry set and factor the backoff into an aws_retry helper used for run-instances AND the pre-launch describe-instance-types / describe-images calls (a throttled describe would otherwise blank arch/ami and make the launch fail non-retryably). Genuine config errors still fail fast. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
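A minimal sketch of an aws_retry-style helper, written in Python rather than the launcher's shell. `call` is a hypothetical callable returning `(ok, output)`; the attempt count and exponential delays are assumptions, and `InsufficientInstanceCapacity` is included as one example of the capacity errors the message mentions.

```python
import time

# Error codes treated as transient; anything else fails fast.
RETRYABLE = ("RequestLimitExceeded", "Throttling", "InsufficientInstanceCapacity")


def aws_retry(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call() until it succeeds, backing off on retryable errors only."""
    for attempt in range(attempts):
        ok, output = call()
        if ok:
            return output
        transient = any(code in output for code in RETRYABLE)
        if not transient or attempt == attempts - 1:
            raise RuntimeError(output)
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Simulated sequence: throttled twice, then a successful launch.
responses = iter([
    (False, "An error occurred (RequestLimitExceeded)"),
    (False, "An error occurred (Throttling)"),
    (True, "i-0123456789abcdef0"),
])
delays = []
instance_id = aws_retry(lambda: next(responses), sleep=delays.append)
print(instance_id, delays)
```

Wrapping the describe calls in the same helper matters because a throttled describe otherwise yields an empty arch/AMI and a launch failure that is not retried.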
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rename the ancient ToMondayImpl helper date_lut.toWeek(...) to the donor's toFirstDayOfWeek in the DateLUT migration (a no-op on newer eras, which don't call it). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Files without a dotted version prefix (bare revision-number tag builds such as 53996) are renamed after the server-reported actual_version (1.1.53996), so every committed result file is named by a real version. Consecutive tag builds that report the same version are deduped, keeping the highest tag (the one matching the reported revision) and setting its version field to match. fetch-results.sh now does this normalization too. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
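The normalization can be sketched as follows, assuming each result is a dict with `version` (the label it ran under) and `actual_version` (what the server reported). `source_tag` is a hypothetical bookkeeping field, and this sketch dedupes any bare tags that report the same version, not only consecutive ones.

```python
def normalize_tag_builds(results):
    """Rename bare-tag results after actual_version, keeping the highest tag."""
    passthrough, kept = [], {}
    for result in results:
        tag = result["version"]
        if not tag.isdigit():
            passthrough.append(result)      # already named by a real version
            continue
        actual = result["actual_version"]
        if actual not in kept or int(tag) > int(kept[actual]["source_tag"]):
            kept[actual] = dict(result, version=actual, source_tag=tag)
    return passthrough + list(kept.values())


results = [
    {"version": "53995", "actual_version": "1.1.53996"},
    {"version": "53996", "actual_version": "1.1.53996"},
    {"version": "24.8.1.1", "actual_version": "24.8.1.1"},
]
normalized = normalize_tag_builds(results)
for r in normalized:
    print(r["version"], r.get("source_tag", "-"))
```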
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add narrow readVarUInt(UInt32&/UInt16&) overloads to VarInt.h when absent (pre-2012-07 declared only the UInt64 form, but the back-ported UniquesHashSet reads UInt32 fields directly). Read side only -- a narrow write overload would make int/enum callers ambiguous. A no-op on 2012-07+. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Display displayVersion(elem) on each version toggle, matching the chart and table; the full version is kept on the element's dataset for the selection lookup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Order the selector by the same key as the chart and table (actual_version, numeric-aware, label tie-break), latest first, so the three views stay in lockstep. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
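The shared ordering can be illustrated with a small sort key. The page itself is JavaScript; this Python sketch only shows the idea, with `selector_key` and the sample entries as hypothetical names and data.

```python
import re


def selector_key(entry):
    """Numeric-aware key on actual_version, with the label as a tie-break."""
    numbers = tuple(int(n) for n in re.findall(r"\d+", entry["actual_version"]))
    return (numbers, entry["label"])


entries = [
    {"label": "24.3", "actual_version": "24.3.1.1"},
    {"label": "24.12", "actual_version": "24.12.1.1"},
    {"label": "53996", "actual_version": "1.1.53996"},
]
ordered = [e["label"] for e in sorted(entries, key=selector_key, reverse=True)]
print(ordered)
```

A plain string sort would place 24.3 ahead of 24.12; comparing the extracted integers avoids that.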
Create the known-good early (2012-06) DB/Core/StringRef.h when absent: pre-2012-06 predates StringRef entirely, but the overlaid donor libcommon JSON.h includes it. A no-op once the file exists (2012-06+). The 2012-05 client has no --query yet, so boot is verified via the native protocol handshake (server reports 0.0.28558). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Previously a missing result was filled with 2x the version's OWN worst query time, which differs between versions -- so a query neither of two versions ran (but a third selected version did) charged them unequal penalties and could flip their order (e.g. 19.17 vs 20.1 depending on whether 26.6 was shown). Use 2x the worst selected-run time for that particular query across the shown versions instead: the fill-in is then identical for every version, contributes equally, and cannot reorder others. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
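A sketch of the new fill-in rule, assuming each shown version maps to a list of per-query times with None for a missing result. `fill_missing` is a hypothetical name, the times are made up, and leaving a query untouched when no shown version ran it is an assumption.

```python
def fill_missing(times_by_version):
    """Replace None with 2x the worst time any shown version recorded for that query."""
    filled = {v: list(times) for v, times in times_by_version.items()}
    query_count = len(next(iter(filled.values())))
    for q in range(query_count):
        observed = [times[q] for times in filled.values() if times[q] is not None]
        if not observed:
            continue                      # nobody ran it; nothing to base a penalty on
        penalty = 2 * max(observed)
        for times in filled.values():
            if times[q] is None:
                times[q] = penalty
    return filled


shown = {
    "19.17": [1.0, None],
    "20.1": [2.0, None],
    "26.6": [0.5, 3.0],
}
filled = fill_missing(shown)
print(filled)
```

Both 19.17 and 20.1 receive the same 6.0 for the second query, so the penalty cannot change their relative order.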
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2012-01..2012-03 have no dbms/src/Server -- ClickHouse had no standalone server binary before 2012-04 (it was a library + test tools). There is nothing to run the benchmark against for those months, so the from-source reconstruction bottoms out at 2012-04. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lection When a version has no successful (non-null) query run among the enabled queries, the chart now shows a skull (☠️) instead of a meaningless all-penalty ratio, paints its bar solid black, and excludes it from the bar scaling so it no longer inflates max_ratio and squashes the other bars. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t /// The //-style comment before the .summary-name rule is invalid inside a <style> block; CSS error recovery swallowed the following rule, so the monospace font never applied. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…x/1000x) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Emit the from-source monthly reconstructions (2012-04 .. 2016-02) recorded in build-from-source/monthly-built.tsv, labeled by month with image clickhouse-built:<month> and dated from monthly.tsv. The pre-server months (2012-01..03) are skipped. These predate version_date.tsv, so they sort to the top of the chronological list. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Empirically probed across the reconstructed images, MergeTree support splits into tiers:
  2012-04..2012-12   MergeTree non-functional                         -> ENGINE = Log
  2013-01..2014-03   MergeTree needs a mandatory sampling expression  -> MergeTree(date, <sample>, (key,<sample>), 8192)
  2014-04 and later  plain positional MergeTree works                 -> MergeTree(date,(key),8192)
Date-labeled monthly builds are classified by their YYYY-MM; dotted/tag versions keep the
existing modern-vs-legacy logic. The sampling expression is a universal date-based one
(intHash32 of the day number) since queries never actually SAMPLE.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
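The month-based classification can be sketched as a lookup on the YYYY-MM label. This is a hedged Python illustration; `engine_for_month` is a hypothetical name and the returned strings are the templates from the tiers above, with `<sample>` left as a placeholder.

```python
def engine_for_month(month):
    """Map a 'YYYY-MM' build label to the engine clause its server accepts."""
    # Zero-padded YYYY-MM strings order correctly under plain string comparison.
    if month <= "2012-12":
        return "Log"
    if month <= "2014-03":
        return "MergeTree(date, <sample>, (key,<sample>), 8192)"
    return "MergeTree(date,(key),8192)"


for month in ["2012-06", "2013-06", "2014-04"]:
    print(month, "->", engine_for_month(month))
```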
* Send DDL one statement at a time (run_ddl) instead of --multiquery, which the prehistoric clients do not support (they threw UnknownOptionException).
* Replace EXISTS TABLE probes with a universal 'SELECT 1 FROM t LIMIT 0' (old versions lack EXISTS TABLE / SHOW TABLES / DESCRIBE / system.tables).
* Do not benchmark a dataset, and report neither its load time nor its data size, unless every one of its tables loaded fully (dataset_fully_loaded): its queries record null.
* data_size falls back to measuring the on-disk data directory when system.parts is unavailable (added ~2014-08; Log has no parts) -- verified the prehistoric data path is /opt/clickhouse/data.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The oldest reconstructed clients support only basic options (no --time / --format / --max_memory_usage). Detect this once (by probing with 'SELECT 1 --time') and, for such clients, have run_query time the whole invocation end to end (date +%s.%N around the call), discard output to /dev/null, and omit --max_memory_usage. Modern clients keep the --time/--format=Null/--max_memory_usage path unchanged. Validated on 2013-06 (sampling MergeTree) and 2012-06 (Log): tables create, uk loads, load_time/data_size report correctly, and queries reach the server (nulling only on genuinely-absent functions like round(), as intended). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Describe the benchmark environment (c7a.4xlarge, 16 vCPU / 32 GB, 1000 GB gp3 EBS at 16000 IOPS / 1000 MB/s -- the run-benchmark.sh defaults) and each dataset with its scale (hits 100M, SSB SF100, TPC-H SF40, TPC-DS SF32, taxi ~1.3B, coffeeshop 500M fact, etc.). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A 100GB per-query limit on modern versions (whose default is 0/unlimited) defeats their disk spill and can skew or OOM the result. Query the server's effective default (system.settings) at bench time and pass --max_memory_usage only when it is exactly 10000000000 -- the value the old default-profile users.xml ships. Verified: 24.8 reports 0 (no override), 20.3 reports 10000000000 (override). Prehistoric basic clients never get the flag (unsupported) as before. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
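The decision reduces to a small predicate. In this Python sketch the 10000000000 constant is the value quoted above, while `memory_flags` and the exact override passed on the command line (written here as 100 GB in bytes) are assumptions.

```python
OLD_USERS_XML_DEFAULT = 10_000_000_000   # default shipped by the old users.xml
ASSUMED_OVERRIDE = 100_000_000_000       # assumed flag value for the 100GB limit


def memory_flags(server_default, basic_client=False):
    """Return the extra client flags for one benchmark run."""
    if basic_client:
        return []                         # prehistoric clients don't accept the flag
    if server_default == OLD_USERS_XML_DEFAULT:
        return ["--max_memory_usage={}".format(ASSUMED_OVERRIDE)]
    return []                             # 0/unlimited or custom: leave it alone


print(memory_flags(0))                # e.g. 24.8 reports 0
print(memory_flags(10_000_000_000))   # e.g. 20.3 reports the old default
```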
Updated runs for 18.1/4/5/6/12/14/16.x, 19.1/3/4.x and 25.8/26.3/4/5.x (latest per version), regenerated data.generated.js. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….3/4/5)
Two bugs hid recently-run versions from the page:
* generate-results.sh: jq '.release_date // "9999-99-99"' does not substitute an *empty* string (only null), so the line started with a tab; 'read -r _rd file' stripped that leading tab as IFS whitespace, shifting the filename into _rd and leaving file empty, so '[ -z file ] && continue' skipped the entry. Guard against an empty date explicitly.
* fetch-results.sh: '.release_date // $rd' likewise kept an empty payload release_date instead of the resolved one. Prefer a non-empty payload date, else the looked-up date.
Refreshed version_date.tsv and re-fetched: 25.8.24.21, 26.3.13.31, 26.4.3.37, 26.5.3.52 now carry real dates and appear on the page.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
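Both failure modes have close Python analogues, shown here only as an illustration of the shell/jq behaviour; the function names and the sample date are hypothetical.

```python
def buggy_release_date(payload_date, looked_up_date):
    """Mimics jq's `//`: only a missing (None) value triggers the fallback."""
    return looked_up_date if payload_date is None else payload_date


def resolve_release_date(payload_date, looked_up_date):
    """The fix: prefer a non-empty payload date, else the looked-up date."""
    return payload_date if payload_date else looked_up_date


# An empty date followed by a tab and the file name, as generate-results.sh saw it.
line = "\tresults/26.5.3.52.json"
whitespace_fields = line.split()      # leading tab vanishes, the file shifts to field 0
tab_fields = line.split("\t")         # the empty date field is preserved

print(repr(buggy_release_date("", "2026-05-01")))
print(repr(resolve_release_date("", "2026-05-01")))
print(whitespace_fields, tab_fields)
```

Splitting on generic whitespace loses the empty first column in the same way 'read -r' does with the default IFS.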
…tions run-benchmark.sh looked up the build recipe only in versions.txt and exited 'no build recipe' for the date-labeled monthly builds (e.g. 2013-03-01), which live in monthly.tsv -- so those versions failed before the AWS launch. Accept a monthly.tsv recipe too; run-version.sh's ensure_built_image reconstructs the image from the commit + revision on the VM (tag/gcc are unused for those). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The months before the first protocol-define anchor (2012-01 .. 2013-02) were all flat at the 28558 floor, so they tied on the page (all 0.0.28558) instead of ordering by date. Those sources have no DBMS_MIN_REVISION_WITH_* define, so their revision is unconstrained; interpolate it linearly by commit date from 0 at the earliest snapshot up to the first real anchor (29410 at 2013-03), giving distinct, monotonically-increasing revisions. The Dockerfile still clamps up to any real protocol floor, so protocol correctness is kept. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
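The interpolation can be sketched as below. The anchor revision 29410 comes from the message above; the exact snapshot dates (first of the month), the rounding, and the `protocol_floor` parameter modelling the Dockerfile's clamp are assumptions.

```python
from datetime import date

FIRST_ANCHOR_REVISION = 29410  # first real protocol anchor, at 2013-03


def interpolated_revision(commit_date, earliest, anchor_date, protocol_floor=0):
    """Scale the revision linearly from 0 at `earliest` to the anchor revision."""
    span_days = (anchor_date - earliest).days
    fraction = (commit_date - earliest).days / span_days
    revision = round(FIRST_ANCHOR_REVISION * fraction)
    return max(revision, protocol_floor)  # never drop below a real protocol floor


earliest = date(2012, 1, 1)   # assumed date of the earliest snapshot
anchor = date(2013, 3, 1)     # assumed date of the 2013-03 anchor
months = [date(2012, m, 1) for m in range(1, 13)] + [date(2013, 1, 1), date(2013, 2, 1)]
revisions = [interpolated_revision(d, earliest, anchor) for d in months]
print(revisions)
```

Because every month adds at least 28 days to the numerator, the resulting revisions are distinct and strictly increasing, which is what restores date ordering on the page.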
…chmark 'read -r tag gcc < <(awk ... versions.txt)' returns non-zero on EOF when the version is absent from versions.txt (every monthly reconstruction), and under set -e that aborted the script before the monthly.tsv fallback ran -- so old versions failed silently right after the versions.txt lookup. Resolve the recipe via command substitution and only read from a non-empty result; fall back to monthly.tsv otherwise. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>