Merged
120 commits
bc3887c
versions: rework the Versions Benchmark on Docker images
alexey-milovidov Jun 29, 2026
dfa3394
versions: resurrect the earliest releases by building from source
alexey-milovidov Jun 30, 2026
1223385
versions: fix DDL syntax selection for bare-number early tags
alexey-milovidov Jun 30, 2026
45ffbae
versions: add cloud-init + launchers for unattended per-version runs
alexey-milovidov Jun 30, 2026
3f2121d
versions: default the benchmark VM to c7a.4xlarge
alexey-milovidov Jun 30, 2026
b9ba0e5
versions: download data with wget --continue --progress=dot:giga
alexey-milovidov Jun 30, 2026
c6082b6
versions: detect server crash (OOM) mid-query and retry
alexey-milovidov Jun 30, 2026
4993f53
versions: accept an unambiguous version prefix in run-benchmark.sh
alexey-milovidov Jun 30, 2026
3c6d7d5
versions: resolve a version prefix to the latest match by version sort
alexey-milovidov Jun 30, 2026
4625981
versions: add standalone prepare-ebs-snapshot.sh for the datasets
alexey-milovidov Jun 30, 2026
e5b4637
versions: drop prepare-ebs-snapshot.sh
alexey-milovidov Jun 30, 2026
6fd6034
versions: verbose load logging, robust old-version client, fix query-…
alexey-milovidov Jun 30, 2026
78758b9
versions: stream per-query timings and cat the result JSON at the end
alexey-milovidov Jul 1, 2026
02b3420
versions: show INSERT progress with pv (once-a-second %/rate/ETA)
alexey-milovidov Jul 1, 2026
98f597d
versions: use real commit dates for built versions
alexey-milovidov Jul 1, 2026
c19d42f
versions: report per-table on-disk sizes at end of run
alexey-milovidov Jul 1, 2026
0aae31a
versions: gp3 root volume with provisioned throughput/IOPS
alexey-milovidov Jul 1, 2026
40e13f3
versions: download datasets in parallel
alexey-milovidov Jul 1, 2026
170be69
versions: parallelize single-file downloads with aria2c
alexey-milovidov Jul 1, 2026
cb8eeb5
versions: fix multi-file download — one aria2c per file
alexey-milovidov Jul 1, 2026
fc3d261
versions: add TPC-H, TPC-DS, Coffee Shop, ontime, UK, JOB datasets
alexey-milovidov Jul 1, 2026
8b1395f
versions: build unpublished versions on the fly; parallel load; query…
alexey-milovidov Jul 1, 2026
84a2683
versions: double default volume; extend reconstruct.sh for pre-2016-0…
alexey-milovidov Jul 1, 2026
46fbce0
versions: drop the table if its INSERT fails
alexey-milovidov Jul 1, 2026
1d3f4f3
versions: record per-dataset load_time and data_size in the result JSON
alexey-milovidov Jul 1, 2026
35468e2
versions: reconstruct pre-2016-02 snapshots (Ubuntu 14.04, old ABI)
alexey-milovidov Jul 1, 2026
a750817
versions: log query errors, skip unsupported queries, reorder dataset…
alexey-milovidov Jul 1, 2026
f122bcb
versions: reconstruct pre-2015-12 builds (break the external stats-li…
alexey-milovidov Jul 1, 2026
0c793a7
versions: reconstruct 2015-10, skip fully-unsupported dataset loads, …
alexey-milovidov Jul 1, 2026
afdfdae
versions: back off on all transient AWS errors (incl. API throttling)…
alexey-milovidov Jul 1, 2026
94fde03
versions: record reconstructed 2015-09 build
alexey-milovidov Jul 1, 2026
8c147b5
versions: reconstruct 2015-08 (creative float-uniq + HLL counter arg-…
alexey-milovidov Jul 1, 2026
8bb5ba4
versions: reconstruct 2015-07 (DateLUT split migration + assertChar +…
alexey-milovidov Jul 1, 2026
a4e5968
versions: reconstruct 2015-06 (vendor the old 2-arg PoolWithFailoverB…
alexey-milovidov Jul 1, 2026
8e2d013
versions: reconstruct 2015-03..2015-05 (flat pool priority + tryLogCu…
alexey-milovidov Jul 2, 2026
ccf0c80
versions: reconstruct 2015-01 and 2015-02 (ext::make_unique + Categor…
alexey-milovidov Jul 2, 2026
e7c042e
versions: reconstruct 2014-12 (RegionsNames::Language rename + intHas…
alexey-milovidov Jul 2, 2026
339f4d1
versions: reconstruct 2014-11 (conditional SummingSorted overlay + st…
alexey-milovidov Jul 2, 2026
628fbac
versions: reconstruct 2014-10 (ALWAYS_INLINE macro for back-ported Co…
alexey-milovidov Jul 2, 2026
7d9f5ae
versions: reconstruct 2014-09 (jsonxx compile-only stub)
alexey-milovidov Jul 2, 2026
52b89d2
versions: IPv4 listen fix -> 2014-11 boots & serves queries
alexey-milovidov Jul 2, 2026
b85dce6
versions: fix the compressed-protocol checksum error (real qlz header…
alexey-milovidov Jul 2, 2026
a55e557
versions: reconstruct 2014-07 and 2014-08 (DateLUTSingleton accessor)
alexey-milovidov Jul 2, 2026
a761198
versions: break the ZooKeeper-C++ structural wall -> 2014-06 reconstr…
alexey-milovidov Jul 2, 2026
aa69365
versions: reconstruct 2014-05 (Hash.h insert, UniquesHashSet<>, libmy…
alexey-milovidov Jul 2, 2026
c9022f3
versions: reconstruct 2014-04 from source
alexey-milovidov Jul 2, 2026
e650522
versions: reconstruct 2014-03 from source
alexey-milovidov Jul 2, 2026
02bab86
versions: reconstruct 2014-02 from source
alexey-milovidov Jul 2, 2026
47a276b
versions: reconstruct 2014-01 from source
alexey-milovidov Jul 2, 2026
a84036f
versions: reconstruct 2013-12 from source
alexey-milovidov Jul 2, 2026
71d2ad1
versions: reset all hits query minimum-version annotations to 0
alexey-milovidov Jul 2, 2026
6a88763
versions: add release_date and server-reported version to result JSON
alexey-milovidov Jul 2, 2026
7934388
versions: assign interpolated per-month revisions from protocol defines
alexey-milovidov Jul 2, 2026
fabe5a6
versions: delete pre-rework scripts/results; fetch results from the sink
alexey-milovidov Jul 2, 2026
f205fdc
versions: fix StoragePtr::operator bool for pre-2013-12 reconstruction
alexey-milovidov Jul 2, 2026
fa9ca45
versions: split page data into data.generated.js; per-dataset metrics…
alexey-milovidov Jul 2, 2026
8bfd03d
versions: page polish — horizon bar, date sort, per-dataset metric rows
alexey-milovidov Jul 2, 2026
72b9742
versions: bound load concurrency + retry to fix OOM-during-load; drop…
alexey-milovidov Jul 2, 2026
34266d6
versions: fix Cold/Hot Run selector not switching
alexey-milovidov Jul 2, 2026
a6d7325
versions: drop anomalous 1.1.54327 and 1.1.54310 (empty load)
alexey-milovidov Jul 2, 2026
2bd7fbc
versions: hide incomplete Total metrics; left-align dataset header names
alexey-milovidov Jul 2, 2026
66c1427
versions: port compact URL state and per-row delete button from main …
alexey-milovidov Jul 2, 2026
0520400
versions: data-size follows load, sticky header/left cols, color legend
alexey-milovidov Jul 2, 2026
f5ddd9d
versions: load/size metrics, calver group dates, sticky-to-edge, them…
alexey-milovidov Jul 2, 2026
7c59054
versions: make dataset separator name sticky; align all checkboxes in…
alexey-milovidov Jul 2, 2026
b79cd6a
versions: count only active parts in the data-size query
alexey-milovidov Jul 2, 2026
9ba3a78
versions: left-align column 1 so per-query checkboxes line up with da…
alexey-milovidov Jul 2, 2026
c50d9ea
versions: modest left indent for header/selectors/heading; narrow che…
alexey-milovidov Jul 2, 2026
73e928b
versions: fix query tooltip clipped at the left edge
alexey-milovidov Jul 2, 2026
9f9ccc5
versions: give the query tooltip an explicit (wider) width
alexey-milovidov Jul 2, 2026
20c6a53
versions: raise query tooltip above sticky header and separators
alexey-milovidov Jul 2, 2026
d572e2c
versions: number hits queries from 0, other datasets from 1
alexey-milovidov Jul 2, 2026
28204df
versions: reconstruct 2013-11 from source
alexey-milovidov Jul 2, 2026
1096658
versions: reconstruct 2013-10 from source (builds clean with current …
alexey-milovidov Jul 2, 2026
e10f14b
versions: reconstruct 2013-09 from source
alexey-milovidov Jul 2, 2026
7e4b8e3
versions: reconstruct 2013-08 from source
alexey-milovidov Jul 2, 2026
d164eb5
versions: reconstruct 2013-07 from source (builds clean with current …
alexey-milovidov Jul 2, 2026
d709e8f
versions: reconstruct 2013-06 from source
alexey-milovidov Jul 2, 2026
dde9812
versions: reconstruct 2013-05 from source
alexey-milovidov Jul 2, 2026
7724d33
versions: retry VolumeLimitExceeded in run-benchmark
alexey-milovidov Jul 2, 2026
7c86e3b
versions: reconstruct 2013-04 from source
alexey-milovidov Jul 2, 2026
a61fdf4
versions: reconstruct 2013-03 from source
alexey-milovidov Jul 2, 2026
f8820f5
versions: reconstruct 2013-02 from source (builds clean with current …
alexey-milovidov Jul 2, 2026
323967c
versions: reconstruct 2013-01 from source
alexey-milovidov Jul 2, 2026
c77860b
versions: refresh results from the sink (latest per version)
alexey-milovidov Jul 2, 2026
cac5e1b
versions: reconstruct 2012-12 from source
alexey-milovidov Jul 2, 2026
d5b18b4
versions: drop 18.10.3 again (anomalous result re-fetched from sink)
alexey-milovidov Jul 2, 2026
a65ec9e
versions: skip incomplete-load runs when fetching from the sink
alexey-milovidov Jul 2, 2026
dea4174
versions: reconstruct 2012-11 from source
alexey-milovidov Jul 2, 2026
ee3981f
versions: sort the page by version number, not by date
alexey-milovidov Jul 2, 2026
b5ef1b9
versions: reconstruct 2012-10 from source (no new shims)
alexey-milovidov Jul 2, 2026
95d3f23
versions: reconstruct 2012-09 from source (no new shims)
alexey-milovidov Jul 2, 2026
466610f
versions: use a monospace font for version names in the chart
alexey-milovidov Jul 2, 2026
bf5b4c0
versions: reconstruct 2012-08 from source
alexey-milovidov Jul 2, 2026
d20e9ff
versions: name revision-tag result files after their actual_version
alexey-milovidov Jul 2, 2026
e9009ea
versions: reconstruct 2012-07 from source (no new shims)
alexey-milovidov Jul 2, 2026
262409c
versions: reconstruct 2012-06 from source
alexey-milovidov Jul 2, 2026
4bd1ea2
versions: show short cal.ver (no patch) in the version selector too
alexey-milovidov Jul 2, 2026
9b10960
versions: sort the version selector by version number desc too
alexey-milovidov Jul 2, 2026
d0e2253
versions: reconstruct 2012-05 from source
alexey-milovidov Jul 2, 2026
aabf2b2
versions: penalize an absent result by 2x the worst time for THAT query
alexey-milovidov Jul 2, 2026
04a4a82
versions: reconstruct 2012-04 from source (no new shims)
alexey-milovidov Jul 2, 2026
9d5f7c8
versions: record 2012-04 as the reconstruction floor (first server)
alexey-milovidov Jul 2, 2026
12b126f
versions: show a skull + black bar for versions with no run in the se…
alexey-milovidov Jul 2, 2026
9bc0126
versions: fix monospace version names -- CSS comment must be /* */ no…
alexey-milovidov Jul 2, 2026
f7f6cee
versions: lower horizon-chart scales to 1x/4x/16x/64x (was 1x/10x/100…
alexey-milovidov Jul 2, 2026
d079c31
versions: set monospace for the whole tr.summary-row
alexey-milovidov Jul 2, 2026
1357e29
versions: use x2 between horizon-chart scales (1x/2x/4x/8x)
alexey-milovidov Jul 2, 2026
05d0e94
versions: list prehistoric monthly builds from list-versions.sh
alexey-milovidov Jul 2, 2026
aa4f822
versions: pick the table engine per prehistoric tier in create.sh
alexey-milovidov Jul 2, 2026
6f81f79
versions: adapt the loader for prehistoric versions
alexey-milovidov Jul 2, 2026
e96b51f
versions: basic-client query path for prehistoric versions
alexey-milovidov Jul 2, 2026
e6e6fb1
versions: add machine + datasets annotation at the end of the page
alexey-milovidov Jul 2, 2026
9c0a9c3
versions: let the annotation span the full page width (drop max-width)
alexey-milovidov Jul 2, 2026
0de3ba2
versions: only raise max_memory_usage where the default is 10GB
alexey-milovidov Jul 2, 2026
4ed203b
versions: refresh results from the sink
alexey-milovidov Jul 2, 2026
52d8b01
versions: stop dropping versions with an empty release_date (25.8, 26…
alexey-milovidov Jul 2, 2026
d160b3a
versions: let run-benchmark launch the prehistoric monthly reconstruc…
alexey-milovidov Jul 2, 2026
305b913
versions: interpolate pre-28558 monthly revisions from zero
alexey-milovidov Jul 2, 2026
6a0e9d1
versions: fix set -e abort resolving monthly build recipes in run-ben…
alexey-milovidov Jul 2, 2026
9 changes: 9 additions & 0 deletions versions/.gitignore
@@ -0,0 +1,9 @@
# Prepared Native data files and cached Docker tag lists — large / regenerable.
prepare-data/data/

# Per-version build/run logs (artifacts)
build-from-source/logs/
logs/

# Rendered per-version cloud-init files
cloud-init.*.sh
213 changes: 210 additions & 3 deletions versions/README.md
@@ -1,7 +1,214 @@
# ClickHouse Versions Benchmark

This is a benchmark for ClickHouse versions based on MgBench, Star Schema Benchmark and ClickBench.
This benchmark runs the **same** workload on the **same** data across every
historical and current ClickHouse version, to show how performance has evolved
over the years. It is published at https://benchmark.clickhouse.com/versions/
and described in the blog post
[ClickHouse Over the Years with Benchmarks](https://clickhouse.com/blog/clickhouse-over-the-years-with-benchmarks).

It is described [here](https://clickhouse.com/blog/clickhouse-over-the-years-with-benchmarks).
Please don't confuse it with the per-commit ClickHouse Performance Test, described
[here](https://clickhouse.com/blog/testing-the-performance-of-click-house).

Please don't be confused with the per-commit ClickHouse Performance Test, that is described [here](https://clickhouse.com/blog/testing-the-performance-of-click-house).
## How it works

Almost every ClickHouse release is published as a Docker image, so each version
runs in its own container, from the earliest `1.1.54xxx` images (2016) to today,
with no host install.

1. **`list-versions.sh`** — selects the versions to test and resolves an image
for each. Rules: keep **all** of the `1.1.x` family; for calendar-versioned
releases (18.x+) keep only the **latest patch within each major.minor**.
Historical images come from `yandex/clickhouse-server`; modern ones from
`clickhouse/clickhouse-server`. A version with no image falls back to
installing the `.deb`/`.tgz` from packages.clickhouse.com into Ubuntu.

2. **`prepare-data/prepare.sh`** — builds the canonical data files once, in the
**Native** format, using only the oldest-compatible types so a single set of
files loads into *every* version (validated against `1.1.54378`):
- `hits.native` — ClickBench `hits` (100M rows, 105 columns).
- `ssb.native` — Star Schema Benchmark `lineorder_flat` (scale factor 100).
- `mgbench{1,2,3}.native` — Brown benchmark `logs1`/`logs2`/`logs3`.
- `tpch_*.native` — TPC-H, 8 tables from the `dbgen` generator at scale
factor 40 (~10 GB compressed).
- `tpcds_*.native` — TPC-DS, 24 tables from the `dsdgen` generator at scale
factor 32 (~10 GB compressed).
- `coffeeshop_*.native` — Coffee Shop benchmark (`fact_sales` + `dim_locations`
+ `dim_products`), from the published Iceberg tables; the smallest fact
table (`fact_sales_500m`, 500M rows) is used, minus the unused
high-cardinality `order_line_id` column.
- `ontime.native` — airline on-time performance (single table), from the saved
copy in the public bucket, narrowed to the 12 columns its queries use.
- `uk_price_paid.native` — UK land registry "price paid" (single table, ~28M
rows / ~200 MB), preprocessed per the ClickHouse docs.
- `job_*.native` — Join Order Benchmark, 21 tables (a snapshot of IMDB) from
the canonical CSV dump.
- `taxi.native` — NYC `trips` (narrowed to the 5 columns its queries use).

Type downgrades: `LowCardinality`→`String`, `IPv4`→`String`,
`DateTime64`→`DateTime`, enums→`String`, TPC-H/TPC-DS `Decimal`→`Float64`
(TPC-H `CHAR(N)`→`FixedString(N)`; TPC-DS NULLs → type defaults so its
non-Nullable columns load); `Nullable` is kept only where the query set needs
`IS NULL` (mgbench `logs1`). Tables without a natural date carry a synthesised
`Date` column (`log_date` / TPC-H dimensions' constant `synth_date`) so the
legacy `MergeTree` engine works.

3. **`create/create.sh <version> <dataset> <table>`** — emits version-appropriate
DDL. Modern releases use `ENGINE = MergeTree PARTITION BY … ORDER BY …`; the
earliest `1.1.x` (before custom partitioning, < `1.1.54310`) use the legacy
positional `ENGINE = MergeTree(date, (key), 8192)`. Column lists live in
`create/schema/*.columns` (dataset-qualified, e.g. `tpcds_customer.columns`,
where a table name is shared across datasets).

4. **`run-version.sh <version> [image]`** — starts the server, creates each
dataset's tables **in its own database** (so same-named tables like TPC-H and
TPC-DS `customer` don't collide), loads each Native file with the simplest
possible `clickhouse-client INSERT … FORMAT Native`, then times every query in
`queries/{mgbench,ssb,hits,tpch,tpcds,coffeeshop,taxi}.sql` (`TRIES` runs each,
dropping the page cache between queries) and writes `results/<version>.json`.

5. **`run-all.sh`** — runs `run-version.sh` for every selected version.

6. **`generate-results.sh`** — folds `results/*.json` into `index.html`.
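
The version-dependent decisions in steps 1 and 3 can be sketched roughly as below. This is an illustration under stated assumptions, not the code of `list-versions.sh` or `create.sh`: the helper names `select_versions` and `ddl_syntax` are hypothetical, versions are assumed to be plain dotted numbers (or bare revision numbers) with no suffix, and `sort -V` assumes GNU coreutils.

```bash
# Hypothetical helpers; the real scripts may decide differently.

# Step 1: keep every 1.1.x release and only the latest patch of each
# calendar-versioned major.minor. Reads one dotted version per line on stdin.
select_versions() {
    sort -V | awk -F. '
        $1 == 1 && $2 == 1 { print; next }      # keep all of 1.1.x
        {
            key = $1 "." $2
            if (!(key in latest)) order[++n] = key
            latest[key] = $0                    # sorted input: last one wins
        }
        END { for (i = 1; i <= n; i++) print latest[order[i]] }
    '
}

# Step 3: "legacy" is the positional MergeTree(date, (key), 8192) form,
# "modern" is PARTITION BY / ORDER BY. The 1.1.54310 cutoff comes from the text.
ddl_syntax() {
    local rev
    case $1 in
        1.1.*) rev=${1#1.1.} ;;        # 1.1.54165 -> 54165
        *.*)   echo modern; return ;;  # calendar versions (18.x and later)
        *)     rev=$1 ;;               # bare-number early tag such as 53973
    esac
    if [ "$rev" -lt 54310 ]; then echo legacy; else echo modern; fi
}
```

For example, `printf '%s\n' 19.6.2.11 1.1.54378 19.6.3.18 | select_versions` keeps `1.1.54378` and `19.6.3.18`, and `ddl_syntax 1.1.54165` prints `legacy`.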

## Usage

```bash
# 1. Prepare the data once (full scale — reproduces the original benchmark).
# For a quick smoke test use a slice:
# HITS_PARTS=0 SSB_SCALE=1 TAXI_GLOB=trips_xaa.csv.gz ./prepare-data/prepare.sh
./prepare-data/prepare.sh

# 2. Benchmark one version, a few, or all of them.
./run-version.sh 1.1.54378
./run-all.sh 1.1.54378 19.6.3.18 24.8.1.1
./run-all.sh # every version from list-versions.sh

# 3. Regenerate the website.
./generate-results.sh
```

Requires Docker and a recent `clickhouse` binary (used only for data prep;
install with `curl https://clickhouse.com/ | sh`).

### Runtime and scale

At the original-blog scale, a single version takes on the order of **hours**
(measured ~4h on `1.1.54019`), dominated by loading the ~1.3B-row taxi table and
the cold first run of each query. The full ~143-version sweep is therefore a
**multi-week** job. To make it tractable, dial down the dominant dataset at prep
time, e.g. a ~100M-row taxi slice:

```bash
TAXI_GLOB='trips_xa[a-n].csv.gz' ./prepare-data/taxi.sh # ~14 of 175 files
```

Smaller `HITS_PARTS` / `SSB_SCALE` reduce the others similarly. The runner is
unchanged — only the prepared file sizes differ.
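
The multi-week figure follows from simple arithmetic, assuming the versions run one after another on a single machine at the measured ~4 hours each. The helper name `sweep_estimate` is only for illustration:

```bash
# Rough sequential estimate: versions x hours per version, in whole days.
sweep_estimate() {
    hours=$(( $1 * $2 ))
    echo "$hours hours, about $(( hours / 24 )) days"
}
# sweep_estimate 143 4   # prints "572 hours, about 23 days"
```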

## Running in the cloud (unattended)

Like the main ClickBench, each version can be benchmarked on its own fresh VM
that self-terminates and sends its result to the sink:

```bash
./run-benchmark.sh 1.1.54378 # one version on a c7a.4xlarge
machine=c6a.metal ./run-benchmark.sh 24.8.1.1
datasets="hits ssb" ./run-benchmark.sh 25.1.1.1 # subset of datasets
./run-all-benchmarks.sh # one VM per runnable version
```

`run-benchmark.sh` resolves the version's image via `list-versions.sh`, renders
`cloud-init.sh.in`, and starts an EC2 instance (terminate-on-shutdown, capacity
retry). The VM installs Docker, downloads the prepared Native files from
`s3://clickhouse-public-datasets/versions-benchmark/*.native.zst`, builds the
image from source if the version has none (`clickhouse-built:*`, using the tag +
GCC from `build-from-source/versions.txt`), runs `run-version.sh`, and POSTs the
result JSON (enriched with the machine type, `kind:"versions-benchmark"`) plus
the log to `sink.data` on play.clickhouse.com. A server-side materialized view
turns those into the published report, exactly as the main benchmark does.
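
The capacity retry mentioned above could look something like the following. This is a generic sketch, not the logic of `run-benchmark.sh`: the function name and the `MAX_TRIES` / `BASE_DELAY` knobs are hypothetical, and it retries on any failure, whereas a real launcher would presumably retry only on transient errors.

```bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# MAX_TRIES and BASE_DELAY are illustrative knobs, not run-benchmark.sh settings.
retry_with_backoff() {
    local tries=${MAX_TRIES:-5} delay=${BASE_DELAY:-10} attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$(( attempt + 1 ))
        delay=$(( delay * 2 ))      # 10s, 20s, 40s, ...
    done
}
# Usage (arguments elided): retry_with_backoff aws ec2 run-instances ...
```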

Notes: all datasets run by default (`datasets="hits ssb mgbench tpch tpcds
coffeeshop taxi"`); the taxi table is narrowed to the five columns its queries
use (~15 GB), so it no longer dominates. Pass a subset via `datasets=` to skip
some. While this branch is unmerged, pass `branch=versions-benchmark-rework`.
Missing dataset files in the bucket are skipped (their queries report null).

## Query set

344 queries in a fixed order: mgbench (15) + Star Schema Benchmark (13) +
ClickBench/hits (43) + TPC-H (22) + TPC-DS (103) + Coffee Shop (17) + ontime (11)
+ UK price-paid (3) + Join Order Benchmark (113) + taxi (4). See `queries/*.sql`.
The TPC-H, TPC-DS and JOB queries are the official sets taken
from the ClickHouse repository (`tests/benchmarks/tpc-{h,ds}/queries`), flattened
to one line each (TPC-H Q15 is rewritten from its `CREATE VIEW` form into a single
`WITH` query; TPC-DS two-part queries become two lines, giving 103 statements from
the 99 queries). Their many joins, subqueries and window functions only run on
modern versions — older releases report `null`. Results are reported one row per
query, with `null` for queries a given version cannot run.
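
The "one row per query, `null` when it cannot run" convention can be illustrated with a small timing helper. It is a sketch only: `time_runs` is a hypothetical name, the JSON array shape is an assumption, and it measures wall-clock time with GNU `date +%s.%N` rather than the client-reported time the runner uses.

```bash
# Hypothetical helper: run a command TRIES times and print a JSON array of
# wall-clock seconds, with null for each run that fails.
time_runs() {
    local tries=${TRIES:-3} out="" i=1 start end
    while [ "$i" -le "$tries" ]; do
        start=$(date +%s.%N)                    # GNU date: seconds.nanoseconds
        if "$@" >/dev/null 2>&1; then
            end=$(date +%s.%N)
            out="$out$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')"
        else
            out="${out}null"                    # unsupported or failing query
        fi
        if [ "$i" -lt "$tries" ]; then out="$out,"; fi
        i=$(( i + 1 ))
    done
    echo "[$out]"
}
# TRIES=2 time_runs false   # prints "[null,null]"
```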

The previous apt-based scripts are kept under `scripts/` and `unified_scripts/`
for reference.

## Old-version repair

Two fixes let the benchmark reach back to the very first published image
(`1.1.54019`, Sept 2016):

- **IPv4 listen override** (`config/listen.xml`, mounted into every image):
old images default to `<listen_host>::</listen_host>` (IPv6) and crash on
boot when the host has IPv6 disabled.
- **Sidecar client**: the oldest server images ship only `clickhouse-server`,
no client binary. The runner detects this and drives them with the
matching-version `yandex/clickhouse-client:<v>` image as a sidecar sharing
the server's network namespace — same native protocol, precise `--time`.
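
A sidecar invocation might look roughly like this. It is a sketch under assumptions: `sidecar_query` and the `DOCKER` override are hypothetical, and it assumes the client image's entrypoint is the client binary so that flags can be passed directly. `--network container:<name>` is the standard Docker way to join another container's network namespace.

```bash
# Hypothetical helper: run one query against a server container through a
# same-version client image that shares the server's network namespace.
# Usage: sidecar_query <server-container> <version> <query>
sidecar_query() {
    "${DOCKER:-docker}" run --rm \
        --network "container:$1" \
        "yandex/clickhouse-client:$2" \
        --host 127.0.0.1 --time --query "$3"
}
# Dry run that only prints the arguments:
#   DOCKER=echo sidecar_query srv 1.1.54019 "SELECT 1"
```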

With these, `1.1.54019` runs 62 of the 75 queries in the mgbench, SSB, hits and
taxi sets (15 + 13 + 43 + 4); the 13 nulls are genuine era limitations (e.g.
`Nullable`, which mgbench `logs1`/`logs3` need, postdates that build; likewise a
few `toYYYYMM` / `replaceOne` / `COUNT(DISTINCT)` cases).

### Building the never-published versions from source

The earliest releases — the bare-number tags `53973`..`54011` and a few `1.1.x`
that were never pushed as an image or package (`1.1.54165`, `54318`, `54335`,
`54336`, `54358`, `54362`, `54370`) — are resurrected by compiling them from
source in their contemporary environment (`build-from-source/`):

```bash
cd build-from-source
./build.sh 1.1.54165 v1.1.54165-stable # one version -> clickhouse-built:1.1.54165
./build-all.sh # everything in versions.txt
```

`Dockerfile.ubuntu1604` pins the era toolchain (Ubuntu 16.04) and builds each
tag in a contemporary environment, packaging a runnable image
(`clickhouse-built:<v>`) with IPv4 listening, a `clickhouse` multi-call shim,
and the pre-created data dirs the 2016 server needs. `build-all.sh` runs several
builds concurrently (`JOBS`, default 6) since a single `make -j$(nproc)` doesn't
saturate the cores on these small codebases. `list-versions.sh` routes these
versions to their `clickhouse-built:<v>` image automatically.

What it took to make the old tree build on a modern host (encoded in the
Dockerfile and `versions.txt`):
- **Compiler escalates by era** — the required GCC is recorded per version in
`versions.txt` (4th column): gcc-5 for the 2016 tags, gcc-6 for `1.1.54318`,
gcc-7 for `1.1.54335`+ (pulled from the `ubuntu-toolchain-r` PPA via `ARG GCC`).
- **Strip `-Werror`** from the project's cmake (the tree hardcodes it and leaks
clang-only `-Wno-*` flags into the GCC build).
- **Submodules**: the 2016 tags vendor their contrib code in-tree (no submodules); the later
`1.1.x` use submodules, one of which (`contrib/zookeeper`) points at a now-deleted repo, so
we init submodules tolerantly and let cmake fall back to system
`libzookeeper-mt-dev`.
- The slow apt layer is keyed only on the GCC version, so it is cached and shared
across all builds of the same compiler.
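
Looking up the compiler for a version could be sketched as follows. Only the "GCC in the 4th column" detail comes from the description above; the rest of the file layout (whitespace-separated, version in the 1st column) and the name `gcc_for_version` are assumptions.

```bash
# Hypothetical helper: print the GCC major version recorded for a version.
# Assumes whitespace-separated columns with the version first and GCC fourth.
# Usage: gcc_for_version <version> [versions-file]
gcc_for_version() {
    awk -v v="$1" '
        $1 == v { print $4; found = 1; exit }
        END     { exit !found }                 # non-zero status if not listed
    ' "${2:-versions.txt}"
}
# The result could then feed the Dockerfile build argument, for instance:
#   docker build --build-arg GCC="$(gcc_for_version 1.1.54335)" -f Dockerfile.ubuntu1604 .
```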

## Notes and limitations

- The 8 oldest builds (`1.1.54011`, `54165`, `54318`, `54335`, `54336`,
`54358`, `54362`, `54370`) were never published as an image or package, so
`list-versions.sh` lists them with the marker `unavailable` and the sweep
skips them. Everything from `1.1.54019` on is runnable.
- A version that fails to start, create a table, or load data is recorded as a
failure / `null` rows rather than aborting the sweep.
- Native files are stored zstd-compressed (level 6) and streamed through
`zstd -dc | clickhouse-client` at load time.
- Validated end-to-end on `1.1.54019` (oldest, via sidecar), `1.1.54378`
(legacy baseline), `19.8.3.8` (mid), and a modern `24.8` release.