1 change: 1 addition & 0 deletions Documentation/Makefile
@@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
TECH_DOCS += technical/pack-heuristics
TECH_DOCS += technical/paint-down-to-common
TECH_DOCS += technical/parallel-checkout
TECH_DOCS += technical/partial-clone
TECH_DOCS += technical/platform-support
1 change: 1 addition & 0 deletions Documentation/technical/meson.build
@@ -18,6 +18,7 @@ articles = [
'multi-pack-index.adoc',
'packfile-uri.adoc',
'pack-heuristics.adoc',
'paint-down-to-common.adoc',
'parallel-checkout.adoc',
'partial-clone.adoc',
'platform-support.adoc',
151 changes: 151 additions & 0 deletions Documentation/technical/paint-down-to-common.adoc
@@ -0,0 +1,151 @@
Merge-Base Computation and paint_down_to_common()
==================================================

The function `paint_down_to_common()` in `commit-reach.c` computes merge
bases by walking the commit graph backwards from two sets of tips and
finding where their ancestry meets.

Use cases
---------

Merge-base computation serves two kinds of callers:

1. *Finding all merge bases* (`merge-base --all`, `merge-tree`,
`merge`, `rebase`). A merge base is a common ancestor that is
not itself an ancestor of another common ancestor.

2. *Ancestry checks* (`in_merge_bases`, used by `merge-base
--is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A
an ancestor of commit B?" If a common ancestor equals one of the
inputs, that input is necessarily the only merge base -- no other
common ancestor can be both as recent and not an ancestor of it.

Both use cases share the same algorithm and implementation.

Algorithm
---------

Given a commit `one` and a set of commits `twos[]`, the walk paints
commits with two colors:

- PARENT1: reachable from `one`
- PARENT2: reachable from any commit in `twos[]`

The walk uses a priority queue ordered by generation number
(highest first), breaking ties by commit date. Each step dequeues
the highest-priority commit (this is when we say a commit is
"visited") and propagates its paint flags to its parents, enqueuing
them if they gained new flags. When a commit receives both PARENT1
and PARENT2, it is a merge-base candidate. A candidate gains the
STALE flag so its ancestors propagate staleness -- any deeper common
ancestor is necessarily redundant.
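
The walk described above can be sketched on a toy graph. This is an
illustration only, not the code in `commit-reach.c`: `struct toy_commit`,
`toy_paint_down()`, the array-backed queue with a linear scan, and the
`MAX_COMMITS` bound are all assumptions made for the example. It reports
candidates in the order they are found; some of them may later prove
redundant.

```c
#include <stddef.h>

#define PARENT1 (1u << 0)
#define PARENT2 (1u << 1)
#define STALE   (1u << 2)
#define RESULT  (1u << 3)
#define MAX_COMMITS 16 /* toy bound, assumed for this sketch */

/* Hypothetical toy commit; not Git's struct commit. */
struct toy_commit {
	unsigned generation; /* finite generation number */
	long date;           /* tie-breaker */
	int parents[2];      /* indices into the graph array, -1 = none */
	unsigned flags;
};

/* Return 1 if a must be dequeued before b. */
static int higher_priority(const struct toy_commit *a,
			   const struct toy_commit *b)
{
	if (a->generation != b->generation)
		return a->generation > b->generation;
	return a->date > b->date;
}

/* Remove and return the highest-priority entry (linear scan for brevity). */
static int pop_best(const struct toy_commit *g, int *queue, int *n)
{
	int best = 0, c;

	for (int i = 1; i < *n; i++)
		if (higher_priority(&g[queue[i]], &g[queue[best]]))
			best = i;
	c = queue[best];
	queue[best] = queue[--*n];
	return c;
}

static int queue_has_nonstale(const struct toy_commit *g,
			      const int *queue, int n)
{
	for (int i = 0; i < n; i++)
		if (!(g[queue[i]].flags & STALE))
			return 1;
	return 0;
}

/*
 * Paint from `one` and `twos[]`. Candidates are written to `out`
 * (room for MAX_COMMITS entries); the return value is their count.
 * Assumes at most MAX_COMMITS commits and no duplicate tips.
 */
static int toy_paint_down(struct toy_commit *g, int one,
			  const int *twos, int ntwos, int *out)
{
	int queue[4 * MAX_COMMITS], n = 0, nout = 0;

	g[one].flags |= PARENT1;
	queue[n++] = one;
	for (int i = 0; i < ntwos; i++) {
		g[twos[i]].flags |= PARENT2;
		queue[n++] = twos[i];
	}

	while (n > 0 && queue_has_nonstale(g, queue, n)) {
		int c = pop_best(g, queue, &n);
		unsigned flags = g[c].flags & (PARENT1 | PARENT2 | STALE);

		if (flags == (PARENT1 | PARENT2)) {
			if (!(g[c].flags & RESULT)) {
				g[c].flags |= RESULT;
				out[nout++] = c;
			}
			flags |= STALE; /* ancestors of a candidate are stale */
		}
		for (int k = 0; k < 2; k++) {
			int p = g[c].parents[k];

			if (p < 0 || (g[p].flags & flags) == flags)
				continue; /* parent gains nothing new */
			g[p].flags |= flags;
			queue[n++] = p;
		}
	}
	return nout;
}
```

For a history where two branch tips share one parent, calling
`toy_paint_down()` with one tip as `one` and the other in `twos[]`
should report that shared parent as the only candidate and leave its
own ancestors marked STALE.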

[[generation-regions]]
INFINITY and finite generation regions
--------------------------------------

The commit-graph file stores a generation number for each commit it
contains. Commits not in the commit-graph file have generation
`GENERATION_NUMBER_INFINITY`. The file is closed under reachability: if
a commit is in it, all its ancestors are too. This partitions the commit
history into two regions:

....
+---------------------------------------+
| INFINITY region |
| generation = INFINITY |
| queue order: heuristic (commit date) |
+---------------------------------------+
|
v
+---------------------------------------+
| Finite region |
| generation = finite |
| queue order: topological |
+---------------------------------------+
....

When the commit-graph is enabled, the INFINITY region is typically
very small -- it only contains commits added since the last
commit-graph refresh.

All reachable INFINITY-generation commits are visited before any
finite-generation commit, because INFINITY is larger than any finite
value. Once the walk crosses into the finite region, it stays there.
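
The ordering argument can be checked with a small sketch. It sorts
queue entries into dequeue order (highest generation first, then latest
date) and returns the position of the first finite-generation entry.
`struct toy_entry`, `toy_queue_cmp()`, `toy_first_finite()` and the
`TOY_GENERATION_INFINITY` stand-in are hypothetical names for this
example; the constant is not the value Git uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for GENERATION_NUMBER_INFINITY; assumed, not Git's value. */
#define TOY_GENERATION_INFINITY UINT64_MAX

struct toy_entry {
	uint64_t generation;
	int64_t date;
};

/* qsort() comparator: higher generation first, then later date. */
static int toy_queue_cmp(const void *a_, const void *b_)
{
	const struct toy_entry *a = a_, *b = b_;

	if (a->generation != b->generation)
		return a->generation > b->generation ? -1 : 1;
	if (a->date != b->date)
		return a->date > b->date ? -1 : 1;
	return 0;
}

/*
 * Sort entries into dequeue order and return the index of the first
 * finite-generation entry (n if there is none). Every entry before
 * that index has generation TOY_GENERATION_INFINITY.
 */
static size_t toy_first_finite(struct toy_entry *e, size_t n)
{
	size_t i;

	qsort(e, n, sizeof(*e), toy_queue_cmp);
	for (i = 0; i < n; i++)
		if (e[i].generation != TOY_GENERATION_INFINITY)
			break;
	return i;
}
```

Because the infinite value compares greater than every finite
generation, the commit dates of finite entries never move them ahead of
an INFINITY entry.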

In the finite region, generation ordering guarantees topological
traversal: children are always visited before their parents. This
means that paint on already-visited commits is final -- no future
traversal step can add paint to them.

In the INFINITY region, commit-date ordering can violate this: a
parent with a later date can be visited before a child with an earlier
date. Paint flags are therefore NOT final at visit time, and a
commit visited with only one side's paint may later gain the other.
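
A minimal sketch of that difference, using a hypothetical
`toy_dequeued_before()` helper and a stand-in infinity constant (both
assumptions for this example): with finite generations a child always
has a higher generation than its parent and wins regardless of dates,
while in the INFINITY region a parent whose date is later than its
child's (for example because of clock skew) is dequeued first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for GENERATION_NUMBER_INFINITY; assumed, not Git's value. */
#define TOY_GENERATION_INFINITY UINT64_MAX

struct toy_key {
	uint64_t generation;
	int64_t date;
};

/* True if a is dequeued before b: higher generation, then later date. */
static bool toy_dequeued_before(struct toy_key a, struct toy_key b)
{
	if (a.generation != b.generation)
		return a.generation > b.generation;
	return a.date > b.date;
}
```

With `parent = {TOY_GENERATION_INFINITY, 200}` and
`child = {TOY_GENERATION_INFINITY, 100}`, the parent is dequeued first;
with `parent = {4, 200}` and `child = {5, 100}`, the child is.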

Paint flags are only added, never removed. A commit is enqueued only
when it gains a flag, and each flag can be set at most once per commit,
so the number of times a commit can be enqueued is bounded by the
number of paint flags (PARENT1, PARENT2 and STALE).

Termination
-----------

The walk tracks the number of commits of each type in the queue
(PARENT1-only, PARENT2-only, pending merge-base). The main loop
ends when one of the following conditions holds:

1. The queue is empty.
2. The queue contains only stale entries.
3. Generation cutoff: the dequeued commit's generation is below
a caller-supplied `min_generation` threshold.
4. Single result: the caller only needs one merge base, one has
been found, and the walk has entered the finite-generation
region.
5. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
remain in the queue, no pending merge-base candidates exist,
and the walk has entered the finite-generation region.
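
The five conditions can be read as a single predicate evaluated once
per loop iteration. The sketch below is one possible reading, not the
code in `commit-reach.c`: `struct toy_walk_state`, its counter names,
`toy_stop_reason()` and the infinity stand-in are hypothetical, and
"side exhaustion" is taken to mean that either exclusive counter is
zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for GENERATION_NUMBER_INFINITY; assumed, not Git's value. */
#define TOY_GENERATION_INFINITY UINT64_MAX

/* Hypothetical bookkeeping for the queue contents. */
struct toy_walk_state {
	size_t queued;         /* all entries in the queue */
	size_t queued_stale;   /* entries carrying STALE */
	size_t queued_parent1; /* PARENT1-only entries */
	size_t queued_parent2; /* PARENT2-only entries */
	size_t queued_pending; /* pending merge-base candidates */
	size_t found;          /* merge bases found so far */
	bool want_single;      /* caller needs only one merge base */
	uint64_t min_generation;
};

enum toy_stop {
	TOY_CONTINUE,
	TOY_STOP_EMPTY,
	TOY_STOP_ALL_STALE,
	TOY_STOP_GENERATION_CUTOFF,
	TOY_STOP_SINGLE_RESULT,
	TOY_STOP_SIDE_EXHAUSTED
};

/*
 * `generation` is the generation of the highest-priority commit in
 * the queue. The checks follow the order of the numbered list.
 */
static enum toy_stop toy_stop_reason(const struct toy_walk_state *s,
				     uint64_t generation)
{
	bool finite = generation != TOY_GENERATION_INFINITY;

	if (!s->queued)
		return TOY_STOP_EMPTY;
	if (s->queued_stale == s->queued)
		return TOY_STOP_ALL_STALE;
	if (generation < s->min_generation)
		return TOY_STOP_GENERATION_CUTOFF;
	if (s->want_single && s->found > 0 && finite)
		return TOY_STOP_SINGLE_RESULT;
	if ((!s->queued_parent1 || !s->queued_parent2) &&
	    !s->queued_pending && finite)
		return TOY_STOP_SIDE_EXHAUSTED;
	return TOY_CONTINUE;
}
```

Note how the last two checks are gated on `finite`: while the walk is
still in the INFINITY region they never fire, matching the requirement
that the walk has entered the finite-generation region.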

Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
Once all queued entries are stale, no new merge-base candidates can
be discovered -- that requires at least one non-stale commit from
each side meeting. Continuing the walk could still invalidate
existing candidates by proving one is an ancestor of another, but
`remove_redundant()` handles that as a post-processing step, so it
is safe to exit early.

Side-exhaustion condition
~~~~~~~~~~~~~~~~~~~~~~~~~
A new merge-base requires commits from both sides to meet. When one
side's exclusive counter reaches zero and there are no pending
merge-base candidates, no future traversal step can produce a new
candidate.

This optimization only activates in the finite-generation region
where topological ordering holds. In that region, children are
always visited before parents, so paint flags are final at visit
time and an exhausted side cannot reappear. In the INFINITY region,
commit-date ordering can violate this guarantee, so the check is
skipped.

Generation cutoff
~~~~~~~~~~~~~~~~~
Some callers (notably `remove_redundant()`) only need to know whether
the walk reaches one of the input commits, and supply a
`min_generation` threshold -- the minimum generation of those commits.
None of them can have a generation below this threshold, so the walk
terminates as soon as it dequeues a commit below it.

Single result
~~~~~~~~~~~~~
When only one merge base is needed, the walk is in the
finite-generation region, and the queue uses generation ordering,
the first candidate found is necessarily the highest-generation
common ancestor. No remaining commit in the queue can be a
descendant of this candidate (generation ordering guarantees
children are visited first), so it cannot be redundant and the walk
can stop immediately.

Related documentation
---------------------

- `Documentation/technical/commit-graph.adoc` -- generation numbers
and the reachability closure property.
11 changes: 0 additions & 11 deletions commit-graph.c
@@ -793,17 +793,6 @@ int generation_numbers_enabled(struct repository *r)
return !!first_generation;
}

int corrected_commit_dates_enabled(struct repository *r)
{
struct commit_graph *g;

g = prepare_commit_graph(r);
if (!g || !g->num_commits)
return 0;

return g->read_generation_data;
}

struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
{
struct commit_graph *g;
6 changes: 0 additions & 6 deletions commit-graph.h
@@ -136,12 +136,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
*/
int generation_numbers_enabled(struct repository *r);

/*
* Return 1 if and only if the repository has a commit-graph
* file and generation data chunk has been written for the file.
*/
int corrected_commit_dates_enabled(struct repository *r);

struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r);

enum commit_graph_write_flags {