feat: Improve startup times by setting flow.election.max.candidates#953

Open: sbernauer wants to merge 3 commits into main from feat/improve-leader-election
@sbernauer sbernauer commented Jul 1, 2026

Description

TL;DR: If the user specifies a fixed number of NiFi nodes (i.e. no auto-scaling), set nifi.cluster.flow.election.max.candidates to that number. This makes NiFi start up much faster, because the cluster no longer has to wait out the 5 minutes of nifi.cluster.flow.election.max.wait.time.

(Disclaimer: AI helped me with the description; the code is mine.)

NiFi uses flow election on cold start: nodes vote on which flow definition wins, and the cluster won't finish coming up until the election settles. Two things gate that:

  • nifi.cluster.flow.election.max.wait.time — the timeout the cluster waits before electing among whoever has shown up.
  • nifi.cluster.flow.election.max.candidates — a shortcut: elect immediately once this many nodes have connected and voted, without waiting out the timeout.
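
To make the interplay of the two settings concrete, here is a minimal sketch that models when an election may conclude. It is a simplified model of the behavior described above, not NiFi's actual implementation; `ElectionConfig` and `election_can_conclude` are hypothetical names introduced only for illustration.

```rust
use std::time::Duration;

/// Simplified model of the two settings that gate flow election.
/// Hypothetical type for illustration; not part of NiFi or the operator.
pub struct ElectionConfig {
    /// Models nifi.cluster.flow.election.max.wait.time.
    pub max_wait_time: Duration,
    /// Models nifi.cluster.flow.election.max.candidates (`None` = left empty).
    pub max_candidates: Option<u32>,
}

/// Returns true once the election may conclude, given how long it has been
/// running and how many nodes have connected and voted so far.
pub fn election_can_conclude(config: &ElectionConfig, elapsed: Duration, votes: u32) -> bool {
    // Fast path: only available when max.candidates is set.
    let all_candidates_voted = matches!(config.max_candidates, Some(max) if votes >= max);
    // Fallback: the timeout always ends the election eventually.
    all_candidates_voted || elapsed >= config.max_wait_time
}
```

With `max_candidates` left as `None`, only the timeout branch can ever fire, which is why every cold start waits for the full `max_wait_time`.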

In #936 we correctly raised max.wait.time back to NiFi's upstream default of 5 minutes (the previous 1 min was a leftover "for testing" value that risked electing on incomplete vote sets). That fixed the correctness problem, but it left every cold start paying up to 5 minutes before the cluster is usable — because we were leaving max.candidates empty, so the timeout was the only thing that ended election.

The pain is real and already visible: many of our own kuttl tests, as well as customers, have been reaching for configOverrides to lower max.wait.time again simply because a 5-minute startup is too slow to live with.

This PR resolves the tension instead of trading one problem for the other. When the number of NiFi nodes is fixed (all role-group replicas are set, i.e. no autoscaling), the operator knows the exact node count and sets max.candidates to it. On cold start the cluster now elects the moment all expected nodes have reported in — typically seconds — while max.wait.time stays at the safe 5-minute upstream default as a fallback for the degraded case (a node that never joins).

The result:

  • Fast startups without lowering the safety timeout.
  • No correctness regression — if fewer nodes than expected show up, behavior is identical to before (wait out the timeout, then elect).
  • No more manual max.wait.time overrides in tests and customer clusters.

This is safe on our StatefulSet setup specifically because pods use podManagementPolicy: Parallel, so all expected nodes start concurrently and the "elect once all candidates are present" fast path can actually trigger. When replicas is left unset anywhere (autoscaling), we fall back to the previous empty value.
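
As a rough sketch of the resulting configuration logic (not the code in this PR), the two properties could be rendered from an optional node count like this. The function name, the constants, and the use of a `BTreeMap` are assumptions for illustration; the `"5 mins"` literal assumes the duration format used in nifi.properties.

```rust
use std::collections::BTreeMap;

pub const MAX_WAIT_TIME_KEY: &str = "nifi.cluster.flow.election.max.wait.time";
pub const MAX_CANDIDATES_KEY: &str = "nifi.cluster.flow.election.max.candidates";

/// Hypothetical helper: builds the flow election properties.
/// `fixed_node_count` is `Some(n)` when every role group sets `replicas`,
/// and `None` when at least one role group leaves it unset (autoscaling).
pub fn flow_election_properties(fixed_node_count: Option<u32>) -> BTreeMap<String, String> {
    let mut properties = BTreeMap::new();
    // The safety timeout stays at the 5-minute upstream default in both cases.
    properties.insert(MAX_WAIT_TIME_KEY.to_string(), "5 mins".to_string());
    // Known node count: enable the fast path. Unknown: keep the value empty.
    let max_candidates = fixed_node_count
        .map(|count| count.to_string())
        .unwrap_or_default();
    properties.insert(MAX_CANDIDATES_KEY.to_string(), max_candidates);
    properties
}
```

Calling `flow_election_properties(Some(3))` sets max.candidates to `3`, while `flow_election_properties(None)` leaves it as an empty string, matching the previous behavior.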

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

Release notes

For clusters with a fixed number of nodes, the operator now sets nifi.cluster.flow.election.max.candidates to the total node count, so cold-start flow election completes as soon as all expected nodes report in — typically seconds instead of the up-to-5-minute max.wait.time timeout. The timeout stays at NiFi's upstream default of 5 minutes as a fallback, so there's no correctness change if fewer nodes than expected show up.

@sbernauer sbernauer self-assigned this Jul 1, 2026
@sbernauer sbernauer moved this to Development: Waiting for Review in Stackable Engineering Jul 1, 2026

@NickLarsenNZ NickLarsenNZ left a comment

LGTM

@NickLarsenNZ NickLarsenNZ moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Jul 1, 2026
PROTOCOL_PORT.to_string(),
);

// In case the number of NiFi nodes is hard-coded to a fixed (no auto-scaling), we can tell

Suggested change:
- // In case the number of NiFi nodes is hard-coded to a fixed (no auto-scaling), we can tell
+ // In case the number of NiFi nodes is hard-coded to a fixed number (no auto-scaling), we can tell

///
/// This is the case when all `replicas` are set to [`Some<u16>`], in which case they are simply
/// summed.
pub fn maybe_fixed_node_count(&self) -> Option<u32> {

Maybe guard against count being 0 for WHATEVER reason. In that case I'd prefer to emit None instead of Some(0).
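
One possible shape for that guard, as a sketch only: the body of the real method is not shown in this excerpt, so `RoleGroup` below is a hypothetical stand-in and the function takes a slice rather than `&self`.

```rust
/// Hypothetical stand-in for a role group; only `replicas` matters here.
pub struct RoleGroup {
    pub replicas: Option<u16>,
}

/// Returns the total node count if every role group sets `replicas`,
/// and `None` if any is unset or if the total is zero.
pub fn maybe_fixed_node_count(role_groups: &[RoleGroup]) -> Option<u32> {
    // Summing `Option`s short-circuits to `None` on the first unset `replicas`.
    let total = role_groups
        .iter()
        .map(|role_group| role_group.replicas.map(u32::from))
        .sum::<Option<u32>>()?;
    // Guard: a count of 0 (including "no role groups") yields `None`, not `Some(0)`.
    (total > 0).then_some(total)
}
```

With this guard, a cluster whose replicas sum to zero is treated like the autoscaling case, and max.candidates would stay empty.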
