diff --git a/docs/administration/index.md b/docs/administration/index.md index 3196cadb..50b5edb8 100644 --- a/docs/administration/index.md +++ b/docs/administration/index.md @@ -32,6 +32,13 @@ The admin database name is [configurable](../configuration/pgdog.toml/admin.md). | `SHOW QUERY_CACHE` | List statements currently in the AST cache used for query routing. | | [`MAINTENANCE`](maintenance_mode.md) | Pause all queries to synchronize configuration changes across multiple instances of PgDog. | | [`SHOW REPLICATION`](replication.md) | Show the status of PostgreSQL replication for each database, including replica lag. | +| [`RESHARD`](../features/sharding/resharding/index.md) | Reshard a database online and cut over traffic automatically. Returns a background `task_id`. | +| [`COPY_DATA`](../features/sharding/resharding/move.md) | Copy and reshard data to a destination cluster, then stream changes. Returns a background `task_id`. | +| [`SCHEMA_SYNC`](../features/sharding/resharding/schema.md) | Synchronize schema for a given phase between source and destination. Returns a background `task_id`. | +| `REPLICATE` | Stream changes from source to destination (schema and data must already be synced). Returns a background `task_id`. | +| [`CUTOVER`](../features/sharding/resharding/cutover.md) | Cut traffic over to the destination cluster for a running replication task. | +| [`SHOW TASKS`](tasks.md) | List background tasks (resharding, copy, replication, schema sync) with their status and elapsed time. | +| [`STOP_TASK`](tasks.md#stopping-a-task) | Stop a running background task by its `task_id`. 
| ## Shutting down PgDog diff --git a/docs/administration/tasks.md b/docs/administration/tasks.md new file mode 100644 index 00000000..cb0f82b6 --- /dev/null +++ b/docs/administration/tasks.md @@ -0,0 +1,108 @@ +--- +icon: material/format-list-checks +--- + +# Background tasks + +Long-running operations like [resharding](../features/sharding/resharding/index.md) don't block the connection that started them. Instead, `RESHARD`, `COPY_DATA`, `REPLICATE`, and `SCHEMA_SYNC` each return a `task_id` and run in the background. `SHOW TASKS` is how you track those tasks: it reports their lifecycle status and progress, lets you find the `task_id` to pass to [`STOP_TASK`](#stopping-a-task) or [`CUTOVER`](../features/sharding/resharding/cutover.md), and surfaces failures. + +Run it on the [admin database](index.md): + +``` +SHOW TASKS; +``` + +!!! note "Admin database tasks only" + `SHOW TASKS` lists the background tasks running *inside this PgDog process* — the ones started by the admin database commands above. The equivalent [CLI commands](../features/sharding/resharding/index.md#running-the-steps-manually) (`pgdog data-sync`, `pgdog schema-sync`) run as a separate, one-off `pgdog` process in the foreground: they block until they finish and are stopped with `Ctrl-C`, so they neither return a `task_id` nor appear here. + +=== "Output" + ``` + -[ RECORD 1 ]+---------------------------------- + id | 14 + scope | root + type | reshard prod -> prod_sharded + status | running + inner_status | syncing data + started_at | 2026-06-30 18:02:11.004 UTC + updated_at | 2026-06-30 18:05:42.119 UTC + elapsed | 00:03:31:115 + elapsed_ms | 211115 + -[ RECORD 2 ]+---------------------------------- + id | + scope | subtask + type | copy_data prod -> prod_sharded + status | running + inner_status | + started_at | 2026-06-30 18:02:30.550 UTC + updated_at | 2026-06-30 18:05:42.119 UTC + elapsed | 00:03:11:569 + elapsed_ms | 191569 + ``` + +The most recently started task is listed first. 
+ +## Columns + +| Column | Description | +|-|-| +| `id` | The task's id. This is the handle you pass to [`STOP_TASK`](#stopping-a-task) and [`CUTOVER`](../features/sharding/resharding/cutover.md). Only **root** rows carry an id; subtasks share their root's id and leave this column empty. | +| `scope` | `root` for a top-level task, `subtask` for a step it spawned (e.g. the `copy_data` and `replication` steps of a `reshard`). | +| `type` | What the task is, usually with the source and destination databases — e.g. `reshard`, `copy_data`, `replication`, `replication ... (reverse)`, `schema_sync(pre)`. | +| `status` | The lifecycle status of the task. See [Statuses](#statuses). | +| `inner_status` | Fine-grained progress within the current status. See [Progress](#progress). For a finished or failed task, this keeps the last progress it reported. | +| `started_at` | When the task started. | +| `updated_at` | When the task last changed status or progress. For a terminal task this is when it finished. | +| `elapsed` / `elapsed_ms` | How long the task ran. For a terminal task this is the total run time (measured to `updated_at`), not a clock that keeps ticking. | + +## Statuses + +The `status` column describes where the task is in its lifecycle: + +| Status | Meaning | +|-|-| +| `started` | The task is being set up. | +| `running` | The task is actively working. | +| `cancelling` | A [`STOP_TASK`](#stopping-a-task) was requested and the task is winding down cooperatively. | +| `finished` | The task completed successfully. | +| `cancelled` | The task was stopped before completing. | +| `failed: ` | The task errored. The error message is included in the status. | +| `panicked: ` | The task hit an unexpected internal error. | + +`finished`, `cancelled`, `failed`, and `panicked` are **terminal**. Terminal tasks stay listed with their final status (and last progress) for a while so you can inspect the outcome, then they're pruned automatically. 
+ +## Progress + +The `inner_status` column shows what the task is doing within its current `status`. The values depend on the task `type`: + +| Task | Progress values | +|-|-| +| `reshard` | `syncing schema` → `syncing data` → `finalizing schema` → `replicating` | +| `copy_data` | (no sub-status; track copy progress with [`SHOW TABLE_COPIES`](../features/sharding/resharding/move.md#monitoring-progress)) | +| `replication` | `replicating` → `cutting over` (or `rolling back` for a reverse stream) / `stopping` | +| `schema_sync` | `loading schema` → `syncing tables` → `creating indexes` (or `syncing cutover schema`) | + +## Finding the right id + +`STOP_TASK` and `CUTOVER` always operate on the **root** `id`: + +- A `RESHARD` or `COPY_DATA` shows as a `root` task with its child steps listed as `subtask` rows. Target the root id, not a subtask — subtask rows have an empty `id` on purpose. +- After a cutover, the [reverse replication](../features/sharding/resharding/cutover.md#after-the-cutover) stream appears as its own `root` task of type `replication ... (reverse)`. Use its id to [roll back](../features/sharding/resharding/cutover.md#rolling-back) (`CUTOVER `) or to [finalize the migration](../features/sharding/resharding/cutover.md#finalizing-the-migration) (`STOP_TASK `). + +## Spotting issues + +| Symptom | What to check | +|-|-| +| `status` is `failed: ...` or `panicked: ...` | Read the error message in the `status` column, then check the PgDog logs for the full context. The task is stopped; address the cause and re-run the command. | +| Task stuck at one `inner_status` for a long time | Cross-reference [`SHOW TABLE_COPIES`](../features/sharding/resharding/move.md#monitoring-progress) (during `syncing data`) and [`SHOW REPLICATION_SLOTS`](../features/sharding/resharding/move.md#streaming-updates) (during `replicating`) to see whether data is still flowing. | +| `status` is `cancelling` and not clearing | The task is draining its replication streams. 
A cutover that has already started runs to completion before the status settles. | +| Terminal task you expected to still be running | It reached a terminal state (`finished`/`cancelled`/`failed`). Check `inner_status` and `updated_at` to see where and when it stopped. | + +## Stopping a task + +Stop any running task by its root `id`: + +``` +STOP_TASK ; +``` + +This requests cancellation; the task winds down gracefully and stops appearing as `running` once it has actually stopped. For replication and copy tasks, a graceful stop also drops the replication slot the task created. See [finalizing the migration](../features/sharding/resharding/cutover.md#finalizing-the-migration) for how this is used after a cutover. diff --git a/docs/features/sharding/resharding/cutover.md b/docs/features/sharding/resharding/cutover.md index 8988273d..e23663d2 100644 --- a/docs/features/sharding/resharding/cutover.md +++ b/docs/features/sharding/resharding/cutover.md @@ -12,10 +12,16 @@ Traffic cutover involves moving application traffic (read and write queries) to ## Performing the cutover -The cutover can be executed by executing a command on the [admin database](../../../administration/index.md): +The cutover can be executed by running a command on the [admin database](../../../administration/index.md): ``` -CUTOVER; +CUTOVER []; +``` + +Without a `task_id`, PgDog cuts over the first running replication task. To target a specific task, pass its id (as reported by [`SHOW TASKS`](../../../administration/tasks.md)): + +``` +CUTOVER 12; ``` Under typical conditions, the whole process takes less than a second, so applications shouldn't experience any errors or downtime. @@ -25,8 +31,12 @@ Under typical conditions, the whole process takes less than a second, so applica must connect to the database through PgDog. 
Any applications that connect to the database directly, or through another proxy, will not receive the cutover signal and will continue to send writes to the source database, causing a split-brain situation. - -If you're using the `RESHARD` command, the cutover step is executed automatically and you don't need to perform any additional steps. + +`CUTOVER` is only needed when you run resharding manually (via [`COPY_DATA`](move.md) / [`REPLICATE`](../../../administration/index.md)). If you use the `RESHARD` command, the cutover step is executed automatically and you don't need to perform any additional steps. + +`CUTOVER` only works on a migration that was started on **this** PgDog through the admin database (`COPY_DATA`, `REPLICATE`, or `RESHARD`). A migration started with the `pgdog data-sync` CLI runs in a separate process and **cannot** be cut over with the `CUTOVER` command. + +To cut over a CLI migration by hand: once replication has caught up (check the CLI's log output), pause writes ([`MAINTENANCE`](../../../administration/maintenance_mode.md)) to avoid losing in-flight transactions, swap the source and destination in `pgdog.toml` and `users.toml`, and run [`RELOAD`](../../../administration/index.md) on the serving instance. Then stop the CLI replication process and resume writes. There's no automatic rollback with this path. ## Step by step @@ -88,7 +98,7 @@ When enabled, PgDog will backup both configuration files, `pgdog.toml` as `pgdog !!! note "Multi-node deployments" If you're running more than one PgDog node, you should consider deploying our [Enterprise Edition](../../../enterprise_edition/index.md), which has support for saving the configuration files on multiple PgDog nodes at the same time. - + #### Thresholds Before swapping the configuration, PgDog waits for the two databases to be completely identical.
These thresholds are configurable as follows: @@ -143,3 +153,33 @@ The reverse replication is created while the queries to both databases are pause ### Resume queries With the reverse replication set up, it is now safe to move traffic to the destination (now source) database. PgDog does this by turning off [maintenance mode](../../../administration/maintenance_mode.md), and this step concludes the cutover. The entire process takes less than a second, typically, and allows PgDog to reshard Postgres databases without downtime. + +## After the cutover + +Once the cutover completes, the reverse [replication stream](#reverse-replication) keeps the original cluster up to date with every write that now lands on the new cluster. It runs as a background task; find it in [`SHOW TASKS`](../../../administration/tasks.md) (its `type` is `replication ... (reverse)`). While it is running, you can either roll back to the original cluster or, once you're satisfied, finalize the migration. + +### Rolling back + +To restore traffic to the original cluster, run a `CUTOVER` against the reverse replication task, passing its id from `SHOW TASKS`: + +``` +CUTOVER ; +``` + +This performs the same atomic swap in reverse: the source and destination are swapped back in `pgdog.toml` and `users.toml`, and traffic returns to the original cluster. Because the reverse stream kept the original cluster in sync, no data written to the new cluster is lost. The task's `inner_status` in `SHOW TASKS` shows `rolling back` while this happens. + +!!! note "Restoring the configuration files" + The rollback swaps the configuration back in memory (and on disk when [`cutover_save_config`](#swap-the-configuration) is enabled). If you enabled `cutover_save_config`, the configuration as it was *before* the original cutover is also preserved in the `pgdog.bak.toml` and `users.bak.toml` backups, so you can always restore the previous state by hand. 
+ +### Finalizing the migration + +When you're confident the new cluster is healthy and no longer need the ability to roll back, stop the reverse replication task: + +``` +STOP_TASK ; +``` + +This winds the reverse stream down gracefully and **drops the replication slot** it created, so Postgres can resume recycling WAL. After this point rollback is no longer possible — the migration is complete. + +!!! warning "Don't leave reverse replication running indefinitely" + Until the reverse replication task is stopped, its permanent replication slot prevents the new (now source) cluster from recycling WAL, which accumulates on disk. Issue `STOP_TASK` once you've decided not to roll back. diff --git a/docs/features/sharding/resharding/index.md b/docs/features/sharding/resharding/index.md index 123133b6..fbd46a45 100644 --- a/docs/features/sharding/resharding/index.md +++ b/docs/features/sharding/resharding/index.md @@ -35,6 +35,25 @@ RESHARD ; The `` and `` parameters accept the name of the source and destination databases respectively. The `` parameter expects the name of the Postgres [publication](schema.md#publication) for the tables that need to be resharded. +`RESHARD` returns a `task_id` and runs in the background. You can track its progress with [`SHOW TASKS`](../../../administration/tasks.md), and stop it with `STOP_TASK `. When `RESHARD` is used, traffic cutover happens automatically as the final step. + +### Running the steps manually + +Instead of `RESHARD`, you can run the process one step at a time. This gives you control over *when* traffic is cut over. 
Each step can be run either as an [admin database](../../../administration/index.md) command or as a `pgdog` CLI command: + +| Step | Admin database | CLI | +|-|-|-| +| [Schema sync](schema.md) | `SCHEMA_SYNC ` | `pgdog schema-sync ...` | +| [Move & reshard data](move.md) | `COPY_DATA ` | `pgdog data-sync ...` | +| [Cutover traffic](cutover.md) | `CUTOVER []` | _admin database only_ | + +The two run differently: + +- **Admin database commands** run as background tasks *inside the running PgDog*. They return a `task_id` immediately, are tracked with [`SHOW TASKS`](../../../administration/tasks.md), and are controlled with [`STOP_TASK`](../../../administration/tasks.md#stopping-a-task) and [`CUTOVER`](cutover.md). +- **CLI commands** run as a separate, one-off `pgdog` process in the **foreground**: they block until the operation finishes and are stopped with `Ctrl-C`. They do not appear in `SHOW TASKS`. + +Unlike `RESHARD`, the manual path does **not** cut over automatically: the data move copies the data and keeps streaming changes indefinitely, and you switch traffic explicitly with [`CUTOVER`](cutover.md). + !!! note "Traffic cutover" Traffic cutover requires careful synchronization to avoid data loss and a split-brain situation. The `RESHARD` command supports this for **single node** PgDog deployments only. The [Enterprise Edition](../../../enterprise_edition/index.md) provides a control plane, which supports traffic cutover with multiple PgDog containers. 
diff --git a/docs/features/sharding/resharding/move.md b/docs/features/sharding/resharding/move.md index 4bc4e9d2..37029b32 100644 --- a/docs/features/sharding/resharding/move.md +++ b/docs/features/sharding/resharding/move.md @@ -36,11 +36,18 @@ This will spawn a background task that will copy all tables in the [publication] To copy data from database `"prod"` to database `"prod_sharded"` and the `"all_tables"` publication, execute the following command: -``` -COPY_DATA prod prod_sharded all_tables; -``` +=== "Admin command" + ``` + COPY_DATA prod prod_sharded all_tables; + ``` +=== "Output" + ``` + task_id | replication_slot + ---------+----------------------------- + 12 | __pgdog_repl_a1b2c3d4e5f6... + ``` -The name of the replication slot will be automatically generated. +`COPY_DATA` returns immediately with the background `task_id` and the auto-generated replication slot name. Track the task with [`SHOW TASKS`](../../../administration/tasks.md) and stop it with `STOP_TASK `. The slot name is not provided, so it is generated for you. ### CLI @@ -63,6 +70,9 @@ Required (*) and optional parameters for this command are as follows: | `--replication-slot` | Name of the replication slot to use (and create if it doesn't exist) for syncing real-time changes. | | `--replicate-only` | Don't copy data, just stream changes from the replication slot. | | `--sync-only` | Perform the initial data sync only and exit. | +| `--skip-schema-sync` | Don't run the pre-data schema sync first (assume the destination schema already exists). | + +Unlike the `COPY_DATA` admin command, the CLI runs in the **foreground** as a separate `pgdog` process: it blocks while it copies the data and then keeps streaming changes until you stop it with `Ctrl-C` (which winds the task down gracefully). With `--sync-only`, it copies the data and exits instead. It does not return a `task_id` and is not visible in [`SHOW TASKS`](../../../administration/tasks.md). 
## How it works @@ -238,7 +248,26 @@ SELECT pg_current_wal_lsn() FROM pg_replication_slots; ``` -The replication delay between the two database clusters is measured in bytes. When that number reaches zero, the two databases are byte-for-byte identical, and traffic can be [cut over](cutover.md) to the destination database. +The replication delay between the two database clusters is measured in bytes. When that number reaches zero, the two databases are byte-for-byte identical. + +`COPY_DATA` keeps streaming changes indefinitely, keeping the destination in sync with the source. To monitor or stop the background task, see the task-level view below. + +### Monitoring the task + +When started from the admin database, `COPY_DATA` runs as a background task. List all tasks, their current status, and elapsed time with: + +=== "Admin command" + ``` + SHOW TASKS; + ``` +=== "Output" + ``` + id | scope | type | status | inner_status | elapsed + ----+-------+------------------------------+---------+--------------+---------- + 12 | root | reshard prod -> prod_sharded | running | replicating | 00:02:14 + ``` + +`COPY_DATA` runs the same pipeline as `RESHARD` but leaves the traffic in place, so it appears with a `reshard` type and progresses through `syncing data` to `replicating`. Stop it with `STOP_TASK `. See [Background tasks](../../../administration/tasks.md) for the full column and status reference. ## Troubleshooting @@ -311,7 +340,7 @@ See [Integer primary keys](#integer-primary-keys) for the other common reason to ### Replication slot not cleaned up after stop or crash -If `COPY_DATA` is stopped via `STOP_TASK` or PgDog exits unexpectedly, the **permanent** replication slot will remain on the source. Postgres will not recycle WAL until it is dropped. Clean it up manually: +Stopping a task gracefully with `STOP_TASK` winds the replication stream down and drops its **permanent** replication slot for you. 
However, if PgDog exits unexpectedly (crash, `SIGKILL`, lost connection), the slot can remain on the source. Postgres will not recycle WAL until the slot is dropped, so clean it up manually: ```postgresql SELECT pg_drop_replication_slot('slot_name');