CI: tidy nightly test-matrix + bump torch to 2.12.1 by leofang · Pull Request #2272 · NVIDIA/cuda-python

leofang · 2026-06-27T03:58:06Z

Summary

Collapse per-row MODE / TORCH_VER / TORCH_CUDA of nightly entries into the ENV: map so they ride the existing matrix-env injection step in test-wheel-{linux,windows}.yml. Workflow selectors (ci-nightly.yml) and job-name strings updated accordingly.
Bump latest-PyTorch rows from 2.11.0 → 2.12.1; 2.9.1 rows unchanged.
Job names now also show the torch CUDA suffix, e.g. ", 2.12.1+cu126".
Align nightly section columns with the pull-request rows for readability.
Add a nightly-standard arm64 gh200 row, but comment it out for now: the gh200 runner currently hangs on stream-ordered memory allocator (cudaMallocAsync) calls. The row is left in place (with a TODO) so it can be re-enabled once the runner-side issue is resolved.
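
For readers unfamiliar with the layout, here is a rough Python sketch of what keying selectors on `.ENV.MODE` and rendering the job-name suffix amount to. It is not the workflow code: the amd64 row, its values, and the helper names `select_mode` / `torch_suffix` are hypothetical, and only the nesting of MODE / TORCH_VER / TORCH_CUDA under ENV follows this PR.

```python
# Hypothetical, trimmed-down matrix rows. Only the ENV nesting mirrors the PR;
# the concrete amd64 row and its values are made up for illustration.
rows = [
    {"ARCH": "amd64", "PY_VER": "3.13", "GPU": "l4",
     "ENV": {"MODE": "nightly-pytorch", "TORCH_VER": "2.12.1", "TORCH_CUDA": "cu126"}},
    {"ARCH": "arm64", "PY_VER": "3.12", "GPU": "l4",
     "ENV": {"MODE": "nightly-numba-cuda"}},
]


def select_mode(rows, mode):
    """Keep rows whose ENV.MODE equals `mode` (rows without ENV never match)."""
    return [r for r in rows if r.get("ENV", {}).get("MODE") == mode]


def torch_suffix(row):
    """Return ', <TORCH_VER>+<TORCH_CUDA>' for torch rows, else an empty string."""
    env = row.get("ENV", {})
    ver = env.get("TORCH_VER")
    if not ver:
        return ""
    cuda = env.get("TORCH_CUDA")
    return f", {ver}+{cuda}" if cuda else f", {ver}"


pytorch_rows = select_mode(rows, "nightly-pytorch")
suffix = torch_suffix(pytorch_rows[0])
```

Rows that carry no TORCH_VER (e.g. nightly-numba-cuda) get an empty suffix, so their job names are unaffected.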

Test plan

Verify nightly matrix expansion across modes (nightly-pytorch, nightly-numba-cuda, nightly-standard) via a workflow run.
PyTorch 2.12.1 wheels (cu126 / cu130) install cleanly.
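
The second item can be spot-checked by hand. The sketch below only builds the pip commands, assuming the wheels are served from PyTorch's per-CUDA index at `https://download.pytorch.org/whl/<tag>`. The exact install command used by the workflow is not shown in this PR, and the helper name `torch_install_cmd` is hypothetical.

```python
def torch_install_cmd(torch_ver, torch_cuda):
    """Build a pip command for one (TORCH_VER, TORCH_CUDA) pair.

    Assumption: wheels come from the per-CUDA PyTorch index; the CI may
    install them differently.
    """
    index_url = f"https://download.pytorch.org/whl/{torch_cuda}"
    return ["python", "-m", "pip", "install",
            f"torch=={torch_ver}", "--index-url", index_url]


# Print one command per CUDA tag named in the test plan.
for cuda in ("cu126", "cu130"):
    print(" ".join(torch_install_cmd("2.12.1", cuda)))
```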

- ci/test-matrix.yml: move per-row MODE/TORCH_VER/TORCH_CUDA into the ENV map (rides the existing matrix-env injection step). Add a nightly-standard arm64 gh200 row. Bump latest-PyTorch rows from 2.11.0 to 2.12.1; 2.9.1 rows untouched.
- .github/workflows/ci-nightly.yml: matrix_filter selectors now key on .ENV.MODE.
- .github/workflows/test-wheel-{linux,windows}.yml: job-name format strings read TORCH_VER/MODE from matrix.ENV; TORCH_CUDA also rendered in the name (e.g. ", 2.12.1+cu126"). Drop the now-redundant TORCH_VER/TORCH_CUDA lines from the pytorch step's env block.

copy-pr-bot · 2026-06-27T03:58:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang · 2026-06-27T04:19:49Z

/ok to test d29bc34

leofang · 2026-06-27T04:22:42Z

+    - { ARCH: 'arm64', PY_VER: '3.12',  CUDA_VER: '12.9.1', LOCAL_CTK: '0', GPU: 'l4',         GPU_COUNT: '1', DRIVER: 'latest',     ENV: { MODE: 'nightly-numba-cuda' } }
+    - { ARCH: 'arm64', PY_VER: '3.12',  CUDA_VER: '13.3.0', LOCAL_CTK: '0', GPU: 'l4',         GPU_COUNT: '1', DRIVER: 'latest',     ENV: { MODE: 'nightly-numba-cuda' } }
+    # nightly-standard (arm64 nightly-only runners — per runner team request)
+    - { ARCH: 'arm64', PY_VER: '3.14',  CUDA_VER: '13.3.0', LOCAL_CTK: '1', GPU: 'gh200',      GPU_COUNT: '1', DRIVER: 'latest',     ENV: { MODE: 'nightly-standard' } }


Adding an experimental G+H pipeline here (cc @kkraus14 for vis).

github-actions · 2026-06-27T04:42:59Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-2272/
https://nvidia.github.io/cuda-python/pr-preview/pr-2272/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-2272/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-2272/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

leofang · 2026-06-27T04:54:42Z

Killed the hanging G+H pipeline:
https://github.com/NVIDIA/cuda-python/actions/runs/28278398318/job/83789726071?pr=2272

@bdice also saw the same issue in RMM: https://github.com/rapidsai/rmm/actions/runs/28270219058/job/83767744891?pr=2457.

Will revisit later...

leofang · 2026-06-28T03:08:29Z

/ok to test 7e002a6

github-actions Bot added the "CI/CD" (CI/CD infrastructure) label Jun 27, 2026

leofang added 2 commits June 27, 2026 04:18

ci/test-matrix.yml: align nightly columns with pull-request rows

97ee1cf

Pad PY_VER and GPU columns in the nightly section to match the widths used by the pull-request rows above (17-char PY_VER, 19-char GPU). Purely cosmetic; YAML parse and matrix expansion unchanged.

Temporarily add push trigger to ci-nightly.yml for testing

d29bc34

Remove before merging.

leofang added 2 commits June 28, 2026 02:55

ci/test-matrix.yml: temporarily comment out gh200 nightly row

4c70cfa

The gh200 runner currently hangs on stream-ordered memory allocator calls (cudaMallocAsync). Disabling until the runner-side issue is resolved.

Revert "Temporarily add push trigger to ci-nightly.yml for testing"

7e002a6

This reverts commit d29bc34.

leofang changed the title ~~CI: tidy nightly test-matrix + add arm64 gh200 + bump torch 2.12.1~~ CI: tidy nightly test-matrix + bump torch to 2.12.1 Jun 28, 2026

leofang marked this pull request as ready for review June 28, 2026 03:02

leofang self-assigned this Jun 28, 2026

leofang added this to the cuda.core next milestone Jun 28, 2026

leofang added the "enhancement" (Any code-related improvements) and "P1" (Medium priority - Should do) labels Jun 28, 2026

leofang wants to merge 5 commits into NVIDIA:main from leofang:leofang/ci-test-matrix-env-refactor
