Move to CUDA 13.2 #6249

Merged
JanuszL merged 4 commits into NVIDIA:main from JanuszL:cuda_13.2 on Mar 11, 2026

Conversation

@JanuszL
Contributor

@JanuszL JanuszL commented Mar 9, 2026

  • moves to CUDA 13.2

Category:

Other (e.g. Documentation, Tests, Configuration)

Description:

  • moves to CUDA 13.2

Additional information:

Affected modules and functionalities:

  • toolchain docker files
  • build.sh

Key points relevant for the review:

  • NA

Tests:

  • Existing tests apply
    • build for CUDA 13.2
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [45722131]: BUILD STARTED

@greptile-apps
Contributor

greptile-apps Bot commented Mar 9, 2026

Greptile Summary

This PR upgrades DALI's default CUDA toolchain to CUDA 13.2. It adds new Dockerfile.cuda132.aarch64.deps and Dockerfile.cuda132.x86_64.deps build images, updates build.sh to default to and validate 13.2, marks 13.0 and 13.1 as deprecated in the documentation, and patches Dockerfile.deps with a glibc 2.42 compatibility fix for __clang_cuda_runtime_wrapper.h. As a bonus cleanup, the fix for the previously reviewed double-shift bug in the fatbinary wrapper script is also backported to the CUDA 13.0 and 13.1 Dockerfiles.

Key changes:

  • Two new Dockerfile.cuda132.{aarch64,x86_64}.deps files install CUDA 13.2 packages and the correct single-shift fatbinary wrapper
  • Dockerfile.deps adds a sed -z multiline patch to inject _NV_RSQRT_SPECIFIER guards for glibc ≥ 2.42 compatibility
  • build.sh default bumped from 13.1 → 13.2; validation list extended; header comment still reads default 13.0 instead of default 13.2 (minor inconsistency)
  • docs/compilation.rst correctly updated: 12.9 and 13.2 are officially supported; 13.0 and 13.1 are deprecated
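
The `sed -z` technique mentioned for Dockerfile.deps can be illustrated with a toy header. This is only a minimal sketch, assuming GNU sed; the file contents and the `DEMO_GUARD` macro are hypothetical and are not the actual patch applied to `__clang_cuda_runtime_wrapper.h`.

```shell
# Toy demonstration of a multiline substitution with GNU sed's -z flag.
# The header content and DEMO_GUARD are hypothetical, not the real patch.

patch_header() {
    # -z makes sed split records on NUL instead of newline, so the whole
    # file is one record and the pattern can span the newline between
    # the two #include lines.
    sed -z 's|#include "a.h"\n#include "b.h"|#include "a.h"\n#define DEMO_GUARD 1\n#include "b.h"|' "$1"
}

demo_file="$(mktemp)"
printf '#include "a.h"\n#include "b.h"\n' > "$demo_file"
patched="$(patch_header "$demo_file")"
rm -f "$demo_file"
printf '%s\n' "$patched"
```

Without `-z`, sed processes one line at a time, so a pattern containing `\n` would never match across the two lines.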

Confidence Score: 5/5

  • PR is safe to merge; all changes are additive build-toolchain updates with one minor documentation inconsistency.
  • The new Dockerfiles are structurally consistent with the existing 13.0/13.1 variants and include the fixed fatbinary wrapper. The glibc 2.42 patch in Dockerfile.deps is a targeted, well-scoped sed substitution. The only issue is the stale "default 13.0" string in the build.sh usage comment, which does not affect runtime behaviour.
  • docker/build.sh — header comment says "default 13.0" but actual default is now 13.2

Important Files Changed

| Filename | Overview |
| --- | --- |
| docker/Dockerfile.cuda132.aarch64.deps | New CUDA 13.2 aarch64 deps Dockerfile; correctly structured with pinned versions and the fixed fatbinary wrapper (no double-shift). |
| docker/Dockerfile.cuda132.x86_64.deps | New CUDA 13.2 x86_64 deps Dockerfile; mirrors the aarch64 variant with correct platform paths and the fixed fatbinary wrapper. |
| docker/Dockerfile.deps | Adds a second sed pass over __clang_cuda_runtime_wrapper.h to inject _NV_RSQRT_SPECIFIER guards for glibc ≥ 2.42 compatibility with CUDA 13.2. |
| docker/build.sh | Default CUDA version bumped to 13.2, validation list updated; header comment still reads "default 13.0" instead of "default 13.2". |
| docker/Dockerfile.cuda130.aarch64.deps | Backport fix: removed erroneous inner shift in --image=* case of the fatbinary wrapper to prevent arguments being silently dropped. |
| docker/Dockerfile.cuda130.x86_64.deps | Same double-shift fix applied to x86_64 CUDA 13.0 deps Dockerfile. |
| docker/Dockerfile.cuda131.aarch64.deps | Same double-shift fix applied to aarch64 CUDA 13.1 deps Dockerfile. |
| docker/Dockerfile.cuda131.x86_64.deps | Same double-shift fix applied to x86_64 CUDA 13.1 deps Dockerfile. |
| docs/compilation.rst | Documentation updated: 13.2 is now the officially supported default; 13.0 and 13.1 moved to deprecated list. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["build.sh\n(CUDA_VERSION default: 13.2)"] --> B{Validate CUDA_VERSION}
    B -->|"12.0–12.6, 12.8, 12.9\n13.0, 13.1, 13.2"| C[Select Dockerfile]
    B -->|"other"| D["Exit with error"]
    C --> E["Dockerfile.cuda130.*.deps\n(fatbinary double-shift fixed)"]
    C --> F["Dockerfile.cuda131.*.deps\n(fatbinary double-shift fixed)"]
    C --> G["Dockerfile.cuda132.*.deps\n(new — CUDA 13.2)"]
    G --> H["Install CUDA 13.2 packages\ncuda-minimal-build-13-2=13.2.0-1"]
    H --> I["Install nvjpeg2k 0.9.0.43\n+ nvcomp 5.1.0.21_cuda13"]
    I --> J["Patch fatbinary wrapper\n(single shift, --image=* translation)"]
    K["Dockerfile.deps\n(manylinux base)"] --> L["Patch __clang_cuda_runtime_wrapper.h\nRemove texture_fetch_functions.h"]
    L --> M["Inject _NV_RSQRT_SPECIFIER\nguard for glibc ≥ 2.42"]
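
The "Select Dockerfile" step in the flowchart can be sketched as below. This assumes the file name is derived from the version by dropping the dot, which matches names such as Dockerfile.cuda132.x86_64.deps; the helper `deps_dockerfile` is hypothetical, and build.sh may select the file differently.

```shell
# Hypothetical helper: build the deps Dockerfile name from version and arch.
deps_dockerfile() {
    # "13.2" -> "132": drop the dot to match the naming used in docker/.
    version_tag="$(printf '%s' "$1" | tr -d '.')"
    printf 'Dockerfile.cuda%s.%s.deps\n' "$version_tag" "$2"
}

deps_dockerfile 13.2 x86_64    # prints Dockerfile.cuda132.x86_64.deps
deps_dockerfile 13.2 aarch64   # prints Dockerfile.cuda132.aarch64.deps
```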

Comments Outside Diff (1)

  1. docker/build.sh, line 8 (link)

    Stale "default 13.0" in header comment

    The usage comment on line 8 still says default 13.0, but the actual default (line 43) was bumped to 13.2 in this PR. A user reading the help text will get incorrect information about the default version.
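
One way to keep the help text from drifting away from the real default is to define the default once and reference it everywhere. The following is a sketch under assumptions, not the contents of build.sh: the names `DEFAULT_CUDA_VERSION`, `SUPPORTED_CUDA_VERSIONS`, `usage`, and `is_supported_cuda_version` are hypothetical, and the version list is taken from the flowchart in this review.

```shell
# Single source of truth for the default, so the help text cannot go stale.
DEFAULT_CUDA_VERSION="13.2"
SUPPORTED_CUDA_VERSIONS="12.0 12.1 12.2 12.3 12.4 12.5 12.6 12.8 12.9 13.0 13.1 13.2"

usage() {
    echo "CUDA_VERSION - CUDA toolkit version, default ${DEFAULT_CUDA_VERSION}"
}

# Returns 0 if the given version is in the supported list, 1 otherwise.
is_supported_cuda_version() {
    for candidate in $SUPPORTED_CUDA_VERSIONS; do
        [ "$candidate" = "$1" ] && return 0
    done
    return 1
}

# Honor CUDA_VERSION from the environment, falling back to the default.
selected_version="${CUDA_VERSION:-$DEFAULT_CUDA_VERSION}"
if is_supported_cuda_version "$selected_version"; then
    echo "Building for CUDA ${selected_version}"
else
    # A real build script would exit with an error at this point.
    echo "Unsupported CUDA version: ${selected_version}" >&2
fi
```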

Last reviewed commit: 140687a

Comment thread docker/Dockerfile.cuda132.aarch64.deps Outdated
case \"\$1\" in\\n\
--image=*)\\n\
specs=\$1\\n\
shift\\n\
Contributor

Double-shift drops arguments after --image=*

The --image=* case body contains its own shift (to advance past $1 after saving it), and the outer shift at the bottom of the while loop unconditionally executes afterward. This means every time an --image=* argument is processed, the loop shifts twice — silently discarding the argument that immediately follows.

In practice, fatbinary is typically invoked with multiple --image=* flags (one per target GPU architecture, e.g. --image=profile=sm_89,... --image=profile=sm_90,...). With this bug, whatever argument immediately follows each --image=* flag is dropped, so when the flags are passed back to back every second image is skipped and the bundled fatbin contains only a subset of the PTX/SASS images instead of the full set.

The inner shift inside the --image=* case is unnecessary — remove it so only the outer loop shift advances the argument pointer:

Suggested change
- shift\\n\

→ (remove this line)

The same issue exists in docker/Dockerfile.cuda132.x86_64.deps at the equivalent lines.
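
The effect of the extra shift can be reproduced with a small stand-alone loop. This is a sketch that only mimics the shape described in this comment; `collect_images` and its `mode` parameter are hypothetical and are not part of the actual fatbinary wrapper.

```shell
# Hypothetical argument loop; it only mimics the shape described above.
# mode "single": only the loop-level shift runs (the fixed behaviour).
# mode "double": an extra shift inside the --image=* branch (the bug).
collect_images() {
    mode="$1"
    shift
    seen=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --image=*)
                seen="${seen}${seen:+ }$1"
                # The guard keeps the sketch portable: some shells abort
                # when asked to shift past the last argument.
                if [ "$mode" = "double" ] && [ "$#" -gt 1 ]; then
                    shift   # buggy extra shift: skips the next argument
                fi
                ;;
        esac
        shift   # loop-level shift: advances past the current argument
    done
    echo "$seen"
}

collect_images single --image=a --image=b --image=c   # prints --image=a --image=b --image=c
collect_images double --image=a --image=b --image=c   # prints --image=a --image=c
```

With the single loop-level shift every flag is seen; with the extra shift the argument after each matched flag never reaches the `case` statement.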

Contributor Author

This part hasn't been changed.

Contributor Author

Fixed

Comment thread docs/compilation.rst Outdated
- * | CUDA_VERSION - CUDA toolkit version (12.9 and 13.1 are officially supported,
-     12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8 are 13.0 are deprecated
+ * | CUDA_VERSION - CUDA toolkit version (12.9 and 13.2 are officially supported,
+     12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 13.0 are 13.1 are deprecated
Contributor

Grammatical error: "are" should be "and"

"13.0 are 13.1 are deprecated" reads incorrectly. It should be "and 13.1".

Suggested change
- 12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 13.0 are 13.1 are deprecated
+ 12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 13.0 and 13.1 are deprecated

Contributor Author

Fixed

JanuszL added 2 commits March 9, 2026 18:19
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [45727934]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [45727934]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [45722131]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [45722131]: BUILD PASSED

Comment thread docker/build.sh Outdated
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [45797831]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [45797831]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [45797831]: BUILD PASSED

@JanuszL JanuszL merged commit 7c49abb into NVIDIA:main Mar 11, 2026
6 checks passed
@JanuszL JanuszL deleted the cuda_13.2 branch March 11, 2026 06:37

4 participants