Move to CUDA 13.2 #6249
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
CI MESSAGE: [45722131]: BUILD STARTED
Greptile Summary
This PR upgrades DALI's default CUDA toolchain to CUDA 13.2, adding new
Key changes:
Confidence Score: 5/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["build.sh\n(CUDA_VERSION default: 13.2)"] --> B{Validate CUDA_VERSION}
B -->|"12.0–12.6, 12.8, 12.9\n13.0, 13.1, 13.2"| C[Select Dockerfile]
B -->|"other"| D["Exit with error"]
C --> E["Dockerfile.cuda130.*.deps\n(fatbinary double-shift fixed)"]
C --> F["Dockerfile.cuda131.*.deps\n(fatbinary double-shift fixed)"]
C --> G["Dockerfile.cuda132.*.deps\n(new — CUDA 13.2)"]
G --> H["Install CUDA 13.2 packages\ncuda-minimal-build-13-2=13.2.0-1"]
H --> I["Install nvjpeg2k 0.9.0.43\n+ nvcomp 5.1.0.21_cuda13"]
I --> J["Patch fatbinary wrapper\n(single shift, --image=* translation)"]
K["Dockerfile.deps\n(manylinux base)"] --> L["Patch __clang_cuda_runtime_wrapper.h\nRemove texture_fetch_functions.h"]
L --> M["Inject _NV_RSQRT_SPECIFIER\nguard for glibc ≥ 2.42"]
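The validation and Dockerfile selection steps in the flowchart can be sketched as a small shell function. This is only an illustration, not the actual contents of build.sh: the function name `select_deps_dockerfile`, the `x86_64` default, and the assumption that every accepted version maps to a `Dockerfile.cuda<NNN>.<arch>.deps` file are hypothetical. Only the accepted version list and the `cuda130`/`cuda131`/`cuda132` names come from the flowchart.

```shell
#!/bin/sh
# Hypothetical helper mirroring the flowchart; build.sh's real logic may differ.
select_deps_dockerfile() {
    cuda_version="${1:-13.2}"   # default taken from the flowchart
    arch="${2:-x86_64}"         # assumed default architecture

    # Accept only the versions listed in the flowchart:
    # 12.0-12.6, 12.8, 12.9, 13.0, 13.1, 13.2
    case "$cuda_version" in
        12.[0-6]|12.8|12.9|13.[0-2]) ;;
        *)
            echo "Unsupported CUDA_VERSION: $cuda_version" >&2
            return 1
            ;;
    esac

    # Drop the dot (13.2 -> 132) to build the Dockerfile name.
    short_version=$(echo "$cuda_version" | tr -d '.')
    echo "Dockerfile.cuda${short_version}.${arch}.deps"
}

select_deps_dockerfile 13.2 x86_64   # prints Dockerfile.cuda132.x86_64.deps
```

A version outside the list, such as 12.7, makes the function print an error to stderr and return a non-zero status, which corresponds to the "Exit with error" branch.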
case \"\$1\" in\\n\
--image=*)\\n\
specs=\$1\\n\
shift\\n\
Double-shift drops arguments after --image=*
The --image=* case body contains its own shift (to advance past $1 after saving it), and the outer shift at the bottom of the while loop unconditionally executes afterward. This means every time an --image=* argument is processed, the loop shifts twice — silently discarding the argument that immediately follows.
In practice, fatbinary is typically invoked with multiple --image=* flags (one per target GPU architecture, e.g. --image=profile=sm_89,... --image=profile=sm_90,...). With this bug, the argument immediately after each processed --image=* flag is skipped, so with consecutive flags every second --image=* is dropped and the bundled fatbin would be missing some of the PTX/SASS images.
The inner shift inside the --image=* case is unnecessary — remove it so only the outer loop shift advances the argument pointer:
- shift\\n\
→ (remove this line)
The same issue exists in docker/Dockerfile.cuda132.x86_64.deps at the equivalent lines.
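The effect described in this comment can be reproduced with a reduced argument loop. The sketch below is a hypothetical simplification, not the wrapper script from the Dockerfile: the function names and the `seen` variable are made up for illustration, and the outer shift in the first variant is guarded only so the sketch does not fail when the inner shift has already consumed the last argument.

```shell
#!/bin/sh
# Hypothetical reduction of the wrapper's argument loop (not the real script).

# Variant with the reported problem: the --image=* branch shifts,
# and then the loop-level shift runs as well.
collect_images_double_shift() {
    seen=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --image=*)
                seen="${seen:+$seen }$1"
                shift   # inner shift: consumes the current argument
                ;;
        esac
        # Outer shift: after the inner shift, this discards the NEXT argument.
        if [ $# -gt 0 ]; then shift; fi
    done
    echo "$seen"
}

# Suggested variant: only the loop-level shift advances the arguments.
collect_images_single_shift() {
    seen=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --image=*)
                seen="${seen:+$seen }$1"
                ;;
        esac
        shift
    done
    echo "$seen"
}

collect_images_double_shift --image=a --image=b --image=c -o out
# prints: --image=a --image=c
collect_images_single_shift --image=a --image=b --image=c -o out
# prints: --image=a --image=b --image=c
```

In the first variant `--image=b` and `-o` are never inspected, because each is skipped by the shift that follows a matched `--image=*`.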
This part hasn't been changed.
- * | CUDA_VERSION - CUDA toolkit version (12.9 and 13.1 are officially supported,
-     12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8 are 13.0 are deprecated
+ * | CUDA_VERSION - CUDA toolkit version (12.9 and 13.2 are officially supported,
+     12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 13.0 are 13.1 are deprecated
Grammatical error: "are" should be "and"
"13.0 are 13.1 are deprecated" reads incorrectly. It should be "and 13.1".
- 12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 13.0 are 13.1 are deprecated
+ 12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 13.0 and 13.1 are deprecated
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
CI MESSAGE: [45727934]: BUILD STARTED
CI MESSAGE: [45727934]: BUILD FAILED
CI MESSAGE: [45722131]: BUILD FAILED
CI MESSAGE: [45722131]: BUILD PASSED
CI MESSAGE: [45797831]: BUILD STARTED
CI MESSAGE: [45797831]: BUILD FAILED
CI MESSAGE: [45797831]: BUILD PASSED
Category:
Other (e.g. Documentation, Tests, Configuration)
Description:
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A