Move driver and nvrtc cython and internal layers to new generator by mdboom · Pull Request #1972 · NVIDIA/cuda-python · GitHub

mdboom · 2026-04-24T13:46:37Z

This continues the work in #1900. It adds driver to the mix, so both nvrtc and driver are now generated from the "real" new generator.

copy-pr-bot · 2026-04-24T13:46:40Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

mdboom · 2026-04-24T13:47:21Z

/ok to test

mdboom · 2026-04-24T14:59:54Z

/ok to test

mdboom · 2026-04-24T17:37:15Z

/ok to test

mdboom · 2026-04-24T17:52:22Z

/ok to test

mdboom · 2026-04-24T19:05:25Z

/ok to test

mdboom · 2026-04-24T20:48:19Z

/ok to test

mdboom · 2026-04-24T21:30:36Z

/ok to test

github-actions · 2026-04-24T21:48:09Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1972/
https://nvidia.github.io/cuda-python/pr-preview/pr-1972/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1972/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1972/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

mdboom · 2026-04-25T14:21:56Z

/ok to test

mdboom · 2026-04-25T15:17:05Z

/ok to test

mdboom · 2026-04-25T15:17:34Z

/ok to test

mdboom · 2026-04-29T13:38:07Z

/ok to test

mdboom · 2026-04-29T13:39:27Z

/ok to test

mdboom · 2026-04-29T14:16:21Z

/ok to test

mdboom · 2026-04-29T14:45:13Z

/ok to test

mdboom · 2026-04-29T18:37:36Z

/ok to test

leofang

First wave of questions

leofang · 2026-04-30T21:58:04Z

+from libc.stdint cimport uintptr_t
+from cpython cimport PyUnicode_AsWideCharString, PyMem_Free
+
+# You must 'from .utils import NotSupportedError' before using this template


Note to self: check if this is from cybind template

It's not from the cybind template, but it's added as part of the windows_externs.pxd snippet.

leofang · 2026-04-30T21:59:11Z

+    cdef int err, driver_ver = 0
+
+    # Load driver to check version
+    handle = dlopen('libcuda.so.1', RTLD_NOW | RTLD_GLOBAL)


Q: why don't we use pathfinder here?

This comes from the linux_externs.pxd snippet in cybind, which is pasted into every _internal/X_linux.pyx file that the generator produces. We should probably update it to use pathfinder, but that would have implications beyond cuda_bindings, so I didn't want to touch it.

I have an open PR in cybind to move away from snippets, because I think we are seeing one of the downsides of that approach here.

Importantly, this is dead code in this file. get_cuda_version is never called elsewhere -- it's a utility for libraries that need to get the cuda version before loading their own library. (Again, that should probably get moved to pathfinder as well, but is orthogonal to this change).

leofang · 2026-04-30T22:02:54Z

+            raise RuntimeError("Failed to get __cuGetProcAddress_v2")
+        _F_cuGetProcAddress_v2 = <__cuGetProcAddress_v2_T>__cuGetProcAddress_v2
+
+        if os.getenv('CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM', default=0):


Wouldn't this evaluate to True for export CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=0?

>>> if '0': print(123)
...
123

That's a good catch, and we should probably fix it. Note that this was copied directly from the existing code, which has the same bug:

https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in#L522
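For reference, a minimal sketch of the problem and one possible fix. The `env_flag` helper and the set of strings it treats as false are assumptions for illustration only; they are not part of cuda-python, and the real fix may well choose different semantics (for example, parsing the value as an integer).

```python
import os

VAR = "CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM"


def env_flag(name, default=False):
    """Hypothetical helper: interpret an environment variable as a boolean.

    Assumption: empty, "0", "false", "no" and "off" (case-insensitive)
    mean disabled; anything else means enabled.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() not in ("", "0", "false", "no", "off")


# Simulate `export CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=0`.
os.environ[VAR] = "0"

# Current check: os.getenv returns the string "0", and any non-empty
# string is truthy, so the per-thread default stream path is taken.
buggy = bool(os.getenv(VAR, default=0))

# Sketch of a fix: parse the string instead of relying on truthiness.
fixed = env_flag(VAR)
```

With the variable set to `0`, `buggy` is `True` while `fixed` is `False`. An unset variable falls back to the default in both versions.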

leofang · 2026-04-30T22:05:06Z

+        # Get latest __cuGetProcAddress_v2
+        global __cuGetProcAddress_v2
+        __cuGetProcAddress_v2 = dlsym(handle, 'cuGetProcAddress_v2')
+        if __cuGetProcAddress_v2 == NULL:
+            raise RuntimeError("Failed to get __cuGetProcAddress_v2")
+        _F_cuGetProcAddress_v2 = <__cuGetProcAddress_v2_T>__cuGetProcAddress_v2


IIRC the old code has a path where we do dlsym to get unversioned symbols, is it no longer relevant?

Yes, the old code had a path that fell back to dlsym if we couldn't load cuGetProcAddress_v2. The new code assumes it is available and works. IIUC, the old code path was for old CTKs where it might not be available, and we don't need it anymore. But I think the best way to confirm that would be to have QA run this across a range of configurations. It would be great to drop that complexity if we could. The complexity is not just another branch that uses dlsym -- it comes from re-creating the versioned symbol mapping that cuGetProcAddress_v2 handles for us. That mapping is a big part of what the old generator did, and it doesn't yet exist in the new generator.
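To make the trade-off concrete, here is a rough Python sketch of the two lookup paths. Everything in it is hypothetical: `resolve`, the `VERSIONED_SYMBOLS` table, and the callables passed in are stand-ins for illustration, not the generated Cython code or the old generator's actual mapping.

```python
# Hypothetical table: base API name -> versioned symbol to look up directly.
# In the dlsym fallback, a table like this would have to be generated and
# kept in sync with the headers for every supported CTK.
VERSIONED_SYMBOLS = {"cuMemAlloc": "cuMemAlloc_v2"}


def resolve(name, get_proc_address, dlsym):
    """Return whatever the chosen lookup function yields for `name`.

    `get_proc_address` stands in for cuGetProcAddress_v2 (or None if it
    could not be loaded); `dlsym` stands in for a raw symbol lookup.
    """
    if get_proc_address is not None:
        # New path: the driver selects the right versioned entry point,
        # so only the base name is needed.
        return get_proc_address(name)
    # Old fallback path: we must know the versioned symbol name ourselves.
    return dlsym(VERSIONED_SYMBOLS.get(name, name))
```

The fallback branch is short, but it is only correct as long as the mapping table is complete, which is the part that would have to be rebuilt in the new generator.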

leofang · 2026-04-30T22:05:34Z

+    cdef int err, driver_ver = 0
+
+    # Load driver to check version
+    handle = LoadLibraryExW("nvcuda.dll", NULL, LOAD_LIBRARY_SEARCH_SYSTEM32)


ditto, re: pathfinder

Same reason not to update this now as for Linux.

Move driver and nvrtc cython and internal layers to new generator

f842321

github-actions Bot added the cuda.bindings Everything related to the cuda.bindings module label Apr 24, 2026

Fix Cython interop tests

992998a

Handle headers differently

6b1d2d3

Fix compilation

be5d972

github-actions Bot added CI/CD CI/CD infrastructure cuda.core Everything related to the cuda.core module labels Apr 24, 2026

mdboom force-pushed the driver-v2 branch from cbf9660 to be5d972 Compare April 24, 2026 19:26

Make const match

2d7c8dd

Fix compilation again

f664c86

leofang assigned mdboom Apr 24, 2026

leofang self-requested a review April 24, 2026 23:59

leofang added this to the cuda.bindings 13.3.0 & 12.9.7 milestone Apr 24, 2026

Attempt to fix Windows build

00ea580

Fix Windows again

9982e06

Fix not found error code

6bf0565

Merge remote-tracking branch 'upstream/main' into driver-v2

103c29f

Look up pointers in a different way

591d101

mdboom added 2 commits April 29, 2026 10:43

Get first version right

7119b96

Get first appearing versions correct

0d7171c

Fix weird special case

9bb9f5a

mdboom marked this pull request as ready for review April 29, 2026 22:31

leofang reviewed Apr 30, 2026

leofang added P0 High priority - Must do! enhancement Any code-related improvements and removed CI/CD CI/CD infrastructure cuda.core Everything related to the cuda.core module labels May 1, 2026

mdboom requested a review from leofang May 13, 2026 13:38
