Skip to content

Move driver and nvrtc cython and internal layers to new generator#1972

Open
mdboom wants to merge 14 commits into
NVIDIA:mainfrom
mdboom:driver-v2
Open

Move driver and nvrtc cython and internal layers to new generator#1972
mdboom wants to merge 14 commits into
NVIDIA:mainfrom
mdboom:driver-v2

Conversation

@mdboom
Copy link
Copy Markdown
Contributor

@mdboom mdboom commented Apr 24, 2026

This is a continuation of the work in #1900. Now adds driver to the mix and both nvrtc and driver are generated from the "real" new generator.

@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented Apr 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.bindings Everything related to the cuda.bindings module label Apr 24, 2026
@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 24, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 24, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 24, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 24, 2026

/ok to test

1 similar comment
@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 24, 2026

/ok to test

@github-actions github-actions Bot added CI/CD CI/CD infrastructure cuda.core Everything related to the cuda.core module labels Apr 24, 2026
@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 24, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 24, 2026

/ok to test

@github-actions
Copy link
Copy Markdown

@leofang leofang self-requested a review April 24, 2026 23:59
@leofang leofang added this to the cuda.bindings 13.3.0 & 12.9.7 milestone Apr 24, 2026
@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 25, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 25, 2026

/ok to test

1 similar comment
@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 25, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 29, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 29, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 29, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 29, 2026

/ok to test

@mdboom
Copy link
Copy Markdown
Contributor Author

mdboom commented Apr 29, 2026

/ok to test

@mdboom mdboom marked this pull request as ready for review April 29, 2026 22:31
Copy link
Copy Markdown
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First wave of questions

from libc.stdint cimport uintptr_t
from cpython cimport PyUnicode_AsWideCharString, PyMem_Free

# You must 'from .utils import NotSupportedError' before using this template
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: check if this is from cybind template

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not from the cybind template, but it's added as part of the windows_externs.pxd snippet.

cdef int err, driver_ver = 0

# Load driver to check version
handle = dlopen('libcuda.so.1', RTLD_NOW | RTLD_GLOBAL)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why don't we use pathfinder here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from linux_externs.pxd snippet in cybind and is pasted into every _internal/X_linux.pyx file that the generator generates. We should probably update this to use pathfinder, but it would have implications beyond cuda_bindings, so I didn't want to touch it.

I have an open PR in cybind to move away from snippets, because I think we are seeing one of the downsides of that approach here.

Importantly, this is dead code in this file. get_cuda_version is never called elsewhere -- it's a utility for libraries that need to get the cuda version before loading their own library. (Again, that should probably get moved to pathfinder as well, but is orthogonal to this change).

raise RuntimeError("Failed to get __cuGetProcAddress_v2")
_F_cuGetProcAddress_v2 = <__cuGetProcAddress_v2_T>__cuGetProcAddress_v2

if os.getenv('CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM', default=0):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this evaluate to True for export CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=0?

>>> if '0': print(123)
...
123

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good catch, and we should probably fix it. Note this was copied directly from the existing code which also has this bug:

https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in#L522

Comment on lines +591 to +596
# Get latest __cuGetProcAddress_v2
global __cuGetProcAddress_v2
__cuGetProcAddress_v2 = dlsym(handle, 'cuGetProcAddress_v2')
if __cuGetProcAddress_v2 == NULL:
raise RuntimeError("Failed to get __cuGetProcAddress_v2")
_F_cuGetProcAddress_v2 = <__cuGetProcAddress_v2_T>__cuGetProcAddress_v2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the old code has a path where we do dlsym to get unversioned symbols, is it no longer relevant?

Copy link
Copy Markdown
Contributor Author

@mdboom mdboom May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the old code had a path to fall back to dlsym if we can't load cuGetProcAddress_v2. This now assumes it's available and works. IIUC, the old code path was for old CTK's where that might not be available and we don't need it anymore. But I think the best way would be to have QA run this across a range of things. It would be great to drop that complexity if we could. The complexity is not just another branch of this that uses dlsym -- it's from re-creating the versioned symbol mapping that cuGetProcAddress_v2 handles for us. It's a big part of what the old generator did that doesn't yet exist in the new generator.

cdef int err, driver_ver = 0

# Load driver to check version
handle = LoadLibraryExW("nvcuda.dll", NULL, LOAD_LIBRARY_SEARCH_SYSTEM32)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, re: pathfinder

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason not to update this now as for Linux.

@leofang leofang added P0 High priority - Must do! enhancement Any code-related improvements and removed CI/CD CI/CD infrastructure cuda.core Everything related to the cuda.core module labels May 1, 2026
@mdboom mdboom requested a review from leofang May 13, 2026 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.bindings Everything related to the cuda.bindings module enhancement Any code-related improvements P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants