Move driver and nvrtc cython and internal layers to new generator#1972
Move driver and nvrtc cython and internal layers to new generator#1972mdboom wants to merge 14 commits into
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
1 similar comment
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
|
/ok to test |
|
/ok to test |
1 similar comment
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
| from libc.stdint cimport uintptr_t | ||
| from cpython cimport PyUnicode_AsWideCharString, PyMem_Free | ||
|
|
||
| # You must 'from .utils import NotSupportedError' before using this template |
There was a problem hiding this comment.
Note to self: check if this is from cybind template
There was a problem hiding this comment.
It's not from the cybind template, but it's added as part of the windows_externs.pxd snippet.
| cdef int err, driver_ver = 0 | ||
|
|
||
| # Load driver to check version | ||
| handle = dlopen('libcuda.so.1', RTLD_NOW | RTLD_GLOBAL) |
There was a problem hiding this comment.
Q: why don't we use pathfinder here?
There was a problem hiding this comment.
This comes from linux_externs.pxd snippet in cybind and is pasted into every _internal/X_linux.pyx file that the generator generates. We should probably update this to use pathfinder, but it would have implications beyond cuda_bindings, so I didn't want to touch it.
I have an open PR in cybind to move away from snippets, because I think we are seeing one of the downsides of that approach here.
Importantly, this is dead code in this file. get_cuda_version is never called elsewhere -- it's a utility for libraries that need to get the cuda version before loading their own library. (Again, that should probably get moved to pathfinder as well, but is orthogonal to this change).
| raise RuntimeError("Failed to get __cuGetProcAddress_v2") | ||
| _F_cuGetProcAddress_v2 = <__cuGetProcAddress_v2_T>__cuGetProcAddress_v2 | ||
|
|
||
| if os.getenv('CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM', default=0): |
There was a problem hiding this comment.
Wouldn't this evaluate to True for export CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=0?
>>> if '0': print(123)
...
123There was a problem hiding this comment.
That's good catch, and we should probably fix it. Note this was copied directly from the existing code which also has this bug:
| # Get latest __cuGetProcAddress_v2 | ||
| global __cuGetProcAddress_v2 | ||
| __cuGetProcAddress_v2 = dlsym(handle, 'cuGetProcAddress_v2') | ||
| if __cuGetProcAddress_v2 == NULL: | ||
| raise RuntimeError("Failed to get __cuGetProcAddress_v2") | ||
| _F_cuGetProcAddress_v2 = <__cuGetProcAddress_v2_T>__cuGetProcAddress_v2 |
There was a problem hiding this comment.
IIRC the old code has a path where we do dlsym to get unversioned symbols, is it no longer relevant?
There was a problem hiding this comment.
Yes, the old code had a path to fall back to dlsym if we can't load cuGetProcAddress_v2. This now assumes it's available and works. IIUC, the old code path was for old CTK's where that might not be available and we don't need it anymore. But I think the best way would be to have QA run this across a range of things. It would be great to drop that complexity if we could. The complexity is not just another branch of this that uses dlsym -- it's from re-creating the versioned symbol mapping that cuGetProcAddress_v2 handles for us. It's a big part of what the old generator did that doesn't yet exist in the new generator.
| cdef int err, driver_ver = 0 | ||
|
|
||
| # Load driver to check version | ||
| handle = LoadLibraryExW("nvcuda.dll", NULL, LOAD_LIBRARY_SEARCH_SYSTEM32) |
There was a problem hiding this comment.
Same reason not to update this now as for Linux.
This is a continuation of the work in #1900. Now adds
driverto the mix and bothnvrtcanddriverare generated from the "real" new generator.