Release GIL during msgpack packing operations for true thread parallelism#1
Draft
Release GIL during msgpack packing operations for true thread parallelism#1
Conversation
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Release GIL in Cython extension for parallel packing
Release GIL during large payload packing operations
Feb 12, 2026
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Copilot
AI
changed the title
Release GIL during large payload packing operations
Add asyncio GIL release benchmarks
Feb 13, 2026
- All benchmarks now use os.cpu_count() to determine thread/worker counts - Primary benchmark uses CPU count threads (prevents saturation) - Intensive benchmark uses CPU count * 2 workers (stress test) - Example uses CPU count * 2 concurrent requests (realistic load) - Updated documentation to reflect CPU-based scaling - Prevents thread pool overload and event loop latency issues Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Copilot
AI
changed the title
Add asyncio GIL release benchmarks
Use CPU core count for asyncio benchmark thread scaling
Feb 13, 2026
- Created LATENCY_ANALYSIS.md with detailed latency trade-off analysis - Updated ASYNCIO_BENCHMARK_RESULTS.md with latency considerations - Updated ASYNCIO_BENCHMARK_INDEX.md with latency warnings and guidance - Updated EXECUTIVE_SUMMARY.md to address "what about the latency?" question - Explains why P99 latency can be 2-3x worse with GIL release - Provides decision matrix for when to use GIL release vs small payloads - Includes mitigation strategies and use case guidance Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Copilot
AI
changed the title
Use CPU core count for asyncio benchmark thread scaling
Document latency trade-offs in GIL release implementation
Feb 13, 2026
- Modified msgpack/_packer.pyx to always release GIL for all payload sizes - Removed NOGIL_THRESHOLD constant (was 1024 bytes) - Simplified code by removing conditional GIL release logic - All 5 locations now unconditionally release GIL: 1. bytes/bytearray packing 2. unicode/str packing 3. ExtType packing 4. memoryview packing 5. pack_ext_type method - Fixed pyproject.toml license format issue - All 123 tests pass - Benchmarks show improved parallelism for small payloads (3.01x vs 0.8x) Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
- Updated GIL_RELEASE_IMPLEMENTATION.md to reflect no threshold - Updated LATENCY_ANALYSIS.md with new benchmark results - Documented excellent parallelism: 3.01x-3.84x with 4 threads - Documented low latencies: 57µs avg, 566-594µs P99 - Removed outdated information about 1KB threshold - Simplified guidance - GIL release beneficial for almost all use cases Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Copilot
AI
changed the title
Document latency trade-offs in GIL release implementation
Remove size threshold for GIL release - always release for all payloads
Feb 13, 2026
- Created benchmark_gil_vs_nogil.py: Compares current (GIL released) vs simulated original (GIL held) - Created benchmark_gil_detailed.py: Comprehensive comparison across payload sizes - Created benchmark_gil_comparison.py: Framework for future true comparison - Created _packer_nogil.pyx: Reference implementation without GIL release (for future builds) - Demonstrates throughput improvements from GIL release (9-14%) - Shows methodology for comparing GIL release impact Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Copilot
AI
changed the title
Remove size threshold for GIL release - always release for all payloads
Release GIL during msgpack packing to enable thread parallelism
Feb 13, 2026
- Updated benchmark_asyncio_parallel.py to compare three approaches: 1. Small payloads with asyncio.to_thread (GIL held) 2. Large payloads with asyncio.to_thread (GIL released) 3. Direct async calls without threading (baseline) - Added async_msgpack_work_direct() function for direct async execution - Added direct_async_benchmark() to measure direct async performance - Updated main() to run all three benchmarks and compare results - Results show direct async has highest throughput (156k ops/sec) and best async responsiveness (4138 pings/sec) - Provides clear guidance on when to use each approach - Demonstrates threading overhead vs parallelism trade-offs Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Co-authored-by: Rexhaif <5154447+Rexhaif@users.noreply.github.com>
Copilot
AI
changed the title
Release GIL during msgpack packing to enable thread parallelism
Release GIL during msgpack packing operations for true thread parallelism
Feb 13, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
msgpack-python's C extension holds the GIL throughout packing/unpacking, preventing parallel execution even though
memcpyof payload bodies is pure C and doesn't touch Python objects.Core Changes
msgpack/pack.h: ReplacedPyMem_Reallocwithreallocto enable GIL-free buffer allocation. Error handling moved to caller with GIL held.msgpack/_packer.pyx:malloc/freefromPyMem_*allocatorsmsgpack_pack_raw_bodyasnogil-safewith nogil:blocks for bytes, str, memoryview, ExtTypemsgpack/_unpacker.pyx: Switched to libc allocators for consistency (no GIL release yet - unpacker creates Python objects per element).Performance
Thread parallelism (4 cores):
Asyncio throughput with
asyncio.to_thread():Latency Trade-off
P99 latency can increase 2-3x during large payload operations since event loop must wait for
memcpycompletion. Acceptable for batch/throughput workloads; keep payloads small for latency-critical applications.Safety
nogilblockmsgpack_packeris C struct (no Python API calls in nogil section)Usage Pattern
Benchmarks
Comprehensive benchmarks added:
benchmark_asyncio_parallel.py: Three-way comparison (small/large threading, direct async)benchmark_gil_detailed.py: GIL vs no-GIL comparison across payload sizesAll existing tests pass. No API changes.
Original prompt
Goal
Release the GIL in the Cython C extension where it is safe to do so, allowing true parallelism for CPU-bound packing operations involving large payloads when using Python threads.
Background
Currently, msgpack-python's C extension never releases the GIL during packing or unpacking. While most of the code is deeply intertwined with Python C API calls (type checks, object creation, etc.) and must hold the GIL, there are specific phases — particularly the
memcpyof raw payload bodies — that operate purely on C buffers and don't touch any Python objects. These can safely run without the GIL.The biggest win is in
_pack_innerin_packer.pyx, where largebytes,str,memoryview, andextpayloads are copied into the packer's internal buffer viamsgpack_pack_raw_body(). For large payloads (e.g. serializing many MB-sized binary blobs from multiple threads), releasing the GIL around thesememcpycalls enables true parallel execution.Architecture of the Change
1.
msgpack/pack.h— Switch buffer allocator fromPyMem_RealloctoreallocThe
msgpack_pack_writefunction currently usesPyMem_Realloc(which requires the GIL) andPyErr_NoMemory(). To allowmsgpack_pack_raw_bodyto be called without the GIL, the internal buffer must use standard Cmalloc/realloc/freeinstead.Change
msgpack_pack_writefrom:To:
2.
msgpack/_packer.pyx— Switch allocator and addnogilsectionsa) Switch
__cinit__/__dealloc__allocator:PyMem_Malloc→mallocin__cinit__PyMem_Free→freein__dealloc__from libc.stdlib cimport malloc, realloc, freeat the topfrom libc.string cimport memcpyat the topb) Declare
msgpack_pack_raw_bodyasnogil-safe:Add a separate
nogildeclaration block:Note: keep the original declarations too for the functions that DO need the GIL (they call
PyErr_NoMemoryon realloc failure paths). Onlymsgpack_pack_raw_bodyneeds thenogildeclaration since its internalmsgpack_pack_writenow uses plainreallocinstead ofPyMem_Realloc.Actually, after the pack.h change (removing PyMem_Realloc and PyErr_NoMemory), ALL the pack functions become GIL-free. So the cleaner approach is:
cdef extern from "pack.h":block to addnogilto ALL function declarationsc) Add
nogilblocks around large raw body writes in_pack_inner:Define a threshold constant:
For the
bytes/bytearraypacking path (around line 185-191), change:Apply the same pattern to:
unicode/strpacking path (around line 203-204)memoryviewpacking path (around line 239-240)ExtTypepacking path (around line 220)d) Handle realloc failure with GIL:
After each
msgpack_pack_raw_bodycall (or any pack call that could trigger realloc), check the return value. If it returns -1 (realloc failed), raiseMemoryErrorsince we now have the GIL back:3.
msgpack/_unpacker.pyx— Switch allocator for consistencyChange the
Unpackerclass to also usemalloc/freeinstead ofPyMem_Malloc/PyMem_Freefor its internal buffer, for consistency. The unpacker'sunpack_executefunction creates Python objects at every step so we can't release the GIL there, but switching the allocator keeps the codebase consistent and opens the door for future work.from libc.stdlib cimport malloc, realloc, freeat the top__init__:self.buf = <char*>malloc(read_size)instead ofPyMem_Malloc__dealloc__:free(self.buf)instead ofPyMem_Freeappend_buffer:new_buf = <char*>realloc(...)or keep usingmalloc+memcpy+freepattern but with libc versions4. Important Notes
except -1Cython exception spec on the pack functions won't conflict withnogilas long as we capture the return value and check it after re-acquiring the GIL.with nogil:block must only contain C-leve...This pull request was created from Copilot chat.
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.