bpo-46218: Change long_pow() to sliding window algorithm #30319

Merged
merged 9 commits into from Jan 2, 2022


tim-one
Member

@tim-one tim-one commented Jan 1, 2022

For large exponents in long_pow(), use the sliding window algorithm instead.

Also boost the window size from 5 bits to 6, which should yield a modest but significant speedup for long exponents. The precomputed table remains the same size, though, because the sliding window algorithm only stores results for odd exponents.

long_pow() no longer requires that the number of bits in a CPython long digit be a multiple of 5. It no longer cares at all what the digit width is.

https://bugs.python.org/issue46218
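
To make the description concrete, here is a minimal Python sketch of left-to-right sliding-window exponentiation. It is not the C code in Objects/longobject.c: the function name, the `window` parameter, and the string-based bit scan are all assumptions made for illustration. What it does show is why only odd powers need to be tabulated: every window is trimmed so that it starts and ends on a 1 bit, and runs of zero bits are handled with plain squarings.

```python
def sliding_window_pow(base, exp, mod=None, window=5):
    """Illustrative sliding-window power; hypothetical helper, not CPython's code."""
    if exp < 0:
        raise ValueError("negative exponents are not handled in this sketch")

    def mul(x, y):
        r = x * y
        return r % mod if mod is not None else r

    # Table of odd powers only: table[i] == base ** (2*i + 1).
    # It holds 2 ** (window - 1) entries.
    base2 = mul(base, base)
    table = [base % mod if mod is not None else base]
    for _ in range((1 << (window - 1)) - 1):
        table.append(mul(table[-1], base2))

    result = 1 % mod if mod is not None else 1
    bits = bin(exp)[2:] if exp else ""
    n = len(bits)
    i = 0
    while i < n:
        if bits[i] == "0":
            # A zero bit outside any window costs one squaring.
            result = mul(result, result)
            i += 1
            continue
        # Start a window at this 1 bit, at most `window` bits wide,
        # then shrink it until it also ends on a 1 bit (so its value is odd).
        j = min(i + window, n)
        while bits[j - 1] == "0":
            j -= 1
        chunk = int(bits[i:j], 2)
        for _ in range(j - i):
            result = mul(result, result)
        result = mul(result, table[chunk >> 1])
        i = j
    return result
```

The result can be compared against the built-in `pow()` for any non-negative exponent, with or without a modulus; the `window` argument only changes how much work is done, not the answer.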

tim-one added 3 commits Jan 1, 2022
Good catch! The code is clearer the new way too.
trailing zero logic entirely into ABSORB_PENDING. These
native int bit manipulations are dirt cheap in
comparison to the bigint squaring needed for each
exponent bit.

And boost the size of exponents tested to (probabilistically)
stress a greater mix of exponent bit patterns.
a million straight 1 bits, the timing difference seems insignificant.
So I think it better to cut the table size in half, to cut the
precomputation overhead time in half for "saner" (smaller
exponent) cases.
@tim-one
Member Author

tim-one commented Jan 1, 2022

Note that I cut the window size back to 5 bits. Short explanation on the bpo report (nothing to do with the number of bits in a "digit").

Since the dynamic table of small powers needed is half the
size now, the overhead of trying to use the k-ary method has
been cut accordingly, which allows it to pay off at smaller
exponent bit lengths.
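
The table-size arithmetic behind these remarks can be spelled out in a few lines. This is a sketch under the usual textbook assumptions: a fixed-window (k-ary) table stores every power from 0 through 2**w - 1, while a sliding-window table stores only the odd powers in that range. The helper names are hypothetical and do not appear in the CPython sources.

```python
def kary_table_entries(window_bits):
    # Fixed-window k-ary method: one entry per possible window value.
    return 1 << window_bits


def sliding_table_entries(window_bits):
    # Sliding-window method: only odd window values are tabulated.
    return 1 << (window_bits - 1)


for w in (5, 6):
    print(w, kary_table_entries(w), sliding_table_entries(w))
```

Under these assumptions a 6-bit sliding window needs as many entries as a 5-bit k-ary table, and a 5-bit sliding window needs half that, which is the halving the commit message refers to.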
Use shortcut initializer for the k-ary table.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
@tim-one tim-one merged commit 863729e into python:main Jan 2, 2022
11 checks passed
@tim-one tim-one deleted the windowpow branch Jan 2, 2022