bpo-42673 prevent branch misprediction in round_size (used in rehash) #23833

Closed
wants to merge 14 commits into from

Conversation

jneb
Contributor

@jneb jneb commented Dec 18, 2020

Replace the loop in round_size using bit_length to prevent branch misprediction delays.

https://bugs.python.org/issue42673

blurb-it bot and others added 2 commits December 18, 2020 09:39
@jneb jneb changed the title bpo-42673 prevent branch misprediction in round_size (using in rehash) bpo-42673 prevent branch misprediction in round_size (used in rehash) Dec 18, 2020
@jneb
Contributor Author

jneb commented Dec 18, 2020

The fact that test_asyncio failed on Windows x86 shouldn't have anything to do with this: the change is deep in the core, and if there actually were an error there, many more tests should fail.
Does anybody understand what is going on here?
The actual error is:

FAIL: test_sendfile_close_peer_in_the_middle_of_receiving (test.test_asyncio.test_sendfile.ProactorEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\lib\test\test_asyncio\test_sendfile.py", line 458, in test_sendfile_close_peer_in_the_middle_of_receiving
    self.run_loop(
AssertionError: ConnectionError not raised

After forcing a new test, all was OK. There is something fishy with this test ...

Updated comment.
I hope the new build test doesn't fail. (That wasn't my fault!)
return i;
// _Py_bit_length(s) is the smallest k such that 1 << k > s;
// passing s - 1 keeps exact powers of two unchanged, saving space
return 1 << _Py_bit_length(s - 1);
Member

Can you please try _Py_SIZE_ROUND_DOWN(), which could be even faster? It uses simple operations:

#define _Py_SIZE_ROUND_DOWN(n, a) ((size_t)(n) & ~(size_t)((a) - 1))

About _Py_bit_length(): it has a special case for 0, but s >= HASHTABLE_MIN_SIZE.

Contributor Author

@jneb jneb Dec 18, 2020

That's brilliant! I didn't know about that macro.
EDITED:
.... but, that macro doesn't do what we want: we want to round up to the nearest power of two, not to a multiple of a given power of two.
I've been racking my brain for a quick binary trick to do that, but this is the fastest one I could find.

Member

For me, this macro is pure black magic... but oops, I picked the wrong macro; I was thinking of _Py_SIZE_ROUND_UP().

/* Below "a" is a power of 2. */
/* Round down size "n" to be a multiple of "a". */
#define _Py_SIZE_ROUND_DOWN(n, a) ((size_t)(n) & ~(size_t)((a) - 1))

/* Round up size "n" to be a multiple of "a". */
#define _Py_SIZE_ROUND_UP(n, a) (((size_t)(n) + \
        (size_t)((a) - 1)) & ~(size_t)((a) - 1))

Member

Maybe _Py_SIZE_ROUND_UP() should be rewritten as a static inline function to ensure that (a-1) expression is only computed once.

@@ -109,10 +110,9 @@ round_size(size_t s)
size_t i;
if (s < HASHTABLE_MIN_SIZE)
Member

Micro-optimisation :-)

Suggested change
if (s < HASHTABLE_MIN_SIZE)
if (s <= HASHTABLE_MIN_SIZE)

Contributor Author

This actually may make the code slower. The value passed to round_size will almost never be < HASHTABLE_MIN_SIZE, but it sometimes is equal.
The branch prediction will probably be faster if it can assume the condition is always false.

@vstinner
Member

@serhiy-storchaka: using _Py_SIZE_ROUND_DOWN() avoids the loop and might be faster, no? What do you think?

In the last patch, I forgot to remove an unused variable.
@serhiy-storchaka
Member

_Py_SIZE_ROUND_UP() does not have anything in common with this code. It returns the smallest number not less than n which is divisible by a. _Py_SIZE_ROUND_UP(42, 2) -> 42, _Py_SIZE_ROUND_UP(42, 4) -> 44, _Py_SIZE_ROUND_UP(42, 8) -> 48.

@serhiy-storchaka
Member

_Py_bit_length() works with unsigned long, but the argument of round_size() has type size_t.

Contributor Author

@jneb jneb left a comment

That's a good point. So this would fail if the hashtable gets a number of entries that couldn't fit in an unsigned long, which is indeed possible in theory (but only for humongous computers: 8<<32 = 32 GB for a single hash table; that would be a bit hard to test :-).
I think the best solution would be to fix the macro to handle this case.

@jneb
Contributor Author

jneb commented Dec 18, 2020

_Py_bit_length() works with unsigned long, but the argument of round_size() has type size_t.

When looking at the definition of _Py_bit_length(), I am afraid to copy it for size_t. What if size_t is equivalent to unsigned long? Or, more accurately, what is the safe #if to write to make sure there is no double definition? Something like

#if SIZEOF_SIZE_T > SIZEOF_LONG
(copied code of Py_bit_length, for size_t)
#endif

or is there a better way?
EDITED:
That didn't work. I made a separate _Py_bit_length_size_t routine; I am not happy with the result, as it looks ugly to repeat all this source code.

jneb added 9 commits December 18, 2020 16:22
This is needed for the new version of the hashtable.c, in case the hashtable is extremely large (>32GB).
Apparently, you can't use sizeof in an #if
There should be a good way to check sizeof(size_t) > sizeof(unsigned long); I hope this is the one
Swapped a } and an #endif...
Made a separate routine _Py_bit_length_size_t since overloading doesn't seem to work
Use the separate _Py_bit_length_size_t so that round_size works in extreme cases of hashtables >32GB
Cast the 1 to size_t to get a proper shift
I changed one size_t too many
I am determined to get this to work!
// Same routine as above, for size_t, if that is bigger than unsigned long
// This means bitscanreverse (as used above) is not going to work
static inline int
_Py_bit_length_size_t(size_t x)
Member

I would prefer to add a _Py_next_pow2() function which takes a size_t parameter. So we can use a more efficient implementation when __builtin_clzl() and _BitScanReverse() are not available. Like:

static inline uint32_t next_pow2(uint32_t x)
{
	x -= 1;
	x |= (x >> 1);
	x |= (x >> 2);
	x |= (x >> 4);
	x |= (x >> 8);
	x |= (x >> 16);
	return x + 1;
}

I suggest requiring the argument to be >= 2.


@jneb
Contributor Author

jneb commented Dec 18, 2020

So it appears to work now, but I am not completely happy with the way the different types are specified. But at least we can be sure that the optimization is possible and works.
It won't make a very big difference, but the fact that we probably save a mispredicted branch for every hashtable expansion makes me happy.

@jneb
Contributor Author

jneb commented Dec 18, 2020

I would be happy if somebody with more experience will have a good look at this.

@serhiy-storchaka
Member

I would want to see evidence that the proposed code is faster than the current code.

@ghost ghost left a comment

LGTM

@jneb
Contributor Author

jneb commented Dec 22, 2020

I found exactly the same code in pyobject.c under the name calculate_keysize so this can be replaced by a call to the new routine.
Furthermore, it almost proves that it is the fastest way :-)

@vstinner
Member

I found exactly the same code in pyobject.c under the name calculate_keysize so this can be replaced by a call to the new routine. Furthermore, it almost proves that it is the fastest way :-)

calculate_keysize() is defined in Objects/dictobject.c. It's called when creating a new dict, when inserting an item in a dict, or when merging two dicts. It might be interesting, but so far, you didn't provide any micro-benchmark.

@serhiy-storchaka
Member

Similar code is used for dict, set, and other implementations of hashtable (one or two).

@jneb
Contributor Author

jneb commented Dec 23, 2020

The unsigned long version in pycore_bitutils is even more optimized, with a 5-bit lookup; I would suggest copying that code for the size_t version (as I did), and using it for all locations where rounding up to a power of two is needed.
I'm a bit nervous to do this myself, as I am a core development newbie and there are so many different types used in this code.

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jan 23, 2021
@iritkatriel
Member

Closing as https://bugs.python.org/issue42673 has been rejected.

@iritkatriel iritkatriel closed this Apr 9, 2022
Labels
awaiting core review stale Stale PR or inactive for long period of time.
6 participants