Skip to content

gh-115704: Improve DJBX33A hash algorithm#115705

Closed
PeterYang12 wants to merge 1 commit into
python:mainfrom
PeterYang12:accelerate_DJBX33A
Closed

gh-115704: Improve DJBX33A hash algorithm#115705
PeterYang12 wants to merge 1 commit into
python:mainfrom
PeterYang12:accelerate_DJBX33A

Conversation

@PeterYang12

@PeterYang12 PeterYang12 commented Feb 20, 2024

Copy link
Copy Markdown

Accelerating python hash algorithm by "unoptimizing" it when using DJBX33A as hash algorithm.
See Daniel Lemire's blog post:
https://lemire.me/blog/2016/07/21/accelerating-php-hashing-by-unoptimizing-it/
This idea has already been implemented in the PHP interpreter.

@bedevere-app

bedevere-app Bot commented Feb 20, 2024

Copy link
Copy Markdown

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

1 similar comment
@bedevere-app

bedevere-app Bot commented Feb 20, 2024

Copy link
Copy Markdown

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@gpshead gpshead self-assigned this Feb 22, 2024
@bedevere-app

bedevere-app Bot commented Feb 27, 2024

Copy link
Copy Markdown

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

1 similar comment
@bedevere-app

bedevere-app Bot commented Aug 25, 2024

Copy link
Copy Markdown

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@gpshead gpshead removed their assignment Aug 25, 2024

@picnixz picnixz left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like you to address Serhiy's concern on the issue as well. And could we have the actual assembly code being generated actually (and it's comparison)?

Comment thread Python/pyhash.c Outdated
Comment on lines +172 to +176
hash = hash * 33 * 33 * 33 * 33 +
p[0] * 33 * 33 * 33 +
p[1] * 33 * 33 +
p[2] * 33 +
p[3];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can directly compute those values (namely the powers of 33), although the compiler should be able to optimize it as well. You could however add a small comment.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. Done.

Comment thread Python/pyhash.c Outdated
Comment thread Python/pyhash.c Outdated
Comment on lines +180 to +190
if (len >= 2) {
if (len > 2) {
hash = hash * 33 * 33 * 33 +
p[0] * 33 * 33 +
p[1] * 33 +
p[2];
}
else {
hash = hash * 33 * 33 + p[0] * 33 + p[1];
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the compiler behaves differently if you use

if (len > 2) { ... }
else if (len == 2) { ... }
else if (len) { ... }

instead?

@cfbolz

cfbolz commented Aug 25, 2024

Copy link
Copy Markdown
Contributor

this PR is quite fundamentally different from the Lemire post/the PHP change. In PHP, the change was done in the implementation of the hash function for arbitrary lengths. In this PR, only the code for computing the hash of bytes with length <= 7 was changed. The hash function of arbitrary lengths is already using a direct multiplication in CPython, and is operating on several characters at a time.

This might still be a worthwhile change, but there should be really strong benchmarking results that show this.

@jxu

jxu commented Aug 26, 2024

Copy link
Copy Markdown

I support it only to clean up a little by getting rid of the ugly (IMO) C fall-through code and replacing it with a simple loop. I don't think the compiler will generate anything significantly different.

Accelerating python hash algorithm by "unoptimizing" it when using
DJBX33A as hash algorithm. See Daniel Lemire's blog post:
https://lemire.me/blog/2016/07/21/accelerating-php-hashing-by-unoptimizing-it/

Signed-off-by: PeterYang12 <yuhan.yang@intel.com>
@bedevere-app

bedevere-app Bot commented Aug 27, 2024

Copy link
Copy Markdown

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@python-cla-bot

python-cla-bot Bot commented Apr 6, 2025

Copy link
Copy Markdown

The following commit authors need to sign the Contributor License Agreement:

CLA signed

@python python deleted a comment Apr 7, 2025
@github-actions

Copy link
Copy Markdown

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions Bot added the stale Stale PR or inactive for long period of time. label Apr 15, 2026
@eendebakpt

Copy link
Copy Markdown
Contributor

I experimented a bit with the approach here (using godbolt), but found no real speedups. Also note that by default the code is not used as Py_HASH_CUTOFF ==0 (see sys.hash_info)

For the siphash13 that is used there is an optimization possible by pre-computing the v0, v1, v2, v3 from k0 and k1 (the k0 and k1 are set at initialization time). For small strings it makes an 5% difference. If anyone is interested, feel free to pursue that.

PR can be closed afaic

@github-actions github-actions Bot removed the stale Stale PR or inactive for long period of time. label Jun 6, 2026
@StanFromIreland

Copy link
Copy Markdown
Member

Thanks for checking @eendebakpt, additionally the CLA is unsigned and no benchmarks have been provided as requested, so I'm closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants