Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache the hash value of fractions #96465

Closed
rhettinger opened this issue Aug 31, 2022 · 4 comments
Closed

Cache the hash value of fractions #96465

rhettinger opened this issue Aug 31, 2022 · 4 comments
Labels
performance Performance or resource usage type-feature A feature request or enhancement

Comments

@rhettinger
Copy link
Contributor

rhettinger commented Aug 31, 2022

In both 3.10 and 3.11, hashing a small fraction is 20x slower than for a decimal. For fractions with large denominators, it is slower still.

We could sacrifice the space for one slot to cache the hash value:

__slots__ = ('_numerator', '_denominator', '_hash')

def __hash__(self):
    try:
        return self._hash
    except AttributeError:
        self.hash = ... # current code goes here
    return self._hash     

Another approach would be to have a single, size limited cache for the computation:

def __hash__(self):

    @lru_cache(maxsize=1 << 16)
    def current_code(numerator, denominator):
        try:
            dinv = pow(denominator, -1, _PyHASH_MODULUS)
        except ValueError:
            # ValueError means there is no modular inverse.
            hash_ = _PyHASH_INF
        else:
            hash_ = hash(hash(abs(numerator)) * dinv)
        result = hash_ if numerator >= 0 else -hash_
        return -2 if result == -1 else result

    return current_code(self._numerator, self._denominator)

Example of code that would benefit:

@lru_cache(maxsize=100_000)
def pabs(n: Fraction | int) -> Fraction:
    "P-adic absolute value"
    ...

for x, y in product(pool, repeat=2):            # pool is a set of fractions used as test values
    assert (pabs(x)==0) == (x==0)               # |x|=0 if and only if x=0
    assert pabs(x * y) == pabs(x) * pabs(y)     # |xy| = |x||y|
    assert pabs(x + y) <= pabs(x) + pabs(y)     # |x + y| ≤ |x| + |y| (Triangle Inequality)
    assert pabs(x + y) <= max(pabs(x), pabs(y)) # Strong triangle inequality
    assert pabs(x) == pabs(y) or pabs(x + y) == max(pabs(x), pabs(y)) # Lemma 2.1: If |x| ≠ |y|, |x+y| = max{|x|, |y|}
@rhettinger rhettinger added the type-feature A feature request or enhancement label Aug 31, 2022
@AlexWaygood AlexWaygood added the performance Performance or resource usage label Aug 31, 2022
@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Aug 31, 2022

See also #58683. The similar patch for Decimal was rejected.

@gvanrossum gvanrossum added the pending The issue will be closed if no feedback is provided label Sep 1, 2022
@gvanrossum
Copy link
Member

gvanrossum commented Sep 1, 2022

Classic time vs. space trade-off. How common is hashing Fractions? (The example seems a bit special.) If we decide to do this, I'd prefer the version using an LRU cache, so that someone creating a large number of Fractions they aren't hashing doesn't have to pay the price. Did you think about the cache size much or is 65k just your go-to cache size value these days?

FWIW the Decimal patch for Python was rejected. The C version seems all right, so the problem would only affect the tiny minority of people who can't use the C Decimal version. So that's not comparable here.

@matthiasgoergens
Copy link
Contributor

matthiasgoergens commented Sep 6, 2022

@rhettinger Do you have have benchmark data?

@rhettinger
Copy link
Contributor Author

rhettinger commented Sep 7, 2022

Baseline

% ~/cpython/python.exe -m timeit -s 'from fractions import Fraction as F' -s 'f = F(375, 34)' 'hash(f)'
500000 loops, best of 5: 479 nsec per loop

With caching

% git checkout fraction_hash_cache
Switched to branch 'fraction_hash_cache'
% ~/cpython/python.exe -m timeit -s 'from fractions import Fraction as F' -s 'f = F(375, 34)' 'hash(f)'
2000000 loops, best of 5: 119 nsec per loop

miss-islington pushed a commit that referenced this issue Sep 8, 2022
@zware zware removed the pending The issue will be closed if no feedback is provided label Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

6 participants