-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-113947: Speed up Counter.__eq__ #113948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The main effect is perhaps from getting rid of a generator.
Please add a simple NEWS entry.
|
To @terryjreedy's point, I benchmarked a variety of permutations.
This ran 10,000 iterations each of a modified version of @keithasaurus's script, including a very large Benchmarkingimport string
from collections import Counter
from operator import itemgetter
from time import perf_counter_ns
ALPHA_100 = string.ascii_letters * 100
A_100 = "a" * len(ALPHA_100)
COUNTERS = (
(Counter(), Counter()),
(Counter(), Counter("abcdefghijklmnop")),
(Counter("abcdefghijklmnop"), Counter("abcdefghijklmnop")),
(Counter("abcdefghijklmnopppppp"), Counter("abcdefghijklmnop")),
(Counter("abcdefghijklmnop"), Counter("abcdefghijklmnopppppp")),
(Counter("aaaaaaaaaaaaaaaa"), Counter("abcdefghijklmnop")),
(Counter("abcdefghijklmnop"), Counter("aaaaaaaaaaaaaaaa")),
(Counter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), Counter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])),
(Counter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10]), Counter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])),
(Counter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), Counter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10])),
(Counter(ALPHA_100), Counter(A_100)),
(Counter(A_100), Counter(ALPHA_100)),
)
EQS = [a == b for (a, b) in COUNTERS]
def benchmark(iterations, a, b):
start = perf_counter_ns()
for i in range(iterations):
a == b
return perf_counter_ns() - start
def run_bench(iterations):
total_duration = 0
for (a, b) in COUNTERS:
total_duration += benchmark(iterations, a, b)
outcome = "" if [a == b for (a, b) in COUNTERS] == EQS else "(incorrect!)"
print(f" total duration: {total_duration / 10**6:0>5.1f} ms {outcome}")
return total_duration
def eq_original(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
return all(self[e] == other[e] for c in (self, other) for e in c)
def eq_all_listcomp(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
return all([self[e] == other[e] for c in (self, other) for e in c])
def eq_all_explode(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
return all(self[k] == other[k] for k in self) and all(other[k] == self[k] for k in other)
def eq_all_items(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
return all(v == other[k] for k, v in self.items()) and all(v == self[k] for k, v in other.items())
def eq_all_map(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
return all(map((lambda t: t[1] == other[t[0]]), self.items())) and all(map((lambda t: t[1] == self[t[0]]), other.items()))
def eq_nested_loop(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
for obj1, obj2 in ((self, other), (other, self)):
for k, v in obj1.items():
if v != obj2[k]:
return False
return True
def eq_seq_loop(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
for k, v in self.items():
if v != other[k]:
return False
for k, v in other.items():
if v != self[k]:
return False
return True
def eq_nested_any(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
for obj1, obj2 in ((self, other), (other, self)):
if any(v != obj2[k] for k, v in obj1.items()):
return False
return True
def eq_seq_any(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
if any(v != other[k] for k, v in self.items()):
return False
if any(v != self[k] for k, v in other.items()):
return False
return True
def eq_nested_loop_keys(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
if self.keys() != other.keys():
return False
for obj1, obj2 in ((self, other), (other, self)):
for k, v in obj1.items():
if v != obj2[k]:
return False
return True
def eq_seq_loop_keys(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
if self.keys() != other.keys():
return False
for k, v in self.items():
if v != other[k]:
return False
for k, v in other.items():
if v != self[k]:
return False
return True
def eq_nested_any_keys(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
if self.keys() != other.keys():
return False
for obj1, obj2 in ((self, other), (other, self)):
if any(v != obj2[k] for k, v in obj1.items()):
return False
return True
def eq_seq_any_keys(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
if self.keys() != other.keys():
return False
if any(v != other[k] for k, v in self.items()):
return False
if any(v != self[k] for k, v in other.items()):
return False
return True
def eq_nested_any_listcomp(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
for obj1, obj2 in ((self, other), (other, self)):
if any([v != obj2[k] for k, v in obj1.items()]):
return False
return True
def eq_seq_any_listcomp(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if any([v != other[k] for k, v in self.items()]):
return False
if any([v != self[k] for k, v in other.items()]):
return False
return True
results = {}
func = name = None
for func in (func for name, func in globals().items() if name.startswith('eq_')):
name = func.__name__
print(name)
Counter.__eq__ = func
results[name[3:]] = run_bench(iterations=10**5)
longest_name = len(max(results, key=len))
print("\nSorted timings\n" + "=" * (longest_name+10))
for f, d in sorted(results.items(), key=itemgetter(1)):
print(f"{f: >{longest_name}}: {d / 10**6:0>5.1f} ms")The second fastest function is the one suggested in this PR. The fastest function improves on that time by 11-12% (for a total 52% speedup) by unrolling the loop: def __eq__(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
for k, v in self.items():
if v != other[k]:
return False
for k, v in other.items():
if v != self[k]:
return False
return TrueI think we could also consider changing A |
|
@AA-Turner Yes, I actually had tried that version, but I wasn't sure two loops would be as welcome from a maintainability point of view. It may be worth noting there is a potential further "optimization" I couldn't quite find. def __eq__(self, other):
'True if all counts agree. Missing counts are treated as zero.'
if not isinstance(other, Counter):
return NotImplemented
for k, v in self.items():
if v != other[k]:
return False
# at this point we've already covered the keys common to both `self` and `other`,
# so we really only need to check that any keys that aren't in `self` are equal to zero
for k, v in other.items():
if k not in self and v != 0:
return False
return TrueHowever, it didn't show an improvement in performance -- every iteration is still doing a dict lookup and an equality check. I'm not sure if there's a way to improve it. I'll update to the two loops. |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
I updated to the two for loops. I've also updated the code to do use Two for loops two for loops + So, according to this benchmark, we'd be at about a 4x speedup overall. |
|
Another potential: return {k: v for k, v in self.items() if v} == {k: v for k, v in other.items() if v}I'm on mobile so can't test, but this may benefit due to the comprehension inlining in Python 3.12 and later. A |
I would be surprised if the inlining of comprehensions was fast enough to overcome a lack of short-circuiting, but definitely open to it if it works. |
|
@AA-Turner For the dict comprehension approach I got: 0: 0.31111598014831543
1: 0.8031868934631348
2: 1.4441249370574951
3: 1.4424488544464111
4: 0.8289532661437988
5: 0.9804871082305908
6: 0.9872369766235352
total duration: 6.797554016113281I expect the slowness is in part due to the need to instantiate new dicts, as well as no short-circuiting in the |
|
I think this is good to merge, will hold off for a couple of days for any concerns to be raised. A |
|
@keithasaurus just to note there's no need to frequently merge with HEAD unless there are conflicting changes (which is fairly rare) -- it takes up additional CI resource for extra commits etc, which we should attempt to avoid. A |
|
I just noticed that if you use |
|
@keithasaurus, please add this case in tests. |
Lib/collections/__init__.py
Outdated
| # so now we can just check that any keys that | ||
| # aren't in self are equal to zero | ||
| for k, v in other.items(): | ||
| if v and k not in self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if v and k not in self: | |
| if v != 0 and k not in self: |
|
@HarryLHW good catch. I did the following:
|
Lib/collections/__init__.py
Outdated
| @@ -788,7 +788,21 @@ def __eq__(self, other): | |||
| 'True if all counts agree. Missing counts are treated as zero.' | |||
| if not isinstance(other, Counter): | |||
| return NotImplemented | |||
| return all(self[e] == other[e] for c in (self, other) for e in c) | |||
|
|
|||
| if super().__eq__(other): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__eq__() can return NotImplemented, and NotImplemented should not be used in boolean context (it emits a warning).
If you say that dict.__eq__() never returns NotImplemented for arguments that are instances of dict subclasses, than use dict.__eq__().
I wonder what effect this optimization has on comparison of two large Counters with equal size but slightly different keys?
c1 = Counter(large_dict)
c2 = Counter(large_dict)
c1[new_k1] = 0
c2[new_k2] = v # may be 0 or not 0There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you say that dict.eq() never returns NotImplemented for arguments that are instances of dict subclasses, than use dict.eq().
Because of the return NotImplemented for non-Counters, I believe we only need to be concerned with the case that dict.__eq__ works for two Counter instances. I'm not able to think of a situation in which this would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
class A(dict):
def __eq__(s, o): return NotImplemented
class B(Counter, A): passThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a little test script. Please feel free to probe further if you have other ideas. Both main and this branch return the same results.
from collections import Counter
class A(dict):
def __eq__(s, o): return NotImplemented
class B(Counter, A):
pass
class C(A, Counter):
pass
a = A()
b = B()
c = C()
print(a == b)
print(b == a)
print(c == b)
print(b == c)
print(a == c)
print(c == a)
print({} == a)
print(a == {})
print({} == b)
print(b == {})
print({} == c)
print(c == {})result
False
False
True
True
False
False
True
True
True
True
True
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> print(c == b)
/home/serhiy/py/cpython/Lib/collections/__init__.py:792: DeprecationWarning: NotImplemented should not be used in a boolean context
if super().__eq__(other):
True
>>> print(b == c)
/home/serhiy/py/cpython/Lib/collections/__init__.py:792: DeprecationWarning: NotImplemented should not be used in a boolean context
if super().__eq__(other):
TrueThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simple solution would be just to check is True rather than the implicit bool. Does that meaningfully alter the performance gains here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AA-Turner No, it's essentially unchanged, so I added the explicit is True. Results from the benchmark here after the change are:
0: 0.09299182891845703
1: 0.20043516159057617
2: 0.08538293838500977
3: 0.7197530269622803
4: 0.719200611114502
5: 0.017891883850097656
total duration: 1.8356554508209229
This is presumably because is True is only done once per call -- it's not part of any iteration.
|
Please benchmark also with large Counters (thousands or tens of thousands of items, or more). Do it also with tuple keys (they have more expensive hash and comparison). |
The results of this are in line with what we've seen in the above benchmarks (code below). main branch this branch The code: from collections import Counter
from copy import copy
from typing import Any
from time import time
import urllib.request
with urllib.request.urlopen('https://www.usconstitution.net/const.txt') as f:
text = f.read().decode('utf-8')
constitution_counter = Counter(text.split())
constitution_counter_1 = copy(constitution_counter)
constitution_counter_1["word_not_in_counter"] = 1
constitution_zero_equal = copy(constitution_counter)
constitution_zero_equal["word_not_in_counter"] = 0
tuple_counter = Counter()
split_text = text.split()[:2000]
for i, word in enumerate(split_text):
try:
tuple_counter[(word, split_text[i+1])] += 1
except IndexError:
tuple_counter[(word,)] += 1
tuple_counter_1 = copy(tuple_counter)
tuple_counter_1[(split_text[998], split_text[999])] += 1
class CounterSubClass(Counter):
pass
subclass_counter = CounterSubClass()
non_subclass_counter = Counter()
for word in split_text[:1000]:
subclass_counter[word] += 1
non_subclass_counter[word] += 1
def benchmark(iterations: int, a: Counter[Any], b: Counter[Any], expected_result: bool) -> float:
assert (a == b) is expected_result
start = time()
for i in range(iterations):
a == b
return time() - start
total_duration = 0.0
for i, (a, b, expected_result) in enumerate([
(tuple_counter, tuple_counter, True),
# unequal at the end
(tuple_counter, tuple_counter_1, False),
# long equal
(constitution_counter, constitution_counter, True),
# long equal but will super.__eq__ will return False
(constitution_counter, constitution_zero_equal, True),
# long unequal (at the very end)
(constitution_counter, constitution_counter_1, False),
# equivalence between subclasses using __dict__.eq
(subclass_counter, non_subclass_counter, True)
]):
result = benchmark(5000, a, b, expected_result)
print(f"{i}:", result)
total_duration += result
print("total duration:", total_duration) |
|
FWIW, I greatly prefer the current code which is self-evidently correct, easy to maintain, and easy to explain to others. Also, the super call can't be predicted in advance where it will go or what is will do because it is the MRO of a subclass that determines that actual search order. If the small speed-up comes from removing the generator expression, I am dubious becauae that feels like just chasing the current state of CPython where the performance is in flux and the situation may reverse in the future as other optimizations kick in. Over the long-term, the cleanest, shortest, clearest, and most idiomatic code tends to win. I recommend that we not do this. We all have limits on how unclean we're willing the make code to save a few clock cycles. In this case, I'm not willing to forgo the current succinct clear code. This is doubly true because in practice this method is hardly ever used, and it is unlikely that any real program would ever benefit. |
|
@rhettinger I agree with a lot of the points. The first code change I proposed was simpler because I thought it was more maintainable: def __eq__(self, other):
if not isinstance(other, Counter):
return NotImplemented
for obj1, obj2 in ((self, other), (other, self)):
for k, v in obj1.items():
if v != obj2[k]:
return False
return TrueTo my understanding this is faster because of primarily because of removing the generator, as you mentioned, but also because it does fewer dict lookups -- iterating over I just pushed up a version of the code that is the simpler original PR code, but also includes the
|
|
My preference is to leave the code in its current state which is clean, simple, explainable, and reasonably performant.
I don't see any issue with a nested-for but it really bugs you, it is easy to replace it with |
|
Thanks for the suggestion, but I will decline. This isn't a speed-critical method that warrants heroic efforts. I greatly prefer the current code which reads like a definition of what the The current code only depends on |
I was curious how much of the speed-up from "removing the generator expression" comes from no longer having the generator pointlessly send The three candidates: def generator(xs):
return all(x for x in xs)
def loop(xs):
for x in xs:
if not x:
return False
return True
def generator_no_pointless_Trues(xs):
return all(False for x in xs if not x)Benchmark with That said, it's less extreme if there are fewer values or if there's an early false value. And of course readability suffers, and I prefer the non-optimized generator here . Benchmark scriptdef generator(xs):
return all(x for x in xs)
def loop(xs):
for x in xs:
if not x:
return False
return True
def generator_no_pointless_Trues(xs):
return all(False for x in xs if not x)
funcs = [generator, loop, generator_no_pointless_Trues]
from timeit import timeit
from statistics import mean, stdev
import sys
import random
# Correctness
xss = [[]]
for _ in range(5):
for xs in xss:
assert len({f(xs) for f in funcs}) == 1
xss = [xs + [b] for xs in xss for b in (False, True)]
# Speed
xs = [True] * 1000
times = {f: [] for f in funcs}
def stats(f):
ts = [t * 1e6 for t in sorted(times[f])[:10]]
return f' {mean(ts):4.1f} ± {stdev(ts):3.1f} μs '
for _ in range(100):
random.shuffle(funcs)
for f in funcs:
t = timeit(lambda: f(xs), number=100) / 100
times[f].append(t)
for f in sorted(funcs, key=stats):
print(stats(f), f.__name__)
print('\nPython:', sys.version) |
What I did
./configure --enable-optimizationsBenchmark
(running on an m1 Mac -- code is pasted below)
main branch
(format is
<iteration index>: <duration in seconds>)PR feature branch
Benchmark code