Make email.message.Message.__contains__ faster #100792

Closed
sobolevn opened this issue Jan 6, 2023 · 3 comments
Labels: performance, topic-email, type-feature

@sobolevn
Member

sobolevn commented Jan 6, 2023

Right now the implementation of Message.__contains__ looks like this:

    def __contains__(self, name):
        return name.lower() in [k.lower() for k, v in self._headers]

There are several problems here:

  1. We build an intermediate structure (a list in this case)
  2. We use a list for the in operation, which is slow

The fastest way to check whether the header is actually present is simply:

    def __contains__(self, name):
        name_lower = name.lower()
        for k, v in self._headers:
            if name_lower == k.lower():
                return True
        return False

We do not create any intermediate lists or sets, and we do not iterate longer than needed.
This change makes the in check up to twice as fast.
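
As a minimal sketch of the comparison, the two variants can be written as standalone functions over a list of (name, value) pairs. The function names contains_listcomp and contains_loop and the sample headers are illustrative assumptions, not part of the email package, and Message.items() stands in for the private _headers attribute.

```python
from email.message import Message


def contains_listcomp(headers, name):
    # Current approach: lowercase every header name into a list, then search it.
    return name.lower() in [k.lower() for k, v in headers]


def contains_loop(headers, name):
    # Proposed approach: compare one header at a time, stop at the first match.
    name_lower = name.lower()
    for k, v in headers:
        if name_lower == k.lower():
            return True
    return False


# Hypothetical sample message, built in memory for illustration.
msg = Message()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Hello"

headers = msg.items()  # list of (name, value) pairs

# Both variants should agree for present and missing names, in any letter case.
for name in ("from", "FROM", "subject", "missing"):
    assert contains_listcomp(headers, name) == contains_loop(headers, name)
```

The loop variant returns as soon as a match is found, so a header near the start of the message costs fewer lower() calls than one that is missing.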

Microbenchmark

Before

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 1.40 us +- 0.14 us
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 1.42 us +- 0.06 us

After

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 904 ns +- 55 ns
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 715 ns +- 24 ns

The "from" case is now about twice as fast (1.40 us to 715 ns), and the "missing" case improves from 1.42 us to 904 ns.
It probably also consumes less memory now, but I don't think it is very significant.
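
For readers without pyperf or a CPython checkout, a rough equivalent of the measurement can be sketched with the standard library's timeit. The helpers build_message and time_lookup and the in-memory message are assumptions made for illustration. The sketch times whatever __contains__ implementation the running interpreter ships, and the lambda adds call overhead, so the absolute numbers will not match the ones above.

```python
import timeit
from email.message import Message


def build_message(n_headers=10):
    # Hypothetical message with "From" placed in the middle of the header list.
    msg = Message()
    for i in range(n_headers):
        if i == n_headers // 2:
            msg["From"] = "alice@example.com"
        msg[f"X-Header-{i}"] = str(i)
    return msg


def time_lookup(msg, name, number=100_000):
    # Mean time per "name in msg" check, in seconds.
    total = timeit.timeit(lambda: name in msg, number=number)
    return total / number


if __name__ == "__main__":
    m = build_message()
    for name in ("from", "missing"):
        print(f"{name!r}: {time_lookup(m, name) * 1e9:.0f} ns per check")
```

Running the script on interpreters built before and after the patch gives a coarse comparison; pyperf remains the better tool for stable numbers.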

Importance

Since EmailMessage (a subclass of Message) is widely used by users and third-party libraries, I think this improvement is worth including.

And since the patch is quite simple and pure Python, I think the risks are very low.

Linked PRs

@sobolevn sobolevn added type-feature A feature request or enhancement performance Performance or resource usage labels Jan 6, 2023
@sobolevn sobolevn self-assigned this Jan 6, 2023
sobolevn added a commit to sobolevn/cpython that referenced this issue Jan 6, 2023
@pochmann
Contributor

pochmann commented Jan 7, 2023

> We use list for in operation, which is slow

I'd say it's one of the fastest linear searches you can do.

What times do you get if you simply change the listcomp to a genexp, i.e., just change the [ ] brackets to ( )?

return name.lower() in map(str.lower, self) might also be good.

Btw, in "After" you switched the order of the two benchmarks; better to use the same order as in "Before".
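
The two alternatives suggested in this comment can be sketched as standalone functions. The names contains_genexp and contains_map and the sample message are assumptions for illustration. The sketch iterates the Message object, which yields header names, instead of unpacking the private _headers pairs as the method in the issue does.

```python
from email.message import Message


def contains_genexp(msg, name):
    # Generator expression: no intermediate list, stops at the first match.
    return name.lower() in (k.lower() for k in msg)


def contains_map(msg, name):
    # Iterating a Message yields header names, so map() lowercases them lazily.
    return name.lower() in map(str.lower, msg)


# Hypothetical sample message, built in memory for illustration.
msg = Message()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Hello"

# Both lazy variants should agree with the built-in "in" check.
for name in ("from", "Subject", "missing"):
    assert contains_genexp(msg, name) == contains_map(msg, name) == (name in msg)
```

Both variants avoid the list and can exit early, but each element still passes through a generator or map iterator, which carries its own per-item overhead.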

@sobolevn
Member Author

sobolevn commented Jan 7, 2023

> What times do you get if you simply change the listcomp to a genexp, i.e., just change the [ ] brackets to ( )?

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 1.71 us +- 0.04 us

and

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 1.79 us +- 0.02 us

> return name.lower() in map(str.lower, self) might also be good.

Nope:

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 1.74 us +- 0.02 us
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 1.69 us +- 0.02 us

@pochmann
Contributor

pochmann commented Jan 7, 2023

Hmm, I hoped the "from" search would be good. Odd that it's slower than the "missing" search in the map solution.
