
bpo-46192: Optimize builtin functions min() and max() #30286

Open · wants to merge 1 commit into main
Conversation

colorfulappl
Contributor

@colorfulappl colorfulappl commented Dec 29, 2021

The builtin functions min() and max() are labeled METH_VARARGS | METH_KEYWORDS, which uses the tp_call calling convention.
After changing their flags to METH_FASTCALL | METH_KEYWORDS, they can be invoked via vectorcall.

This optimization simplifies parameter passing and avoids the creation of a temporary tuple while parsing arguments, bringing up to a 200%+ speedup on microbenchmarks.

faster-cpython/ideas#199

https://bugs.python.org/issue46192


@colorfulappl colorfulappl changed the title bpo-46192: Builtin functions min() and max() now use METH_FASTCALL bpo-46192: Optimize builtin functions min() and max() Dec 29, 2021
@AlexWaygood AlexWaygood added the performance Performance or resource usage label Dec 29, 2021
@erlend-aasland
Contributor

erlend-aasland commented Dec 29, 2021

If we're touching these, it would make sense to convert them to Argument Clinic, now that *args support is implemented (9af34c9).

cpython/Python/bltinmodule.c

Lines 1818 to 1823 in 77195cd

/* AC: cannot convert yet, waiting for *args support */
static PyObject *
builtin_min(PyObject *self, PyObject *args, PyObject *kwds)
{
    return min_max(args, kwds, Py_LT);
}

cpython/Python/bltinmodule.c

Lines 1835 to 1840 in 77195cd

/* AC: cannot convert yet, waiting for *args support */
static PyObject *
builtin_max(PyObject *self, PyObject *args, PyObject *kwds)
{
    return min_max(args, kwds, Py_GT);
}

@colorfulappl
Contributor Author

colorfulappl commented Dec 30, 2021

I wrote an "Argument Clinic" version here:
colorfulappl@29b9559

But it seems it is not as fast as the current "without Argument Clinic" version:
colorfulappl@a9413ab

Results of the microbenchmark:

code snippet                             | with AC      | w/o AC
max(1, 2)                                | 1.11x faster | 3.25x faster
max([1, 2])                              | 1.14x faster | 1.41x faster
max((1, ), (2, ), key=lambda x: x[0])    | 1.89x faster | 1.85x faster
max([(1, ), (2, )], key=lambda x: x[0])  | 1.73x faster | 1.52x faster
max([], default=-1)                      | 2.39x faster | 2.61x faster
max(1, 2, 3, 4, 5)                       | 1.02x faster | 2.46x faster

For the most commonly used case, "max(a, b [, ...])", there is no noticeable speedup when we use AC, especially when multiple positional arguments are passed (the last case).

I noticed that in 9af34c9, the varargs on the stack are packed into a tuple before being passed to the callee, and the callee obtains each argument by accessing the tuple's elements.
This slows down the function invocation.

@colorfulappl
Contributor Author

colorfulappl commented Dec 30, 2021

One goal of fastcall is to avoid the creation of a temporary tuple to pass positional arguments (https://bugs.python.org/issue29259).

IMHO, packing/unpacking the arguments into a tuple is unnecessary in 9af34c9.
Perhaps it would be better, when processing varargs, to pass a PyObject * const * pointer to the first argument plus an nargs integer indicating how many positional arguments were passed.

I will give it a try, then open another issue if necessary.

@erlend-aasland
Contributor

erlend-aasland commented Dec 30, 2021

IMHO, packing/unpacking the arguments into a tuple is unnecessary in 9af34c9. Perhaps it would be better, when processing varargs, to pass a PyObject * const * pointer to the first argument plus an nargs integer indicating how many positional arguments were passed.

I will give it a try, then open another issue if necessary.

Yes, such a change demands a separate issue / PR. It would be nice if you could add proof-of-concept benchmark results when you present the idea in the new bpo issue. I can add the relevant core devs to the nosy list, if you want.

Keep in mind that readability and maintainability weigh very heavily when considering a PR; that also goes for optimisation changes.

@erlend-aasland
Contributor

erlend-aasland commented Dec 30, 2021

I wrote an "Argument Clinic" version here: colorfulappl@29b9559

While that change certainly looks better, it is buggy; it does not heed the key function. With that in mind, the benchmarks you posted in #30286 (comment) that use the key keyword are not correct.

@sweeneyde
Member

sweeneyde commented Dec 30, 2021

I wonder if Argument Clinic itself could be updated to convert *args to vectorcall/fastcall if some kind of flag appears in the AC declaration.

@sweeneyde
Member

sweeneyde commented Dec 31, 2021

*args support was added to AC in #18609

@isidentical any thoughts on removing the tuple creation in some cases, or with some flag?

@erlend-aasland
Contributor

erlend-aasland commented Dec 31, 2021

*args support was added to AC in #18609

@isidentical any thoughts on removing the tuple creation in some cases, or with some flag?

I suggest moving that discussion to bpo-20291.

@colorfulappl
Contributor Author

colorfulappl commented Dec 31, 2021

I will give it a try, then open another issue if necessary.

I opened a new issue (https://bugs.python.org/issue46212) and made a PR (#30312).

@colorfulappl
Contributor Author

colorfulappl commented Dec 31, 2021

the benchmarks you posted in #30286 (comment) are obviously wrong; the benchmarks with the key keyword are not correct.

Thanks for correcting my mistake. I've seen the new results after your amendment.
Actually, when no key function is passed, the "AC version" runs faster.

Contributor

@MaxwellDupre MaxwellDupre left a comment

Tested with 3.10.2 and 3.11.0a5 respectively; the results:

3.10.2 | 3.11.0a5
108    | 98.6
164    | 164
150    | 367
192    | 478
432    | 638
702    | 1.11
482    | 891
745    | 1.4
288    | 297

So it doesn't look worth it to me.
Yes, 3.11.0a5 is not optimised and may be better at the full release. You may want to wait for a later release.
