Potential Performance Improvements #92084

be-thomas · 2022-04-30T14:26:31Z

No description provided.

cpython-cla-bot · 2022-04-30T14:26:33Z

The following commit authors need to sign the Contributor License Agreement:

thomasbenardo96@gmail.com

bedevere-bot · 2022-04-30T14:26:34Z

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

be-thomas · 2022-04-30T14:39:05Z

I'm porting the html.parser along with the ParserBase class to Lua.
I would love to submit a pull request for every performance improvement opportunity that I find.

corona10

Please create an issue first and provide the proper benchmark for the optimization.

bedevere-bot · 2022-04-30T16:14:24Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Akuli · 2022-04-30T16:19:10Z

Lib/_markupbase.py

                else:
                    return -1
-                while rawdata[j:j+1].isspace():
+                while rawdata[j].isspace():


If j becomes len(rawdata), this will now fail with IndexError. Previously the loop just stopped and it went into the if statement below.

OK, that makes sense.
It would require two checks to make this behave properly (j < len(rawdata) and rawdata[j].isspace()).
The performance improvement is not worth the added complexity.
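
To make the point in this review thread concrete, here is a minimal sketch of the three loop forms under discussion. The function names are hypothetical and exist only for illustration; they are not part of Lib/_markupbase.py.

```python
def skip_whitespace_slice(rawdata: str, j: int) -> int:
    """Original form: advance j past whitespace using a one-character slice."""
    # rawdata[j:j+1] is "" once j == len(rawdata), and "".isspace() is False,
    # so the loop stops on its own at the end of the input.
    while rawdata[j:j+1].isspace():
        j += 1
    return j


def skip_whitespace_unchecked(rawdata: str, j: int) -> int:
    """Form from the diff: indexing without a bounds check."""
    # Raises IndexError when the whitespace runs to the end of rawdata.
    while rawdata[j].isspace():
        j += 1
    return j


def skip_whitespace_checked(rawdata: str, j: int) -> int:
    """Indexing with the two checks mentioned in the reply."""
    n = len(rawdata)
    while j < n and rawdata[j].isspace():
        j += 1
    return j
```

On input such as "ab" followed by trailing spaces, the slice and checked forms both return len(rawdata), while the unchecked form raises IndexError.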

bedevere-bot · 2022-04-30T17:14:29Z

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

be-thomas · 2022-04-30T17:20:23Z

There is a lot of code that wastes CPU time in this style (seen multiple times):

if ")" in rawdata[j:]:
    j = rawdata.find(")", j) + 1

The string is searched twice, even though a single search would do.
Because strings are immutable, every slice creates a new string on the fly.
Moreover, the slice rawdata[j:] can be quite expensive, depending on the size of the string.
We could eliminate the slicing altogether and do a single find operation, in this style:

RPAREN_pos = rawdata.find(")", j)
if RPAREN_pos != -1:
    j = RPAREN_pos + 1

I'm new to open source contributions. I would love to learn from others' coding styles and hear others' points of view.
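
As a rough illustration of the rewrite proposed in this comment, the sketch below wraps both styles in functions so they can be compared on the same inputs. The function names are hypothetical; only the bodies mirror the snippets in the comment.

```python
def advance_past_rparen_twice(rawdata: str, j: int) -> int:
    """Style found in the parser: membership test on a slice, then find()."""
    # rawdata[j:] copies the tail of the string, and ")" is searched for twice.
    if ")" in rawdata[j:]:
        j = rawdata.find(")", j) + 1
    return j


def advance_past_rparen_once(rawdata: str, j: int) -> int:
    """Proposed style: a single find() starting at j, with no slicing."""
    rparen_pos = rawdata.find(")", j)
    if rparen_pos != -1:
        j = rparen_pos + 1
    return j
```

For a non-negative j within the string, both functions should return the same index, which makes it easy to check the refactoring for behavior changes before measuring its speed.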

ezio-melotti · 2022-04-30T20:05:27Z

I'm porting the html.parser along with the ParserBase class to Lua. I would love to submit a pull request for every performance improvement opportunity that I find.

You should create an issue to discuss your changes, and possibly several PRs linked to the issue for the optimizations you find. The optimizations should also be confirmed by benchmarks whenever possible. I should have around some code I used to test/benchmark that I might publish if you think it might be useful.

While you are at it, it would also be useful to improve testing (this will be especially useful for you if you are validating your parser against the HTMLParser test suite) and possibly help review/fix related HTMLParser issues.

be-thomas · 2022-05-01T07:46:25Z

I have created an issue: #92088.
Also, I haven't done much automated testing.
I'm not sure how to write the benchmarks. Do I have to prepare a few HTML files and check the parsing order?
Any resources would be really helpful.
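
One possible starting point for the benchmark question, offered as a sketch rather than the project's established method: time html.parser.HTMLParser on a synthetic document with the standard timeit module, before and after a change. The CountingParser class, the make_document helper, and the document shape are all assumptions made for illustration; real HTML files would give more representative numbers.

```python
import timeit
from html.parser import HTMLParser


class CountingParser(HTMLParser):
    """Hypothetical helper: counts start tags so the parser does real work."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1


def make_document(rows: int) -> str:
    """Build a synthetic HTML document (an assumption, not real-world data)."""
    body = "".join(
        f'<tr><td class="c{i}">cell {i}</td></tr>\n' for i in range(rows)
    )
    return f"<!DOCTYPE html>\n<html><body><table>\n{body}</table></body></html>"


def parse_once(document: str) -> int:
    """Parse the document once and return the number of start tags seen."""
    parser = CountingParser()
    parser.feed(document)
    parser.close()
    return parser.start_tags


def benchmark(rows: int = 1000, repeat: int = 5, number: int = 10) -> float:
    """Return the best total time in seconds for `number` parses."""
    document = make_document(rows)
    timings = timeit.repeat(
        lambda: parse_once(document), repeat=repeat, number=number
    )
    return min(timings)


if __name__ == "__main__":
    print(f"best of 5 runs: {benchmark():.4f} s for 10 parses")
```

Running the script on the unmodified branch and again on the patched branch gives two numbers to compare. Taking the minimum over several repeats reduces noise from other processes.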

Potential Performance Improvements

f5af6e3

be-thomas requested a review from ezio-melotti as a code owner Apr 30, 2022

bedevere-bot added the awaiting review label Apr 30, 2022

corona10 requested changes Apr 30, 2022

bedevere-bot removed the awaiting review label Apr 30, 2022

bedevere-bot added the awaiting changes label Apr 30, 2022

Akuli reviewed Apr 30, 2022

Update _markupbase.py

c45b527

be-thomas mentioned this pull request Apr 30, 2022

ParserBase could be optimized #92088

Open

ezio-melotti self-assigned this May 1, 2022

AlexWaygood added the performance label May 4, 2022

python / cpython Public
