New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential Performance Improvements #92084
base: main
Are you sure you want to change the base?
Conversation
|
The following commit authors need to sign the Contributor License Agreement: |
|
Every change to Python requires a NEWS entry. Please, add it using the blurb_it Web app or the blurb command-line tool. |
|
I'm porting the html.parser along with the ParserBase class to Lua. |
Please create an issue first and provide the proper benchmark for the optimization.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Lib/_markupbase.py
Outdated
| else: | ||
| return -1 | ||
| while rawdata[j:j+1].isspace(): | ||
| while rawdata[j].isspace(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If j becomes len(rawdata), this will now fail with IndexError. Previously the loop just stopped and it went into the if statement below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense.
It would require two checks to make this behave properly (j < len(rawdata) and rawdata[j].isspace()).
performance improvement for this is not worth the complexity
|
Every change to Python requires a NEWS entry. Please, add it using the blurb_it Web app or the blurb command-line tool. |
|
There is a lot of wasteful CPU code in this fashion (seen multiple times) :- search performed in string twice, even if we could just do it once. I'm new to Open Source Code Contributions. Would love to learn from other's coding style & know other's point of view. |
You should create an issue to discuss your changes, and possibly several PRs linked to the issue for the optimizations you find. The optimizations should also be confirmed by benchmarks whenever possible. I should have around some code I used to test/benchmark that I might publish if you think it might be useful. While you are at it, it would also be useful to improve testing (this will be especially useful for you if you are validating your parser against the HTMLParser test suite) and possibly helpreview/fix related HTMLParser issues. |
|
I have created an issue #92088 |
No description provided.