New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-96421: Insert shim frame on entry to interpreter #96319
base: main
Are you sure you want to change the base?
Conversation
…nt safety checks but ASAN gets tripped up in some cases.
|
Does this add much extra CPU/stack/memory overhead for calling Python from C? Or alternating py/c/py/c/py calls? Probably not the most important use-cases, but I would hope that they wouldn't pessimize too much. |
|
|
The performance impact of this negative but I want to merge anyway because:
|
Thanks for taking the time to do this. There's definitely a lot of subtlety involved.
In general, I think that this adds a fair amount of complexity that wasn't present before (I've recently gained new appreciation for the value of being able to reason about the web of frames that exist in 3.11+, particularly things like visibility and ownership rules). If you're confident that we can recoup the perf cost and complexity with work that builds upon this, though, I think it makes sense as a first step towards that.
Also, I think that the growing set of rules around things like "ownership", "completeness", and "state" are becoming a real maintenance burden. We might consider tightening up those concepts in 3.12, because there are lots of complex relationships and invalid states that are crucial to understand when working on (or with) interpreter internals. I'm thinking not only about us, but also third-party libraries like Greenlet and forks like Cinder.
Misc/NEWS.d/next/Core and Builtins/2022-09-09-10-37-34.gh-issue-96421.cyg33z.rst
Outdated
Show resolved
Hide resolved
Misc/NEWS.d/next/Core and Builtins/2022-10-19-15-59-08.gh-issue-96421.e22y3r.rst
Outdated
Show resolved
Hide resolved
Misc/NEWS.d/next/Core and Builtins/2022-10-19-15-59-08.gh-issue-96421.e22y3r.rst
Outdated
Show resolved
Hide resolved
| static const char INTERPRETER_TRAMPOLINE_CODE[] = { | ||
| 0, 0, | ||
| INTERPRETER_EXIT, 0, | ||
| RESUME, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that this frame is incomplete. I'll add comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's needed, though. co->_co_firsttraceable == Py_SIZE(co) if no RESUME is present, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically yes, but that's really an implementation detail. The rule is that a frame is invalid until RESUME or RETURN_GENERATOR has been reached.
|
When you're done making the requested changes, leave the comment: |
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @brandtbucher: please review the changes made to this pull request. |
|
I don't know if I'll have time to review again today (still need to prepare my talk for the release stream). Is this able to wait until next week? If not, I can try to make time. |
|
It can wait until next week |
Mostly nits on the changes.
Also, I'd like to discuss this vs. something like #98795 again at our stand-up, if that's okay. This still feels pretty heavy/breaky to me compared to that PR (and I'm still not clear what it actually gives us over that approach).
| related C-API function, a shim frame in inserted into the call stack. | ||
| This occurs in the ``_PyEval_EvalFrameDefault()`` function. | ||
| The extra frame should be invisible to all Python and most C extensions, | ||
| but out-of-process debuggers, profilers and debuggers need to be aware of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| but out-of-process debuggers, profilers and debuggers need to be aware of | |
| but out-of-process profilers and debuggers need to be aware of |
| if (co_code == NULL) { | ||
| goto cleanup; | ||
| } | ||
| int code_units = codedef->codelen/2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| int code_units = codedef->codelen/2; | |
| int code_units = codedef->codelen / sizeof(_Py_CODEUNIT) |
| goto cleanup; | ||
| } | ||
| int code_units = codedef->codelen/2; | ||
| int loc_entries = (code_units + 7)/8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment like "The maximum number of code units encoded by a single line table entry is 8. This is equivalent to ceil(code_units / 8)."
| loc_table = PyMem_Malloc(loc_entries); | ||
| if (loc_table == NULL) { | ||
| PyErr_NoMemory(); | ||
| return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return NULL; | |
| goto cleanup; |
| if (loc_table) { | ||
| PyMem_Free(loc_table); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done above without the conditional.
| if (loc_table) { | |
| PyMem_Free(loc_table); | |
| } |
| assert(code_units > 0 && code_units <= 8); | ||
| loc_table[loc_entries-1] = 0x80 | | ||
| (PY_CODE_LOCATION_INFO_NONE << 3) | (code_units-1); | ||
| lines = PyBytes_FromStringAndSize((const char *)loc_table, loc_entries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| lines = PyBytes_FromStringAndSize((const char *)loc_table, loc_entries); | |
| lines = PyBytes_FromStringAndSize((const char *)loc_table, loc_entries); | |
| PyMem_Free(loc_table); |
| goto cleanup; | ||
| } | ||
| struct _PyCodeConstructor con = { | ||
| .filename = &_Py_STR(empty), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge deal, but I think we typically put something here for ease of debugging. Maybe "<internal>"?
| @skip_if_sanitizer(memory=True, address=True, | ||
| reason= "Spurious error when assigning to stack variable.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...
| @@ -1326,14 +1326,18 @@ PyFrame_LocalsToFast(PyFrameObject *f, int clear) | |||
| } | |||
| } | |||
|
|
|||
|
|
|||
| int _PyFrame_IsEntryFrame(PyFrameObject *frame) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| int _PyFrame_IsEntryFrame(PyFrameObject *frame) | |
| int | |
| _PyFrame_IsEntryFrame(PyFrameObject *frame) |
| if (f->previous == NULL) { | ||
| return 0; | ||
| } | ||
| return f->previous->owner == FRAME_OWNED_BY_CSTACK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (f->previous == NULL) { | |
| return 0; | |
| } | |
| return f->previous->owner == FRAME_OWNED_BY_CSTACK; | |
| return f->previous != NULL && f->previous->owner == FRAME_OWNED_BY_CSTACK; |
| /* Don't keep the reference to previous any longer than necessary. It | ||
| * may keep a chain of frames alive or it could create a reference | ||
| * cycle. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment.
Python/pylifecycle.c
Outdated
| static const char INTERPRETER_TRAMPOLINE_CODE[] = { | ||
| 0, 0, | ||
| INTERPRETER_EXIT, 0, | ||
| RESUME, 0 | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the code separate from the MakeTrampoline function, whereas other details are hardcoded there? Why not hardcode this there as well?
| @@ -1852,27 +1862,33 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int | |||
| goto error; | |||
| } | |||
|
|
|||
| TARGET(RETURN_VALUE) { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder that you'll get a merge conflict here -- instruction changes now need to be in bytecodes.c and you need to run make regen-cases.
Simplifies
RETURN_VALUE,RETURN_GENERATORandYIELD_VALUEas they no longer need to check if the current frame is the entry frame.Should allow specialization of
FOR_ITERandSENDfor generators and coroutines.Needs docs and news.
Performance impact is about zero