Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-96421: Insert shim frame on entry to interpreter #96319

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Aug 26, 2022

Simplifies RETURN_VALUE, RETURN_GENERATOR and YIELD_VALUE as they no longer need to check if the current frame is the entry frame.

Should allow specialization of FOR_ITER and SEND for generators and coroutines.

Needs docs and news.

Performance impact is about zero

@markshannon markshannon changed the title Insert shim frame on entry to interpreter GH-96421: Insert shim frame on entry to interpreter Aug 30, 2022
@sweeneyde
Copy link
Member

sweeneyde commented Sep 8, 2022

Does this add much extra CPU/stack/memory overhead for calling Python from C? Or alternating py/c/py/c/py calls? Probably not the most important use-cases, but I would hope that they wouldn't pessimize too much.

@markshannon
Copy link
Member Author

markshannon commented Sep 9, 2022

Does this add much extra CPU/stack/memory overhead for calling Python from C?

  • CPU: A little, but it speeds up the return sequence. The net effect is about zero
  • C Stack consumption is increased, but only 80 bytes per call.

Python/pylifecycle.c Show resolved Hide resolved
@markshannon markshannon marked this pull request as ready for review Oct 19, 2022
@markshannon
Copy link
Member Author

markshannon commented Oct 20, 2022

The performance impact of this negative but I want to merge anyway because:

Copy link
Member

@brandtbucher brandtbucher left a comment

Thanks for taking the time to do this. There's definitely a lot of subtlety involved.

In general, I think that this adds a fair amount of complexity that wasn't present before (I've recently gained new appreciation for the value of being able to reason about the web of frames that exist in 3.11+, particularly things like visibility and ownership rules). If you're confident that we can recoup the perf cost and complexity with work that builds upon this, though, I think it makes sense as a first step towards that.

Also, I think that the growing set of rules around things like "ownership", "completeness", and "state" are becoming a real maintenance burden. We might consider tightening up those concepts in 3.12, because there are lots of complex relationships and invalid states that are crucial to understand when working on (or with) interpreter internals. I'm thinking not only about us, but also third-party libraries like Greenlet and forks like Cinder.

Lib/opcode.py Show resolved Hide resolved
Objects/frameobject.c Outdated Show resolved Hide resolved
static const char INTERPRETER_TRAMPOLINE_CODE[] = {
0, 0,
INTERPRETER_EXIT, 0,
RESUME, 0
Copy link
Member

@brandtbucher brandtbucher Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here?

Copy link
Member Author

@markshannon markshannon Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that this frame is incomplete. I'll add comments.

Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's needed, though. co->_co_firsttraceable == Py_SIZE(co) if no RESUME is present, right?

Copy link
Member Author

@markshannon markshannon Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes, but that's really an implementation detail. The rule is that a frame is invalid until RESUME or RETURN_GENERATOR has been reached.

Python/traceback.c Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
Python/pylifecycle.c Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

bedevere-bot commented Oct 20, 2022

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@markshannon
Copy link
Member Author

markshannon commented Oct 21, 2022

I have made the requested changes; please review again

@bedevere-bot
Copy link

bedevere-bot commented Oct 21, 2022

Thanks for making the requested changes!

@brandtbucher: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from brandtbucher Oct 21, 2022
@brandtbucher
Copy link
Member

brandtbucher commented Oct 21, 2022

I don't know if I'll have time to review again today (still need to prepare my talk for the release stream). Is this able to wait until next week? If not, I can try to make time.

@markshannon
Copy link
Member Author

markshannon commented Oct 21, 2022

It can wait until next week

Copy link
Member

@brandtbucher brandtbucher left a comment

Mostly nits on the changes.

Also, I'd like to discuss this vs. something like #98795 again at our stand-up, if that's okay. This still feels pretty heavy/breaky to me compared to that PR (and I'm still not clear what it actually gives us over that approach).

related C-API function, a shim frame in inserted into the call stack.
This occurs in the ``_PyEval_EvalFrameDefault()`` function.
The extra frame should be invisible to all Python and most C extensions,
but out-of-process debuggers, profilers and debuggers need to be aware of
Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
but out-of-process debuggers, profilers and debuggers need to be aware of
but out-of-process profilers and debuggers need to be aware of

if (co_code == NULL) {
goto cleanup;
}
int code_units = codedef->codelen/2;
Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int code_units = codedef->codelen/2;
int code_units = codedef->codelen / sizeof(_Py_CODEUNIT)

goto cleanup;
}
int code_units = codedef->codelen/2;
int loc_entries = (code_units + 7)/8;
Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment like "The maximum number of code units encoded by a single line table entry is 8. This is equivalent to ceil(code_units / 8)."

loc_table = PyMem_Malloc(loc_entries);
if (loc_table == NULL) {
PyErr_NoMemory();
return NULL;
Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return NULL;
goto cleanup;

if (loc_table) {
PyMem_Free(loc_table);
}
Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done above without the conditional.

Suggested change
if (loc_table) {
PyMem_Free(loc_table);
}

assert(code_units > 0 && code_units <= 8);
loc_table[loc_entries-1] = 0x80 |
(PY_CODE_LOCATION_INFO_NONE << 3) | (code_units-1);
lines = PyBytes_FromStringAndSize((const char *)loc_table, loc_entries);
Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lines = PyBytes_FromStringAndSize((const char *)loc_table, loc_entries);
lines = PyBytes_FromStringAndSize((const char *)loc_table, loc_entries);
PyMem_Free(loc_table);

goto cleanup;
}
struct _PyCodeConstructor con = {
.filename = &_Py_STR(empty),
Copy link
Member

@brandtbucher brandtbucher Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge deal, but I think we typically put something here for ease of debugging. Maybe "<internal>"?

Copy link
Member

@gvanrossum gvanrossum left a comment

I had some earlier comments pending (not sure if they still apply). But mainly this is a reminder that instructions are now in bytecodes.c, and changes there require running make regen-cases.

@skip_if_sanitizer(memory=True, address=True,
reason= "Spurious error when assigning to stack variable.")
Copy link
Member

@gvanrossum gvanrossum Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...

@@ -1326,14 +1326,18 @@ PyFrame_LocalsToFast(PyFrameObject *f, int clear)
}
}


int _PyFrame_IsEntryFrame(PyFrameObject *frame)
Copy link
Member

@gvanrossum gvanrossum Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int _PyFrame_IsEntryFrame(PyFrameObject *frame)
int
_PyFrame_IsEntryFrame(PyFrameObject *frame)

if (f->previous == NULL) {
return 0;
}
return f->previous->owner == FRAME_OWNED_BY_CSTACK;
Copy link
Member

@gvanrossum gvanrossum Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (f->previous == NULL) {
return 0;
}
return f->previous->owner == FRAME_OWNED_BY_CSTACK;
return f->previous != NULL && f->previous->owner == FRAME_OWNED_BY_CSTACK;

/* Don't keep the reference to previous any longer than necessary. It
* may keep a chain of frames alive or it could create a reference
* cycle. */
Copy link
Member

@gvanrossum gvanrossum Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment.

static const char INTERPRETER_TRAMPOLINE_CODE[] = {
0, 0,
INTERPRETER_EXIT, 0,
RESUME, 0
};
Copy link
Member

@gvanrossum gvanrossum Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the code separate from the MakeTrampoline function, whereas other details are hardcoded there? Why not hardcode this there as well?

Python/pylifecycle.c Show resolved Hide resolved
@@ -1852,27 +1862,33 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
goto error;
}

TARGET(RETURN_VALUE) {
Copy link
Member

@gvanrossum gvanrossum Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder that you'll get a merge conflict here -- instruction changes now need to be in bytecodes.c and you need to run make regen-cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants