JIT: Fix fgFoldCondToReturnBlock for multi-statement return blocks#124642
JIT: Fix fgFoldCondToReturnBlock for multi-statement return blocks#124642yykkibbb wants to merge 4 commits intodotnet:mainfrom
Conversation
…otnet#123621) When a constant-folded operand appears after a non-constant operand in a short-circuit && expression, callee inlining may leave dead local stores in the return block. The isReturnBool lambda in fgFoldCondToReturnBlock required hasSingleStmt(), causing the optimization to bail out when these dead stores were present. Relax the single-statement constraint to allow preceding statements as long as they have no globally visible side effects (e.g., dead local stores left over from inlining). Fixes dotnet#123621
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
| @@ -0,0 +1,53 @@ | |||
| // Licensed to the .NET Foundation under one or more agreements. | |||
There was a problem hiding this comment.
it's not a regression to be under JIT/Regression
This is an optimization improvement, not a regression fix, so the test belongs under JIT/opt/OptimizeBools.
c2be40f to
8a2123e
Compare
|
Do you know why the dead stores aren't being removed by dead code elimination? It would probably be better to fix that if possible. |
09d46df to
4f539bb
Compare
The VN-based dead store removal phase only handled value-redundant stores but had no mechanism to remove stores with zero remaining uses. After Redundant branch opts removes JTRUE references, inlining temps' dead stores were left behind, preventing fgFoldCondToReturnBlock from producing branchless codegen. - Add an SSA use recount pass to get accurate counts after earlier phases may have removed references without updating use counts - Change the inner loop to start at defIndex=0 so inlining temps' first explicit def (at m_array[0], with no implicit entry def) is considered for removal - Add a "no remaining uses" dead store removal path that removes the entire statement when data has no side effects, falling back to the COMMA approach otherwise - Revert optimizebools.cpp to the original hasSingleStmt() check, since the dead stores are now properly removed before Optimize bools
4f539bb to
74defde
Compare
|
@jakobbotsch Thanks so much for taking the time to review — I really appreciate the suggestion. This is my first contribution to this repo, and I'm learning as I go, so I know there may be rough edges. Your feedback has been incredibly helpful, and I want to do my best to get this right. The VN-based dead store removal phase only handles value-redundant stores (where the new value matches the previous def). It doesn't detect stores that simply have no remaining uses — which is exactly the case here after Redundant branch opts removes the referencing JTRUEs. I've updated the PR to fix this at the DCE level:
Both dead stores are now properly removed, and |
|
It was enough to say that this is dependent on RBO. I think this change is too invasive – I would go back to what you had before. |
…t#123621) Revert the DCE-level fix as it is too invasive. The original optimizebools.cpp approach (allowing multi-statement return blocks in isReturnBool) is sufficient since the dead stores are dependent on Redundant branch opts.
|
@jakobbotsch Thank you for the feedback — I've reverted the DCE changes and restored the original approach. |
Summary
Fixes #123621
When a constant-folded operand appears after a non-constant operand in a short-circuit
&&expression (e.g.,v == 2 && Environment.NewLine != "\r\n"), callee inlining can leave dead local stores in the return block. TheisReturnBoollambda infgFoldCondToReturnBlockrequiredhasSingleStmt(), which caused the optimization to bail out when these dead stores were present, resulting in suboptimal branching codegen.Changes
src/coreclr/jit/optimizebools.cpp: Relax thehasSingleStmt()constraint inisReturnBoolto allow preceding statements as long as they have no globally visible side effects (GTF_GLOBALLY_VISIBLE_SIDE_EFFECTS). This enablesfgFoldCondToReturnBlockto fold the conditional into a branchless return even when dead local stores from inlining remain in the block.Before (ARM64,
Inline_After)After (ARM64,
Inline_After)Test plan
Runtime_123621covering the original issue patternHoisted,Inline_Before, andInline_Afterall produce identical branchless codegen (cseton ARM64)DevDiv_168744regression test still passes