gh-101525: Use only safe identical code folding with BOLT #134642
Open
geofft wants to merge 1 commit into python:main
"Identical code folding" (ICF) is the feature of an optimizer to find that two functions have the same code and that they can therefore be deduplicated in the binary. While this is usually safe, it can cause observable behavior differences if the program relies on the fact that the two functions have different addresses.
CPython relies on this in (at least) Objects/typeobject.c, which defines two functions, wrap_binaryfunc() and wrap_binaryfunc_l(), with the same implementation, and stores their addresses in the slotdefs array. If these two functions have the same address, update_one_slot() in that file will fill in slots it shouldn't, causing, for instance, classes defined in Python that inherit from some built-in types to misbehave.
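To make the failure mode concrete, here is a minimal sketch in C. It is not the code in Objects/typeobject.c: the names wrap_a, wrap_b, slot_table, and count_slots_using are hypothetical, and the table is only loosely modelled on slotdefs. It shows two functions with identical bodies whose addresses are used as distinguishing keys in a data table.

```c
#include <stddef.h>

typedef int (*binop)(int, int);

/* Two wrappers with identical bodies (hypothetical stand-ins for
   wrap_binaryfunc() and wrap_binaryfunc_l()). */
static int wrap_a(int x, int y) { return x + y; }
static int wrap_b(int x, int y) { return x + y; }

/* Hypothetical table, loosely modelled on slotdefs: the function
   address is stored in data and later compared against. */
struct slotdef {
    const char *name;
    binop wrapper;
};

static const struct slotdef slot_table[] = {
    {"slot_a", wrap_a},
    {"slot_b", wrap_b},
};

/* Count the table entries whose wrapper has the given address. */
static size_t count_slots_using(binop wrapper)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof(slot_table) / sizeof(slot_table[0]); i++) {
        if (slot_table[i].wrapper == wrapper)
            n++;
    }
    return n;
}
```

In a build where the two functions keep distinct addresses, as the C standard requires, count_slots_using(wrap_a) matches exactly one entry. If a post-link optimizer folds wrap_a and wrap_b into a single address, the same call matches both entries, which is the kind of over-matching described above.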
As of LLVM 20 (llvm/llvm-project#116275), BOLT has a "safe ICF" mode, where it checks whether a function symbol is used anywhere other than in function calls (e.g., in relocations in data sections) and skips ICF on such functions. The intent is that this avoids observable behavior differences while still saving as much space as possible.
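The rule can be pictured with a toy model in C. This is only an illustration of the decision described above, not BOLT's implementation: struct func_info, its fields, and both predicates are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one function as a folding pass might see it. */
struct func_info {
    unsigned long body_hash;  /* stand-in for "has the same code" */
    bool address_escapes;     /* symbol used outside a direct call,
                                 e.g. by a relocation in a data section */
};

/* Plain ICF: fold whenever the bodies match. */
static bool can_fold_all(const struct func_info *a, const struct func_info *b)
{
    return a->body_hash == b->body_hash;
}

/* Safe ICF (toy version): also require that neither function's
   address is used outside of calls. */
static bool can_fold_safe(const struct func_info *a, const struct func_info *b)
{
    return can_fold_all(a, b) && !a->address_escapes && !b->address_escapes;
}
```

Under this model, two identical functions whose addresses are stored in a table (as with slotdefs) are foldable under plain ICF but not under safe ICF, while identical functions that are only ever called directly remain foldable in both modes.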
LLVM 20 is about two months old at the time of writing. To support older LLVM versions, we have to turn off ICF entirely.
This problem was previously noticed for Windows/MSVC in #53093 (and again in #24098), where the default behavior of PGO is to enable ICF (which they expand to "identical COMDAT folding") and we had to turn it off.