`sys.monitoring` is nice. I used it to find inefficiently ordered chains of branches in one of my projects. For example, a chain like `if isinstance(foo, ...): ... elif isinstance(foo, ...): ... elif isinstance(foo, ...): ...` can be reordered from most to least popular based on a representative run, to avoid evaluating branch conditions more often than necessary. You collect BRANCH_LEFT/BRANCH_RIGHT events, divide the code objects into "basic blocks", build a graph from them with edges weighted by frequency, and identify chains using a simple algorithm [1]. Then report chains where the long jump is taken more often than the short jump. It's like semi-automatic PGO for Python.
I'm wondering, is the overhead a problem for you because it skews profiling results, or does it lead to the overall runtime becoming too long?
So far I've assumed that profiling adds overhead but that the results themselves are unaffected (modulo the usual pitfalls).
Python 3.15 features a very good sampling profiler with excellent reporting: https://docs.python.org/3.15/library/profiling.sampling.html.
This looks great! If I didn't have dependencies blocking it, I'd genuinely try out the alpha build just for this profiler alone!