The tool has great potential, but I've always found it too limited, fiddly, or imprecise when I needed to optimize some code.
It only supports consecutive instructions in the innermost loops. It can neither include nor even ignore any setup/teardown cost. This means I can't feed it any function as-is (not even a tiny one); I have to manually cut out the loop body.
It doesn't support branches at all. I know it's a very hard problem, but it's the problem I actually have. Quite often I'd like to compare branchless and branchy versions of an algorithm, so I have to manually remove the branches I think are predictable and hope that doesn't distort the analysis.
It's not designed to compare different versions of code, so I have to rescale the metrics by hand to compare them (different versions of the loop may be unrolled a different number of times, or process a different number of elements per iteration, etc.).
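For illustration, the manual rescaling boils down to normalizing cycles per processed element. This is a minimal sketch assuming the tool is llvm-mca and that its "Iterations" and "Total Cycles" summary figures are copied over by hand; the `McaSummary` class and all numbers below are hypothetical, not real measurements.

```python
from dataclasses import dataclass


@dataclass
class McaSummary:
    """Figures copied by hand from llvm-mca's summary view (hypothetical helper)."""
    iterations: int         # value of the "Iterations:" line
    total_cycles: int       # value of the "Total Cycles:" line
    elements_per_iter: int  # input elements one pass of the analyzed loop body handles

    def cycles_per_element(self) -> float:
        # Cycles for one pass of the loop body, spread over the elements it processes.
        return self.total_cycles / self.iterations / self.elements_per_iter


# Hypothetical numbers: a scalar loop (1 element/iteration)
# vs. a 4x-unrolled loop (4 elements/iteration).
scalar = McaSummary(iterations=100, total_cycles=310, elements_per_iter=1)
unrolled = McaSummary(iterations=100, total_cycles=820, elements_per_iter=4)

print(f"scalar:   {scalar.cycles_per_element():.2f} cycles/element")
print(f"unrolled: {unrolled.cycles_per_element():.2f} cycles/element")
```

Comparing cycles per element rather than raw cycles per iteration is what makes differently unrolled versions comparable.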
Overall, that's laborious, and it doesn't work well when I want to tweak the high-level C or Rust code to find the version that optimizes best.
> This means I can't feed any function as-is (even a tiny one). I need to manually cut out the loop body.
> It doesn't support branches at all. I know it's a very hard problem, but that's the problem I have
Shameless self-plug: https://github.com/securesystemslab/LLVM-MCA-Daemon
Can you provide a bit more context on why the MCA-Daemon is preferable? It looks interesting, but I don't fully get it.