Assuming you already have the best-choice algorithm, what low-level solutions can you offer for squeezing the last few drops of sweet sweet frame rate out of C++ code?
It goes without saying that these tips only apply to the critical code section that you've already highlighted in your profiler, and they should be low-level, non-structural improvements. I've seeded an example.
Answer
Optimise your data layout! (This applies to more languages than just C++)
You can go pretty deep making this specifically tuned for your data, your processor, handling multi-core nicely, etc. But the basic concept is this:
When you are processing things in a tight loop, you want to make the data for each iteration as small as possible, and as close together as possible in memory. That means the ideal is an array or vector of objects (not pointers) that contain only the data necessary for the calculation.
This way, when the CPU fetches the data for the first iteration of your loop, the next several iterations' worth of data gets loaded into the cache along with it, because memory is fetched a whole cache line at a time.
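To make that concrete, here is a minimal sketch, not a definitive implementation. The `FatParticle`, `ParticleHot` and `integrate` names are made up for illustration, and the byte counts in the comments assume a typical platform with 4-byte floats and 64-byte cache lines. The idea is to split out only the fields the hot loop touches and store them by value in one contiguous `std::vector`:

```cpp
#include <vector>

// Hypothetical "fat" object: the hot loop only needs position and velocity,
// but every element drags the rarely used fields through the cache with it.
struct FatParticle {
    float x, y, z;
    float vx, vy, vz;
    char  name[64];    // cold data, never touched in the hot loop
    int   debugFlags;  // cold data
};

// Hot data only. With 4-byte floats this is 24 bytes, so one 64-byte
// cache line covers more than two consecutive elements.
struct ParticleHot {
    float x, y, z;
    float vx, vy, vz;
};

static_assert(sizeof(ParticleHot) < sizeof(FatParticle),
              "the hot struct should be smaller than the fat one");

// Elements are stored by value (not as pointers), so the loop walks
// memory linearly and each cache line fetched is fully used.
void integrate(std::vector<ParticleHot>& particles, float dt) {
    for (ParticleHot& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}
```

A `std::vector<FatParticle*>` running the same loop would chase a pointer per element and pull in the cold fields too. Whether the split pays off for your data is something only the profiler can tell you.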
Really, the CPU is fast and the compiler is good. There's not much you can gain from using fewer and faster instructions. Cache locality, often loosely called cache coherence, is where it's at (that's a random article I Googled - it contains a good example of getting cache-friendly access for an algorithm that doesn't simply run through data linearly).