In 'normal' business programming, the optimization step is often deferred until it's actually needed; that is, you shouldn't optimize until you have to.
Remember what Donald Knuth said: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
When is the right time to optimize so that I'm not wasting effort? Should I do it at the method level? Class level? Module level?
Also, what should my measure of optimization be? Ticks? Frame rate? Total time?
Answer
Where I've worked, we always use multiple levels of profiling; if you see a problem, you just move down the list a bit more until you figure out what's going on:
- The "human profiler", aka just play the game; does it feel slow or "hitch" occasionally? Noticing jerky animations? (As a developer, note that you'll be more sensitive to some kinds of performance issues and oblivious to others. Plan extra testing accordingly.)
- Turn on the FPS display, which shows a five-second sliding-window average FPS. Very little overhead to calculate and display.
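A sliding-window average FPS counter like that can be sketched in a few lines. The `FpsCounter` class and its method names below are illustrative (not from any particular engine), assuming the game loop reports a per-frame delta time:

```cpp
#include <deque>

// Sliding-window FPS counter: averages frame times over the last
// `windowSeconds` of wall-clock time. Hypothetical helper for illustration.
class FpsCounter {
public:
    explicit FpsCounter(double windowSeconds = 5.0) : window_(windowSeconds) {}

    void addFrame(double dtSeconds) {
        frames_.push_back(dtSeconds);
        total_ += dtSeconds;
        // Drop frames that have fallen outside the window.
        while (total_ > window_ && frames_.size() > 1) {
            total_ -= frames_.front();
            frames_.pop_front();
        }
    }

    double fps() const {
        if (frames_.empty() || total_ <= 0.0) return 0.0;
        return frames_.size() / total_;
    }

private:
    std::deque<double> frames_;
    double total_ = 0.0;
    double window_;
};
```

Feeding it the per-frame delta each tick and drawing the result in a corner of the screen is all the display needs.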
- Turn on the profile bars, which are just a series of quads (ROYGBIV colors) that represent different parts of the frame (e.g. vblank, preframe, update, collision, render, postframe) using a simple "stopwatch" timer around each section of code. To emphasize what we want, we set one screen width worth of bar to be representative of a 60Hz target frame, so it's really easy to see if you're e.g. 50% under budget (only a half-bar) or 50% over (the bar wraps and becomes one and a half bars). It's also pretty easy to tell what's generally eating most of the frame: red = render, yellow = update, etc...
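The stopwatch-around-each-section idea, with bars scaled so one full screen width represents a 60 Hz frame, might look roughly like this; `SectionTimer`, `barCells`, and the 80-column "screen" are invented names for illustration:

```cpp
#include <chrono>
#include <string>

// Per-section "stopwatch": call begin()/end() around a chunk of the frame
// (update, collision, render, ...) and accumulate elapsed milliseconds.
struct SectionTimer {
    std::string name;
    std::chrono::steady_clock::time_point start;
    double elapsedMs = 0.0;

    void begin() { start = std::chrono::steady_clock::now(); }
    void end() {
        auto now = std::chrono::steady_clock::now();
        elapsedMs +=
            std::chrono::duration<double, std::milli>(now - start).count();
    }
};

// Width (in cells) a section's bar occupies if the full screen width
// represents one 60 Hz frame. A result wider than the screen "wraps",
// which makes blowing the frame budget immediately visible.
int barCells(double elapsedMs, int screenCells = 80) {
    const double frameBudgetMs = 1000.0 / 60.0;  // ~16.67 ms
    return static_cast<int>(elapsedMs / frameBudgetMs * screenCells + 0.5);
}
```

Rendering each section's cells as a colored quad strip gives exactly the at-a-glance "half a bar = half the budget" readout described above.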
- Build a special instrumented build that inserts "stopwatch" like code around each and every function. (Note that you may take a massive performance, dcache, and icache hit when doing this, so it's definitely intrusive. But if you lack a proper sampling profiler or decent support on the CPU, this is an acceptable option. You can also be clever about recording a minimum of data on function enter/exit and rebuilding calltraces later.) When we built ours, we mimicked much of gprof's output format.
- Best of all, run a sampling profiler; VTune and CodeAnalyst are available for x86 and x64, and various simulation or emulation environments might give you data here as well.
(There's a fun story from a past year's GDC of a graphics programmer who took four pictures of himself -- happy, indifferent, annoyed, and angry -- and displayed an appropriate picture in the corner of the internal builds based on the framerate. The content creators quickly learned not to turn on complicated shaders for all of their objects and environments: they'd make the programmer angry. Behold the power of feedback.)
Note you can also do fun things like graph the "profile bars" continuously, so you can see spike patterns ("we're losing a frame every 7 frames") or the like.
To answer your question directly, though: in my experience, it's tempting (and often rewarding -- I usually learn something) to rewrite single functions or modules to optimize instruction count or icache/dcache performance, and we do actually need to do this sometimes when we've got a particularly obnoxious performance problem. But the vast majority of the performance issues we deal with on a regular basis come down to design. For example:
- Should we cache in RAM or reload from disk the "attack" state animation frames for the player? How about for each enemy? We don't have RAM to do them all, but disk loads are expensive! You can see the hitching if 5 or 6 different enemies pop in at once! (Okay, how about staggering spawning?)
- Are we doing a single type of operation across all particles, or all operations across a single particle? (This is an icache/dcache tradeoff, and the answer isn't always clear.) Or how about splitting the particles apart and storing all the positions together (the famous "struct of arrays") vs. keeping each particle's data in one place ("array of structs")?
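As a concrete sketch of that struct-of-arrays vs. array-of-structs tradeoff (the type names here are hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: all data for one particle lives together. Good when
// each loop iteration touches most of a particle's fields.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
    float life;
};

// Struct-of-arrays: each field stored contiguously. A loop touching only
// positions and velocities streams through dense, cache-friendly memory.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> life;

    void resize(std::size_t n) {
        x.resize(n); y.resize(n); z.resize(n);
        vx.resize(n); vy.resize(n); vz.resize(n);
        life.resize(n);
    }

    // Position update reads/writes only 6 of the 7 arrays; `life` stays
    // cold and never pollutes the dcache during this pass.
    void integrate(float dt) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
            z[i] += vz[i] * dt;
        }
    }
};
```

Which layout wins depends on the access pattern, which is exactly why it's a design question rather than a micro-optimization.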
You hear it ad nauseam in any university-level computer science course, but: it really is all about the data structures and algorithms. Spending some time on algorithm and data-flow design will generally get you more bang for the buck. (Make sure you've read the excellent Pitfalls of Object Oriented Programming slides from a Sony Developer Services fellow for some insight here.) This doesn't "feel" like optimization; it's mostly time spent with a whiteboard or a UML tool, or building prototypes, rather than making current code run faster. But it's generally much more worthwhile.
And another useful heuristic: if you're close to your engine's "core", it may be worth some extra effort and experimentation to optimize (e.g. vectorize those matrix multiplies!). The further from core, the less you should be worrying about that unless one of your profiling tools tells you otherwise.
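A 4x4 matrix multiply is a good example of such a "core" routine. A scalar sketch is shown below for clarity; the hand-vectorized version (SSE/NEON intrinsics) is the kind of experiment worth running here, and `Mat4`/`mul` are illustrative names, not a real engine's API:

```cpp
#include <array>

// Row-major 4x4 matrix, flattened: element (r, c) lives at index r * 4 + c.
using Mat4 = std::array<float, 16>;

// Plain scalar multiply: out = a * b. In an engine's core math library,
// this inner loop is a classic candidate for SIMD vectorization.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[r * 4 + k] * b[k * 4 + c];
            out[r * 4 + c] = sum;
        }
    }
    return out;
}
```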