Friday, October 14, 2016

performance - How many threads should I have, and for what?


Should I have separate threads for rendering and logic, or even more?


I'm aware of the immense performance drop caused by data synchronization (let alone any mutex locks).


I've been thinking of taking this to the extreme and creating a thread for every conceivable subsystem, but I'm worried that may slow things down, too. (For example, is it sane to separate the input thread from the rendering or game logic threads?) Would the data synchronization required make it pointless, or even slower?




Answer



The common approach for taking advantage of multiple cores, namely giving each subsystem its own thread, is frankly just plain misguided. Separating your subsystems into different threads will indeed split up some of the work across multiple cores, but it has some major problems. First, it's very hard to work with. Who wants to muck around with locks, synchronization, and communication when they could just be writing straight-up rendering or physics code instead? Second, the approach doesn't actually scale up. At best, it will let you take advantage of maybe three or four cores, and that's if you really know what you're doing. There are only so many subsystems in a game, and of those, even fewer take up large chunks of CPU time. There are a couple of good alternatives that I know of.


One is to have a main thread along with a worker thread for each additional CPU core. Regardless of subsystem, the main thread delegates isolated tasks to the worker threads via some sort of queue(s); these tasks may themselves create further tasks. The sole purpose of the worker threads is to each grab tasks from the queue one at a time and perform them. The most important thing, though, is that as soon as a thread needs the result of a task, it can get the result if the task is completed, and if not, it can safely remove the task from the queue and perform that task itself. That is, not all tasks will end up being scheduled in parallel with each other. Having more tasks than can be executed in parallel is a good thing in this case; it means the approach is likely to scale as you add more cores. One downside is that it requires a lot of work up front to design a decent queue and worker loop, unless you have access to a library or language runtime that already provides this for you. The hardest part is making sure your tasks are truly isolated and thread safe, and that they sit in a happy middle ground between coarse-grained and fine-grained.
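As a rough illustration of that worker approach, here is a minimal C++ sketch; the class and names are my own invention, not from any particular engine. It shows only the basic shape (a mutex-protected task queue and a fixed set of workers that each pull and run tasks in a loop); a real job system would add the behaviour described above, where a thread that needs a result can pull that task off the queue and run it itself, along with work stealing and a smarter queue.

    // Hypothetical minimal worker pool: one shared queue, one loop per worker.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class WorkerPool {
    public:
        explicit WorkerPool(unsigned workerCount) {
            for (unsigned i = 0; i < workerCount; ++i)
                workers.emplace_back([this] { workerLoop(); });
        }

        ~WorkerPool() {
            {
                std::lock_guard<std::mutex> lock(mutex);
                stopping = true;
            }
            wake.notify_all();
            for (auto& w : workers) w.join();
        }

        // The main thread (or a running task) delegates an isolated task here.
        void submit(std::function<void()> task) {
            {
                std::lock_guard<std::mutex> lock(mutex);
                tasks.push(std::move(task));
            }
            wake.notify_one();
        }

    private:
        void workerLoop() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(mutex);
                    wake.wait(lock, [this] { return stopping || !tasks.empty(); });
                    if (stopping && tasks.empty()) return;
                    task = std::move(tasks.front());
                    tasks.pop();
                }
                task();  // run outside the lock so other workers can keep pulling work
            }
        }

        std::vector<std::thread> workers;
        std::queue<std::function<void()>> tasks;
        std::mutex mutex;
        std::condition_variable wake;
        bool stopping = false;
    };

You would typically create one of these at startup with roughly one worker per spare core, for example WorkerPool pool(std::thread::hardware_concurrency() - 1);, and have the main thread submit isolated tasks each frame.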


Another alternative to subsystem threads is to parallelize each subsystem in isolation. That is, instead of running rendering and physics in their own threads, write the physics subsystem to use all of your cores at once, write the rendering subsystem to use all of your cores at once, and then have the two systems simply run sequentially (or interleaved, depending on other aspects of your game architecture). For example, in the physics subsystem you could take all the point masses in the game, divide them up among your cores, and then have all the cores update them at once. Each core can then work on your data in tight loops with good locality. This lock-step style of parallelism is similar to what a GPU does. The hardest part here is making sure you divide your work into fine-grained chunks so that distributing them evenly actually results in an equal amount of work across all processors.
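To make the point-mass example concrete, here is a small C++ sketch along those lines; the struct and function names are purely illustrative. The array of masses is cut into one contiguous slice per core, each thread integrates its own slice in a tight loop with no locking, and the subsystem joins all threads before the next subsystem runs.

    // Illustrative data-parallel physics step: each core integrates its own slice.
    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    struct PointMass {
        float x = 0, y = 0;    // position
        float vx = 0, vy = 0;  // velocity
    };

    // Tight loop over a contiguous, non-overlapping slice, so no locks are needed.
    void integrateSlice(PointMass* begin, PointMass* end, float dt) {
        for (PointMass* p = begin; p != end; ++p) {
            p->x += p->vx * dt;
            p->y += p->vy * dt;
        }
    }

    void updatePhysics(std::vector<PointMass>& masses, float dt) {
        unsigned cores = std::max(1u, std::thread::hardware_concurrency());
        std::size_t chunk = (masses.size() + cores - 1) / cores;

        std::vector<std::thread> threads;
        for (unsigned i = 0; i < cores; ++i) {
            std::size_t first = i * chunk;
            if (first >= masses.size()) break;
            std::size_t last = std::min(first + chunk, masses.size());
            threads.emplace_back(integrateSlice, masses.data() + first,
                                 masses.data() + last, dt);
        }
        // The whole subsystem finishes on all cores before the next subsystem starts.
        for (auto& t : threads) t.join();
    }

In a real engine you would reuse a thread or job pool rather than spawning threads every frame, but the slicing idea is the same.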


However, sometimes it's just easiest, due to politics, existing code, or other frustrating circumstances, to give each subsystem a thread. In that case, it's best to avoid making more OS threads than cores for CPU-heavy workloads (if you have a runtime with lightweight threads that happen to balance across your cores, this isn't as big a deal). Also, avoid excessive communication. One nice trick is to try pipelining: each major subsystem works on a different game state at a time. Pipelining reduces the amount of communication necessary among your subsystems, since they don't all need access to the same data at the same time, and it can also blunt some of the damage caused by bottlenecks. For example, if your physics subsystem tends to take a long time to complete and your rendering subsystem always ends up waiting for it, your overall frame rate can be higher if you run the physics subsystem for the next frame while the rendering subsystem is still working on the previous frame. In fact, if you have such bottlenecks and can't remove them any other way, pipelining may be the most legitimate reason to bother with subsystem threads.
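A minimal sketch of that kind of two-stage pipeline, again with invented names and trivial stand-in subsystem bodies, might look like this in C++: the game state is double-buffered, a physics thread simulates the next frame into one buffer while the main thread renders the previous frame from the other, and the buffers swap at the frame boundary.

    // Two-stage pipeline sketch with a double-buffered game state.
    #include <thread>

    struct GameState {
        float time = 0;  // stand-in for positions, animation poses, etc.
    };

    // Trivial stand-ins for the real subsystems.
    void simulate(GameState& next, const GameState& previous) {
        next.time = previous.time + 1.0f / 60.0f;  // physics writes only `next`
    }

    void render(const GameState& current) {
        (void)current;  // the renderer only reads the frame simulated last time
    }

    void runPipelinedFrames(int frameCount) {
        GameState buffers[2] = {};
        int current = 0;  // index of the state being rendered this frame

        for (int frame = 0; frame < frameCount; ++frame) {
            int next = 1 - current;

            // Simulate the next frame on a worker thread...
            std::thread physics([&] { simulate(buffers[next], buffers[current]); });

            // ...while this thread renders the previously simulated frame.
            render(buffers[current]);

            physics.join();   // frame boundary: both stages are done
            current = next;   // hand the freshly simulated state to the renderer
        }
    }

    int main() {
        runPipelinedFrames(3);  // run a few pipelined frames
    }

The price of this design is one extra frame of latency between input and display, which is why it is usually reserved for subsystems that are genuine bottlenecks.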

