When I process a bunch of component data, I'd like that data to be sorted in such a way that I can iterate through the component buffers linearly. For instance when I want to update the physics system, in an ideal world the buffers for the TransformComponent, MovementComponent and PhysicsBodyComponent are structured in such a way that I would be able to iterate over the entities like this:
for (int i = 0; i < m_numEntitiesInSystem; i++)
{
    TransformComponent& transform = g_transformBuffer[i];
    MovementComponent& movement = g_movementBuffer[i];
    PhysicsBodyComponent& body = g_physicsBodyBuffer[i];
    // ... update physics ...
}
The main advantage of this is that the system would incur next to no cache misses. Because different systems require different combinations of components, this implies that before every system update I would either have to sort the respective component buffers so that they are aligned, or use a cache that is local to the system (which would need to be synced every time a component changes).
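To make the second option concrete, here is a minimal sketch of what a system-local cache could look like. All names (`PhysicsCache`, `gather`, `scatter`, the component fields) are hypothetical illustrations, not part of any existing engine API: the system gathers the components it needs into contiguous arrays, iterates linearly, then scatters the results back to the shared buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical component types; fields are illustrative only.
struct TransformComponent { float x = 0, y = 0; };
struct MovementComponent  { float vx = 0, vy = 0; };

// A system-local cache: before the update, copy the components of the
// entities this system cares about into tightly packed arrays, then
// write the modified data back afterwards ("syncing").
struct PhysicsCache {
    std::vector<TransformComponent> transforms;
    std::vector<MovementComponent>  movements;
    std::vector<std::size_t>        sourceIndices; // where each entry came from

    void gather(const std::vector<TransformComponent>& allTransforms,
                const std::vector<MovementComponent>&  allMovements,
                const std::vector<std::size_t>&        entityIndices)
    {
        transforms.clear();
        movements.clear();
        sourceIndices = entityIndices;
        for (std::size_t idx : entityIndices) {
            transforms.push_back(allTransforms[idx]);
            movements.push_back(allMovements[idx]);
        }
    }

    void scatter(std::vector<TransformComponent>& allTransforms) const
    {
        for (std::size_t i = 0; i < sourceIndices.size(); ++i)
            allTransforms[sourceIndices[i]] = transforms[i];
    }
};
```

The update loop over `cache.transforms` and `cache.movements` is then exactly the linear loop from the question, at the price of one copy in and one copy out per frame.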
Of course, both approaches come at a cost. Sorting the component buffers could limit the ability to execute systems on separate threads, and the sorting process itself has its own performance overhead. Syncing the component data means some components might need to be copied over, possibly several times per frame, which again comes with its own performance drawbacks.
My question is: is the price of sorting or syncing worth being able to process data linearly? Or is it wiser to organise the data so that the most frequently executed or most expensive systems perform optimally, and others potentially less so? If the former, would the sorting method or the syncing method be preferable?
Also, please don't tell me that "premature optimization is evil". Since this is more or less the foundation of my engine, I want it to be as well written and designed as I can make it. Or at least I want to understand the consequences of picking one method over the other.
Thank you so much!
Answer
It's difficult to say which way is best, since we don't know how prone your setup is to cache misses without this kind of sorting.
I think I remember Jonathan Blow describing doing this type of sort-before-update on one of his developer streams, as a way to improve performance late in a project, so there certainly could be cases where it's beneficial. Without profiling data though, it's not clear how much net savings you'd get, and whether it's worth the cost in complexity or threading flexibility as you mentioned.
One thing to consider is that just iterating in a sorted order, even with gaps, can still be a lot better than a random zig-zag backwards and forwards through memory!
So you might not need to re-sort for every system to get the lion's share of the cache miss savings. A single shared sort order that tends to keep clusters of entities close together for one or a few of your highest-throughput systems might be enough to keep your game running smoothly, even if most or all systems still encounter some gaps in their iteration.
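A minimal sketch of such a shared sort order, under the assumption that each entity has a single sort key (here a hypothetical `clusterId` chosen so that entities processed together end up adjacent): compute one permutation of entity indices per frame (or less often), and have every system walk entities in that order, skipping entities that lack its components.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Build one shared iteration order for all systems by sorting entity
// indices by a single key. "clusterId" is a hypothetical per-entity key;
// stable_sort keeps the relative order of entities within a cluster.
std::vector<std::uint32_t> makeSharedOrder(const std::vector<std::uint32_t>& clusterId)
{
    std::vector<std::uint32_t> order(clusterId.size());
    for (std::uint32_t i = 0; i < order.size(); ++i)
        order[i] = i; // start with the identity permutation
    std::stable_sort(order.begin(), order.end(),
        [&](std::uint32_t a, std::uint32_t b) { return clusterId[a] < clusterId[b]; });
    return order;
}
```

Each system then iterates `for (auto e : order) { if (hasComponents(e)) update(e); }`, so its memory accesses are mostly ascending, with gaps, rather than random jumps, which is the "sorted order, even with gaps" situation described above.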
But again, we can't really quantify this to test it without knowing the typical distributions of your data at runtime. There are use patterns under which this will perform at efficiency approaching the theoretical maximum, and other use patterns under which it will suffer as badly as pointer-chasing. So there's really no substitute for building tests and profiling, even if it means you go down the wrong road at first and need to refactor once you know in detail what your runtime data characteristics are like.