c++ - How can I optimize a collision engine where order is significant and collision is conditional based on object group?

If this is your first time on this question, I suggest reading the pre-update part below first, then this part. Here's a synthesis of the problem, though:

Basically, I have a collision detection and resolution engine with a grid spatial partitioning system where order-of-collision and collision groups matter. One body at a time must move, then detect collision, then resolve collisions. If I move all the bodies at once, then generate possible collision pairs, it is obviously faster, but resolution breaks because order-of-collision isn't respected. If I move one body a time, I'm forced to get bodies to check collisions, and it becomes a n^2 problem. Put groups in the mix, and you can imagine why it gets very slow very fast with a lot of bodies.

Update: I've worked really hard on this, but couldn't manage to optimize anything.

I successfully implemented the "painting" described by Will and changed groups into bitsets, but it is a very very minor speedup.

I also discovered a big issue: my engine is order-of-collision dependent.

I tried an implementation of unique collision pair generation, which definitely speed up everything by a lot, but broke the order-of-collision.

Let me explain:

in my original design (not generating pairs), this happens:
1. a single body moves
2. after it has moved, it refreshes its cells and gets the bodies it collides against
3. if it overlaps a body it needs to resolve against, resolve the collision
this means that if a body moves, and hits a wall (or any other body), only the body that has moved will resolve its collision and the other body will be unaffected.

This is the behavior I desire.

I understand it's not common for physics engines, but it has a lot of advantages for retro-style games.

in the usual grid design (generating unique pairs), this happens:
1. all bodies move
2. after all bodies have moved, refresh all cells
3. generate unique collision pairs
4. for each pair, handle collision detection and resolution
in this case a simultaneous move could have resulted in two bodies overlapping, and they will resolve at the same time - this effectively makes the bodies "push one another around", and breaks collision stability with multiple bodies

This behavior is common for physics engines, but it is not acceptable in my case.

I also found another issue, which is major (even if it's not likely to happen in a real-world situation):

consider bodies of group A, B and W

A collides and resolves against W and A

B collides and resolves against W and B

A does nothing against B

B does nothing against A

there can be a situation where a lot of A bodies and B bodies occupy the same cell - in that case, there is a lot of unnecessary iteration between bodies that mustn't react to one another (or only detect collision but not resolve them).

For 100 bodies occupying the same cell, it's 100^100 iterations! This happens because unique pairs aren't being generated - but I can't generate unique pairs, otherwise I would get a behavior I do not desire.

Is there a way to optimize this kind of collision engine?

These are the guidelines that must be respected:

Order of collision is extremely important!
- Bodies must move one at a time, then check for collisions one at a time, and resolve after movement one at a time.

Bodies must have 3 group bitsets
- Groups: groups the body belongs to
- GroupsToCheck: groups the body must detect collision against
- GroupsNoResolve: groups the body must not resolve collision against
- There can be situations where I only want a collision to be detected but not resolved

Pre-update:

Foreword: I'm aware that optimizing this bottleneck is not a necessity - the engine is already very fast. I, however, for fun and educational purposes, would love to find a way to make the engine even faster.

I'm creating a general-purpose C++ 2D collision detection/response engine, with an emphasis on flexibility and speed.

Here's a very basic diagram of its architecture:

Basic engine architecture

Basically, the main class is World, which owns (manages memory) of a ResolverBase*, a SpatialBase* and a vector.

SpatialBase is a pure virtual class which deals with broad-phase collision detection.

ResolverBase is a pure virtual class which deals with collision resolution.

The bodies communicate to the World::SpatialBase* with SpatialInfo objects, owned by the bodies themselves.

There currenly is one spatial class: Grid : SpatialBase, which is a basic fixed 2D grid. It has it's own info class, GridInfo : SpatialInfo.

Here's how its architecture looks:

Engine architecture with grid spatial

The Grid class owns a 2D array of Cell*. The Cell class contains a collection of (not owned) Body*: a vector which contains all the bodies that are in the cell.

GridInfo objects also contain non-owning pointers to the cells the body is in.

As I previously said, the engine is based on groups.

Body::getGroups() returns a std::bitset of all the groups the body is part of.

Body::getGroupsToCheck() returns a std::bitset of all the groups the body has to check collision against.

Bodies can occupy more than a single cell. GridInfo always stores non-owning pointers to the occupied cells.

After a single body moves, collision detection happens. I assume that all bodies are axis-aligned bounding boxes.

How broad-phase collision detection works:

Part 1: spatial info update

For each Body body:

- Top-leftmost occupied cell and bottom-rightmost occupied cells are calculated.
- If they differ from the previous cells, body.gridInfo.cells is cleared, and filled with all the cells the body occupies (2D for loop from the top-leftmost cell to the bottom-rightmost cell).

body is now guaranteed to know what cells it occupies.

Part 2: actual collision checks

For each Body body:

body.gridInfo.handleCollisions is called:

void GridInfo::handleCollisions(float mFrameTime)
{

    static int paint{-1};
    ++paint;

    for(const auto& c : cells)
        for(const auto& b : c->getBodies())
        {
            if(b->paint == paint) continue;
            base.handleCollision(mFrameTime, b);
            b->paint = paint;
        }

}

void Body::handleCollision(float mFrameTime, Body* mBody)
    {
        if(mBody == this || !mustCheck(*mBody) || !shape.isOverlapping(mBody->getShape())) return;

        auto intersection(getMinIntersection(shape, mBody->getShape()));

        onDetection({*mBody, mFrameTime, mBody->getUserData(), intersection});

        mBody->onDetection({*this, mFrameTime, userData, -intersection});

        if(!resolve || mustIgnoreResolution(*mBody)) return;
        bodiesToResolve.push_back(mBody);
    }

Collision is then resolved for every body in bodiesToResolve.

That's it.

So, I've been trying to optimize this broad-phase collision detection for quite a while now. Every time I try something else than the current architecture/setup, something doesn't go as planned or I make assumption about the simulation that later are proven to be false.

My question is: how can I optimize the broad-phase of my collision engine?

Is there some kind of magic C++ optimization that can be applied here?

Can the architecture be redesigned in order to allow for more performance?

Callgrind output for latest version: http://txtup.co/rLJgz

Answer

getBodiesToCheck()

There could be two problems with the getBodiesToCheck() function; first:

if(!contains(bodiesToCheck, b)) bodiesToCheck.push_back(b);

This part is O(n²) isn't it?

Rather than checking to see if the body is already in the list, use painting instead.

loop_count++;
if(!loop_count) { // if loop_count can wrap,
    // you just need to iterate all bodies to reset it here
}
bodiesToCheck.clear();
for(const auto& q : queries)
    for(const auto& b : *q)
        if(b->paint != loop_count) {
            bodiesToCheck.push_back(b);
            b->paint = loop_count;

        }
return bodiesToCheck;

You are dereferencing the pointer in the gather phase, but you'd be dereferencing it in the test phase anyway so if you have enough L1 its no big deal. You can improve performance by adding pre-fetch hints to the compiler too e.g. __builtin_prefetch, although that is easier with classic for(int i=q->length; i-->0; ) loops and such.

That's a simple tweak, but my second thought is that there could be a faster way to organise this:

You can move to using bitmaps instead, though, and avoiding the whole bodiesToCheck vector. Here's an approach:

You are using integer keys for bodies already, but then looking them up in maps and things and keeping around lists of them. You can move to a slot allocator, which is basically just an array or vector. E.g.:

class TBodyImpl {
   public:
       virtual ~TBodyImpl() {}

       virtual void onHit(int other) {}
       virtual ....
       const int slot;
   protected:
      TBodyImpl(int slot): slot(slot_) {}
};

struct TBodyBase {
    enum ... type;
    ...

    rect_t rect;
    TQuadTreeNode *quadTreeNode; // see below
    TBodyImpl* imp; // often null
};

std::vector bodies; // not pointers to them

What this means is that all the stuff needed to do the actual collisions is in linear cache-friendly memory, and you only go out to the implementation-specific bit and attach it to one of these slots if there's some need to.

To track the allocations in this vector of bodies you can use an array of integers as a bitmap and use bit twiddling or __builtin_ffs etc. This is super efficient to move to slots that are currently occupied, or find an unoccupied slot in the array. You can even compact the array sometimes if it grows unreasonably large and then lots are marked deleted, by moving those on the end to fill in the gaps.

only check for each collision once

If you've checked if a collides with b, you don't need to check if b collides with a too.

It follows from using integer ids that you avoid these checks with a simple if-statement. If the id of a potential collision is less-than-or-equal to the current id being checked for, it can be skipped! This way, you'll only ever check each possible pairing once; that'll more than half the number of collision checks.

unsigned * bitmap;
int bitmap_len;
...

for(int i=0; i  unsigned mask = bitmap[i];
  while(mask) {
      const int j = __builtin_ffs(mask);

      const int slot = i*sizeof(unsigned)*8+j;
      for(int neighbour: get_neighbours(slot))
          if(neighbour > slot)
              check_for_collision(slot,neighbour);
      mask >>= j;
  }

respect the order of collisions

Rather than evaluating a collision as soon as a pair is found, compute the distance to hit and store that in a binary heap. These heaps are how you typically do priority queues in path-finding, so is very useful utility code.

Mark each node with a sequence number, so you can say:

A¹⁰ hits B¹² at 6

A¹⁰ hits C¹² at 3

Obviously after you've gathered all the collisions, you start popping them from the priority queue, soonest first. So the first you get is A¹⁰ hits C¹² at 3. You increment each object's sequence number (the ¹⁰ bit), evaluate the collision, and compute their new paths, and store their new collisions in the same queue. The new collision is A¹¹ hits B¹² at 7. The queue now has:

A¹⁰ hits B¹² at 6

A¹¹ hits B¹² at 7

Then you pop from the priority queue and its A¹⁰ hits B¹² at 6. But you see that A¹⁰ is stale; A is currently at 11. So you can discard this collision.

Its important not to bother trying to delete all stale collisions from the tree; removing from a heap is expensive. Simply discard them when you pop them.

the grid

You should consider using a quadtree instead. Its a very straightforward data-structure to implement. Often you see implementations that store points but I prefer to store rects, and store the element in the node that contains it. This means that to check collisions you only have to iterate over all bodies, and, for each, check it against those bodies in the same quad-tree node (using the sorting trick outlined above) and all those in parent quad-tree nodes. The quad-tree is itself the possible-collision list.

Here's a simple Quadtree:

struct Object {
    Rect bounds;
    Point pos;
    Object * prev, * next;
    QuadTreeNode * parent;
};


struct QuadTreeNode {
    Rect bounds;
    Point centre;
    Object * staticObjects;
    Object * movableObjects;
    QuadTreeNode * parent; // null if root
    QuadTreeNode * children[4]; // null for unallocated children
};

We store the movable objects separately because we don't have to check if the static objects are going to collide with anything.

We are modeling all objects as axis-aligned bounding boxes (AABB) and we put them in the smallest QuadTreeNode that contains them. When a QuadTreeNode a lot of children, you can subdivide it further (if those objects distribute themselves into the children nicely).

Each game tick, you need to recurse into the quadtree and compute the move - and collisions - of each movable object. It has to be checked for collisions with:

every static object in its node

every movable object in its node that is before it (or after it; just pick a direction) in the movableObjects list

every movable and static object in all parent nodes

This will generate all possible collisions, unordered. You then do the moves. You have to prioritise these moves by distance and 'who moves first' (which is your special requirement), and execute them in that order. Use a heap for this.

You can optimise this quadtree template; you don't need to actually store the bounds and centre-point; that's entirely derivable when you walk the tree. You don't need to check if a model is within the bounds, only check which side it is of the centre-point (a "axis of separation" test).

To model fast flying things like projectiles, rather than moving them each step or having a separate 'bullets' list that you always check, simply put them in the quadtree with the rect of their flight for some number of game steps. This means that they move in the quadtree much more rarely, but you aren't checking bullets against far off walls, so its a good tradeoff.

Large static objects should be split into component parts; a large cube should have each face separately stored, for example.

Blog

Sunday, August 27, 2017

c++ - How can I optimize a collision engine where order is significant and collision is conditional based on object group?

No comments:

Post a Comment

Simple past, Present perfect Past perfect