Tuesday, February 28, 2017

optimization - How to optimize two-pass operations on an array with Unity Coroutines


I am working on simulating vacuum decompression in a 2D top down environment. I have 2 2-dimensional arrays: one that stores the pressure at a location, and one that stores the vector of fluid flow.


The algorithm looks likes this:


private void Flow()
{
Diffuse();
Vectorize();
}


private void Vectorize()
{
for(int x=0;x {
for(int y=0;y {
float myPressure= pressure[x,y];
[...]
Vector2 flowVector=calculate_the_resultant_vector_from_the_pressure_differentials //O(1) op

[...]
vector[x,y]=flowVector;
}
}
}

private void Diffuse()
{
for(int x=0;x {

for(int y=0;y {
float myPressure= pressure[x,y];
[...]
float newPressure=calculate_the_new_pressure_from_the_pressure_differentials //O(1) op
[...]
pressure[x,y]=newPressure;
}
}
}


The problem is that as the area of the simulation space increases, the performance continues to drop off as the iteration steps block the update loop.


I want to convert these to Coroutines in such a way that both yield but more or less maintain a lockstep appearance. Since flow is dependent on pressure differential, and pressure differential is partially dependent on flow, if an order has to be enforced, then Diffuse should take precedence.


My initial attempt followed the following form:


IEnumerable Flow()
{
StartCoroutine(Diffuse());
yield return null;
StartCoroutine(Vectorize());
}


IEnumerable Vectorize()
{
for(int x=0;x {
for(int y=0;y {
float myPressure= pressure[x,y];
[...]
Vector2 flowVector=calculate_the_resultant_vector_from_the_pressure_differentials //O(1) op

[...]
vector[x,y]=flowVector;
yield return null;
}
}
}

IEnumerable Diffuse()
{
for(int x=0;x
{
for(int y=0;y {
float myPressure= pressure[x,y];
[...]
float newPressure=calculate_the_new_pressure_from_the_pressure_differentials //O(1) op
[...]
pressure[x,y]=newPressure;
yield return null;
}

}
}

This tended to cause a jittery simulation. Even on a small 20x20 simulation field, there was noticeable difference between the first and second solutions.


Are there any tricks to coroutines that will allow a more lockstep simulation that scales well? Any suggestions on optimization is welcomed as well.



Answer



You seem to be aware of this but I just want to make it clear, the best solution for these kinds of problems is usually to spread calculations over multiple frames. This is especially true when frame perfect data representation is not necessary.




Now to answer your actual problem:


Coroutines are still executed sequentially, so you don't need to worry about lock step as long they both yield after the same number of iterations.



Whenever you call yield return null, you're telling your function to wait until the next frame to resume. This can cause things to slow down, since you're waiting a frame for every iteration in your double for.


When I've tackled similar problems, I usually follow a pattern like so


IEnumerator Control(){
while(flag){
yield return SpinLock(method);
}
}

IEnumerator SpinLock(){
StartCoroutine(method);

StartCoroutine(method2);
while(running1 && running2){yield return null}
}

This control block executes a coroutine, and doesn't launch another until the current completes. The launched runs its two Coroutines and waits for both two complete. They're both operating on the same chunks of data, but since they sleep at the same count, one doesn't get ahead of the other.


As for the other methods, I'd do something like this:


int perFrame; 
IEnumerator doWork(){
int counter = 0;
runningX = true;


for(int x = 0; x < stuff; x++){
for(int y = 0; y < stuff2; y++){
doStuff()

//yields for next frame.
//you can make this a fraction of the total, or a magic number, both have their uses.
if(counter > perFrame) {counter = 0; yield return null;}
counter++;
}

}

runningX = false;
}

This tells the function to do up to perFrame calculations, then rests. Since the function that calls it isn't waiting, both functions should do the same amount of calculations (in sequence), wait for end of frame, then resume on the next frame.


You can choose to work with the mixed dirty/clean data, or you can update your utilized matrix every iteration




If those 2D arrays are just data, you may be able to use actual threads in your Vectorize and Diffuse methods to take advantage of parallelism. The implementation should mostly be the same.


No comments:

Post a Comment

Simple past, Present perfect Past perfect

Can you tell me which form of the following sentences is the correct one please? Imagine two friends discussing the gym... I was in a good s...