I'm aware that it is usually a BAD idea to operate on the components of a GLSL vec separately. For example:
// Use intrinsic functions; they do the calculation on 4 components at a time.
float d = v1.x * v2.x + v1.y * v2.y + v1.z * v2.z; // WRONG
float d = dot(v1, v2); // RIGHT (the variable is not named dot, so it cannot hide the built-in)
// Multiplying one component at a time is bad too, since the ALU can process all 4 components at once.
vec3 mul = vec3(v1.x * v2.x, v1.y * v2.y, v1.z * v2.z); //WRONG
vec3 mul = v1 * v2; //RIGHT
I've been wondering: are there equivalent operations for branching?
For example:
vec4 Overlay(vec4 v1, vec4 v2, vec4 opacity)
{
    bvec4 less = lessThan(v1, vec4(0.5));
    vec4 blend;
    for (int i = 0; i < 4; ++i)
    {
        if (less[i]) {
            blend[i] = 2.0 * v1[i] * v2[i];
        } else {
            blend[i] = 1.0 - 2.0 * (1.0 - v1[i]) * (1.0 - v2[i]);
        }
    }
    return v1 + (blend - v1) * opacity;
}
This is an Overlay operator that works component-wise. I'm not sure this is the best way to do it, since I'm afraid the for loop and the if could become a bottleneck later.
TL;DR: Can I branch component-wise? If so, how can I optimize this Overlay function with it?
Answer
You can actually make this branchless by computing both results and selecting between them per component. Whether that is faster is something you should always confirm by profiling:
vec4 Overlay(vec4 v1, vec4 v2, vec4 opacity)
{
    bvec4 less = lessThan(v1, vec4(0.5));
    vec4 blendHigh = vec4(1.0) - vec4(2.0) * (vec4(1.0) - v1) * (vec4(1.0) - v2);
    vec4 blendLow = vec4(2.0) * v1 * v2;
    // mix with a bvec picks blendLow where less is true, blendHigh elsewhere.
    vec4 blend = mix(blendHigh, blendLow, less);
    return v1 + (blend - v1) * opacity;
}
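Note that the mix(vec, vec, bvec) overload does NOT work on most GL ES implementations (as far as I know it is missing from GLSL ES 1.00, the OpenGL ES 2.0 shading language), so only rely on it if you are targeting desktop. You can still test it on your mobile targets if you wish.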
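For targets where the boolean overload of mix is unavailable, one common workaround is to build a 0.0/1.0 weight with step and feed it to the ordinary float-weighted mix. The following is only a sketch under that assumption: the name OverlayStep and the variable isHigh are made up for illustration and are not part of the original answer, and it is worth profiling like any other variant.

```glsl
// In a GLSL ES fragment shader, a default float precision must be declared
// before this function, e.g. "precision mediump float;".

// Hypothetical name; same Overlay math, but without mix(vec, vec, bvec).
vec4 OverlayStep(vec4 v1, vec4 v2, vec4 opacity)
{
    // step(edge, x) yields 0.0 where x < edge and 1.0 otherwise, so isHigh
    // is 1.0 exactly where lessThan(v1, vec4(0.5)) would be false.
    vec4 isHigh = step(vec4(0.5), v1);

    vec4 blendLow  = 2.0 * v1 * v2;
    vec4 blendHigh = vec4(1.0) - 2.0 * (vec4(1.0) - v1) * (vec4(1.0) - v2);

    // Float-weighted mix: blendLow where isHigh is 0.0, blendHigh where it is 1.0.
    vec4 blend = mix(blendLow, blendHigh, isHigh);

    return v1 + (blend - v1) * opacity;
}
```

Because the weight is always exactly 0.0 or 1.0, the linear interpolation acts as a per-component select rather than an actual blend.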
EDIT: Here's an optimized version that makes use of swizzle masks and multiply-adds (MADs). Note that the algorithm is the same; it was just rearranged.
vec4 Overlay(vec4 v1, vec4 v2, vec4 opacity)
{
    // x = 1.0, y = 2.0, z = 0.5; swizzles broadcast each constant to a vec4.
    const vec3 constants = vec3(1.0, 2.0, 0.5);
    bvec4 less = lessThan(v1, constants.zzzz);
    vec4 blendLow = v1 * v2 * constants.yyyy;
    vec4 blendHigh = constants.yyyy * (v2 + v1) - blendLow - constants.xxxx;
    vec4 blend = mix(blendHigh, blendLow, less);
    return v1 + (blend - v1) * opacity;
}
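To see why the rearranged blendHigh matches the earlier version, expand the original expression per component. Here a and b stand for one component of v1 and v2; the symbols are introduced only for this check.

```latex
\begin{aligned}
1 - 2(1 - a)(1 - b) &= 1 - 2\,(1 - a - b + ab) \\
                    &= 2(a + b) - 2ab - 1
\end{aligned}
```

Since blendLow is 2ab, the last line is exactly constants.yyyy * (v2 + v1) - blendLow - constants.xxxx, which reuses the product already computed for blendLow.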