Wednesday, June 21, 2017

opengl - Create YUV texture for GL_TEXTURE_EXTERNAL_OES format




I need to create a YUV texture in the GL_TEXTURE_EXTERNAL_OES format.


Source: https://github.com/crossle/MediaPlayerSurface/blob/master/src/me/crossle/demo/surfacetexture/VideoSurfaceView.java


I am doing all my processing in YUV, so it would save clock cycles if I could generate a YUV texture as the output of texture2D.



To get the 'y' value, I need to take the dot product of each texel with the vec3 (0.3, 0.59, 0.11). Since, for my purposes, I need the 'y' values of a 3x3 pixel block and then a convolution over them, this hurts performance. So it would save clock cycles if I could get a YUV texture as the output of texture2D.
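For reference, the per-texel luma computation described above can be sketched in plain Python (standing in for the shader; the luma function name and the sample block values are illustrative, not from the source). The coefficients (0.3, 0.59, 0.11) are roughly the BT.601 luma weights:

```python
# Luma coefficients from the question, approximately the BT.601 weights.
LUMA = (0.3, 0.59, 0.11)

def luma(rgb):
    """'y' value of one RGB texel: dot product with the luma coefficients."""
    return sum(channel * weight for channel, weight in zip(rgb, LUMA))

# The naive blurred luma of a 3x3 block costs nine dot products,
# one per texel, before the convolution weights are even applied.
block = [[(0.5, 0.5, 0.5)] * 3 for _ in range(3)]
y_values = [[luma(texel) for texel in row] for row in block]
```

This is exactly the cost the question wants to avoid: nine per-texel dot products for every output fragment.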



Answer



Sam Hocevar, in a comment above, recommends exploiting hardware bilinear filtering, along these lines:


// halfTexel: half of one texel's stride in texture coordinates
vec4 sample = vec4(0.0);

sample += texture2D(sourceTexture, sourceTexelUV + halfTexel * vec2(-1.0, -1.0));
sample += texture2D(sourceTexture, sourceTexelUV + halfTexel * vec2( 1.0, -1.0));
sample += texture2D(sourceTexture, sourceTexelUV + halfTexel * vec2(-1.0,  1.0));
sample += texture2D(sourceTexture, sourceTexelUV + halfTexel * vec2( 1.0,  1.0));

// Note sample.rgb: the luma coefficients are a vec3, and dividing them
// by 4 folds the averaging of the four samples into the constant.
float blurredLuminance = dot(sample.rgb, luminanceCoefficients / 4.0);

(I've also seen this implemented so that the four UV coordinates, offset by ±1/2 texel stride, are computed in the vertex shader and passed to the fragment shader, rather than computing this constant offset per fragment.)


How are we getting a 3x3 kernel with only 4 samples? Each sample lands exactly on the corner shared by 4 texels, so with the texture filtering mode set to bilinear we get an average of all 4 texels returned to us very efficiently, since the sampling and blending of each texel is implemented in hardware.


Summing all these pre-blended samples, the math works out that we get a final blend that's 1 part from each corner, 2 parts from each cardinal neighbour, and 4 parts from the center texel - exactly the relative weights we want from a 3x3 Gaussian blur.
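The weight bookkeeping above can be checked numerically. This is a plain-Python sketch (standing in for the shader; the image values and helper name are illustrative): each corner sample averages its 4 surrounding texels, and summing the four samples then dividing by 4 reproduces the 1-2-1 binomial 3x3 kernel exactly:

```python
import random

# A 3x3 texel neighbourhood with arbitrary values.
img = [[random.random() for _ in range(3)] for _ in range(3)]

def corner_sample(r, c):
    """Bilinear sample at the corner shared by texels (r,c)..(r+1,c+1):
    the hardware returns the plain average of those four texels."""
    return (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0

# Sum the four corner samples around the centre texel (1,1), then divide
# by 4, mirroring the shader's dot(sample, coefficients / 4).
four_sample_blur = sum(corner_sample(r, c) for r in (0, 1) for c in (0, 1)) / 4.0

# Explicit 3x3 binomial kernel: [1,2,1] outer product, normalised by 16.
kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
reference_blur = sum(kernel[r][c] * img[r][c]
                     for r in range(3) for c in range(3)) / 16.0

assert abs(four_sample_blur - reference_blur) < 1e-12
```

The centre texel appears in all four samples (weight 4), each cardinal neighbour in two (weight 2), and each corner in one (weight 1), which is where the 1/2/4-part blend comes from.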


Since the dot product is linear, we can do it once on the accumulated result, dividing by the number of samples at the same time (this can be combined into one constant).
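That linearity claim is easy to verify in plain Python (again standing in for the shader; the sample values are illustrative): one dot product on the accumulated samples, with the 1/4 folded into the coefficient constant, equals the average of four separate per-sample dot products:

```python
coeffs = (0.3, 0.59, 0.11)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

samples = [(0.2, 0.4, 0.6), (0.1, 0.3, 0.5), (0.9, 0.8, 0.7), (0.0, 1.0, 0.5)]

# Accumulate first, then take one dot product with coeffs / 4...
accumulated = tuple(sum(s[i] for s in samples) for i in range(3))
fused = dot(accumulated, tuple(c / 4.0 for c in coeffs))

# ...versus four dot products averaged afterwards.
separate = sum(dot(s, coeffs) for s in samples) / 4.0

assert abs(fused - separate) < 1e-12
```

So the shader pays for one dot product per fragment instead of four, and the division costs nothing at runtime since coefficients / 4 can be precomputed.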


I agree with sam that this is probably faster than my 2-pass suggestion. I'd forgotten that separating Gaussians into 2 passes is mainly an efficiency for larger filter kernels, and for a 3x3 kernel it would actually result in more texture sampling workload than the single-pass version outlined here.


