Thursday, August 24, 2017

opengl - pyOpenGL draw loop - slow with just 1500 items to draw


I've been trying to build myself a simple little game API using pyOpenGL (previously I tried using just Tkinter, but I keep hitting the same wall whatever I do!)


I rewrote everything using my very limited knowledge of OpenGL and came up with the following for my draw loop:


def _draw(self):
    vertex_count = 0
    items = 0
    coords = numpy.array([], numpy.float32)

    for item in self.graph:
        z = item.z - 1
        if z < 0:
            z = 0

        x1 = item.x * 2.0 / self.width - 1
        y1 = (item.y+z*item.height) * 2.0 / self.height - 1

        x2 = (item.x+item.width) * 2.0 / self.width - 1
        y2 = ((item.y+z*item.height)+item.height) * 2.0 / self.height - 1

        coords = numpy.hstack((coords, numpy.array([
            # X, Y, Z          U, V
            x1, y2, 0.0, self.sprite_sheet[item.frame][0], self.sprite_sheet[item.frame][1],
            x2, y2, 0.0, self.sprite_sheet[item.frame][2], self.sprite_sheet[item.frame][3],
            x1, y1, 0.0, self.sprite_sheet[item.frame][4], self.sprite_sheet[item.frame][5],
            x1, y1, 0.0, self.sprite_sheet[item.frame][6], self.sprite_sheet[item.frame][7],
            x2, y2, 0.0, self.sprite_sheet[item.frame][8], self.sprite_sheet[item.frame][9],
            x2, y1, 0.0, self.sprite_sheet[item.frame][10], self.sprite_sheet[item.frame][11],
        ], numpy.float32)) )

        vertex_count += 6

    glBufferData(GL_ARRAY_BUFFER, coords.nbytes, coords, GL_STATIC_DRAW)
    vertices = glGetAttribLocation(self.shader_program, 'a_position')
    tex_coords = glGetAttribLocation(self.shader_program, 'a_texCoords')

    glEnableVertexAttribArray(vertices)
    glEnableVertexAttribArray(tex_coords)

    glVertexAttribPointer(vertices, 3, GL_FLOAT, GL_FALSE, 20, None)
    glVertexAttribPointer(tex_coords, 2, GL_FLOAT, GL_TRUE, 20, ctypes.c_void_p(12))

    return vertex_count

... in calling func ...


glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) # Clear all pixels.

count = self.scene._draw()


glDrawArrays(GL_TRIANGLES, 0, count)

glutSwapBuffers()
glutPostRedisplay()

glFlush()

I'm testing it with a window 720 x 512, drawing a simple set of 2d tiles 32x32 each, so ~1500 items in total, but it is sooo slow, each frame is taking about 0.1 seconds. I thought I'd be safe with OpenGL until I was trying to draw >10,000 things!


I have tried a few permutations of building 1 big array and writing it all in one chunk (as above), or creating a big empty buffer then using glBufferSubData() to write in each item as needed (still creating a numpy.array() for each item then writing it to the buffer).


I have a very simple vertex and fragment shader which really just pass everything through:



attribute vec3 a_position;
attribute vec2 a_texCoords;

uniform mat3 u_matrix;

varying vec2 v_texCoords;

void main()
{
    gl_Position = vec4(a_position, 1);

    v_texCoords = a_texCoords;
}

and


uniform sampler2D u_image;

varying vec2 v_texCoords;

void main()
{
    gl_FragColor = texture2D(u_image, v_texCoords);
}

Sorry for the giant walls of code, trying to be thorough.


Have I done something terribly stupid?



Answer



From the code you presented, it appears that your issue is that you're building a new array every frame inside your render loop. In my experience, you should keep the logic inside a render loop as close to nothing as possible. A rendering loop should be a read-only pipeline: it reads and draws from arrays and assets that were created beforehand. All your assets, including any arrays, should be initialized before the rendering loop starts.


If performance is a primary concern for you, you should be doing as close to nothing as possible during your rendering cycle; even if statements hinder performance in a small way. Not that this should dissuade you, it's just a truth that comes with game development. It should also be noted that most games (depending on the complexity involved) don't even need to go all out with Vertex Buffer Objects and all the new candy that's available. In fact, one game, Minecraft, still uses display lists and immediate mode for most of its rendering. Essentially, the voxels of any given chunk are compiled into a single display list after being heavily optimized by a face-combining algorithm. Each chunk can then be rendered with a single draw call, leaving only frustum/occlusion testing to be done within the rendering frame. It's super fast, and serves as proof that you don't need to use VBOs to achieve good performance.
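The "compile each chunk once, then just draw it" idea can be sketched without any OpenGL at all. This is only an illustration of the caching pattern, not how any particular game implements it; ChunkCache and build_fn are hypothetical names, and build_fn stands in for whatever expensive step produces a chunk's geometry (a display list, a VBO, a numpy array).

```python
class ChunkCache:
    """Keeps built geometry per chunk and rebuilds only chunks marked dirty."""

    def __init__(self, build_fn):
        self._build_fn = build_fn  # hypothetical: chunk_id -> geometry
        self._geometry = {}
        self._dirty = set()

    def mark_dirty(self, chunk_id):
        # Call this when something inside the chunk changes.
        self._dirty.add(chunk_id)

    def get(self, chunk_id):
        # Build on first use or after a change; otherwise reuse the cached result.
        if chunk_id not in self._geometry or chunk_id in self._dirty:
            self._geometry[chunk_id] = self._build_fn(chunk_id)
            self._dirty.discard(chunk_id)
        return self._geometry[chunk_id]
```

With something like this, the per-frame code only calls get() for the visible chunks and issues one draw per chunk; the expensive build runs only when a chunk actually changes.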


Just remember, when writing a rendering loop you need to obey the KISS principle. Do as little as you can, and aim for zero array creations, string allocations (strings are internally arrays) and object creations. It should all be done beforehand.
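For items that do move, one way to stay close to that goal is to allocate the vertex array once and overwrite only the affected quad in place. The sketch below assumes the same 6-vertex, 5-float layout as the question; allocate_quads and move_quad are made-up helper names. The returned byte offset and size are the values one would presumably pass to glBufferSubData, which the question already mentions trying.

```python
import numpy as np

FLOATS_PER_VERTEX = 5  # X, Y, Z, U, V
VERTS_PER_QUAD = 6     # two triangles

def allocate_quads(n_quads):
    """Allocate the full vertex array once, before the render loop starts."""
    return np.zeros((n_quads, VERTS_PER_QUAD, FLOATS_PER_VERTEX), np.float32)

def move_quad(verts, index, x1, y1, x2, y2):
    """Overwrite one quad's positions in place, without rebuilding the array.

    Returns (byte_offset, byte_size) of the modified region.
    """
    quad = verts[index]  # a view into verts, not a copy
    quad[:, 0] = (x1, x2, x1, x1, x2, x2)
    quad[:, 1] = (y2, y2, y1, y1, y2, y1)
    quad_bytes = VERTS_PER_QUAD * FLOATS_PER_VERTEX * verts.itemsize
    return index * quad_bytes, quad_bytes
```

Only the 120 bytes belonging to the moved quad need to be re-uploaded; the rest of the buffer stays untouched on the GPU.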


