Friday, September 29, 2017

opengl - How can I improve rendering speeds of a Voxel/Minecraft type game?


I'm writing my own clone of Minecraft (also written in Java). It works great right now. With a viewing distance of 40 meters I can easily hit 60 FPS on my MacBook Pro 8,1 (Intel i5 + Intel HD Graphics 3000). But if I set the viewing distance to 70 meters, I only reach 15-25 FPS. In the real Minecraft, I can set the viewing distance to far (= 256 m) without a problem. So my question is: what should I do to make my game perform better?


The optimisations I implemented:




  • Only keep local chunks in memory (depending on the player's viewing distance)

  • Frustum culling (First on the chunks, then on the blocks)

  • Only drawing the block faces that are actually visible

  • Using lists per chunk that contain the visible blocks. Blocks that become visible add themselves to this list. If they become invisible, they are automatically removed from it. Blocks become (in)visible when a neighbouring block is built or destroyed.

  • Using lists per chunk that contain the updating blocks. Same mechanism as the visible block lists.

  • Using almost no new statements (object allocations) inside the game loop. (My game runs for about 20 seconds before the garbage collector is invoked.)

  • Using OpenGL display lists (call lists) at the moment (glNewList(), glEndList(), glCallList()): one list for each side of each kind of block.
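To make the "visible faces" item above concrete, here is a minimal sketch of the idea, not the code from the project. It assumes a chunk is stored as a boolean[][][] of solid/empty cells; the class and method names are hypothetical. A face counts as visible when the neighbouring cell on that side is empty or lies outside the array.

```java
// Hypothetical sketch: counts the block faces that border an empty cell.
final class VisibleFaces {
    // Offsets to the six neighbours: -x, +x, -y, +y, -z, +z.
    private static final int[][] NEIGHBOURS = {
        {-1, 0, 0}, {1, 0, 0}, {0, -1, 0}, {0, 1, 0}, {0, 0, -1}, {0, 0, 1}
    };

    // Returns true if (x, y, z) is inside the array and holds a solid block.
    static boolean isSolid(boolean[][][] solid, int x, int y, int z) {
        return x >= 0 && x < solid.length
            && y >= 0 && y < solid[0].length
            && z >= 0 && z < solid[0][0].length
            && solid[x][y][z];
    }

    // Counts faces of solid blocks whose neighbour is empty or outside the array.
    static int countVisibleFaces(boolean[][][] solid) {
        int faces = 0;
        for (int x = 0; x < solid.length; x++) {
            for (int y = 0; y < solid[0].length; y++) {
                for (int z = 0; z < solid[0][0].length; z++) {
                    if (!solid[x][y][z]) {
                        continue; // empty cells have no faces
                    }
                    for (int[] n : NEIGHBOURS) {
                        if (!isSolid(solid, x + n[0], y + n[1], z + n[2])) {
                            faces++;
                        }
                    }
                }
            }
        }
        return faces;
    }
}
```

A real renderer would emit the vertices of each such face into the chunk's display list instead of counting them, and would consult the neighbouring chunk at the array border rather than treating it as empty.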


Currently I'm not even using any sort of lighting system. I have already heard about VBOs, but I don't know exactly what they are. I'll do some research on them. Will they improve performance? Before implementing VBOs, I want to try glCallLists() and pass it a list of call lists, instead of calling glCallList() thousands of times. (I want to try this because I think the real Minecraft doesn't use VBOs. Correct?)
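The batching idea can be sketched roughly as follows. This is only the buffer-handling half and it assumes a Java OpenGL binding whose glCallLists accepts a direct IntBuffer of list ids (LWJGL is one such binding, but the question does not say which one is used). The class name CallListBatch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical helper: collects display list ids for one batched call per frame.
final class CallListBatch {
    private final IntBuffer ids;

    CallListBatch(int capacity) {
        // Direct, native-order buffer, allocated once outside the game loop
        // so that no garbage is created per frame.
        ids = ByteBuffer.allocateDirect(capacity * 4)
                        .order(ByteOrder.nativeOrder())
                        .asIntBuffer();
    }

    // Start a new frame: forget the ids collected previously.
    void begin() {
        ids.clear();
    }

    // Queue one display list id for this frame.
    void add(int listId) {
        ids.put(listId);
    }

    // Make the collected ids readable; the result would be handed to the
    // binding's glCallLists(...) (assumed to accept an IntBuffer).
    IntBuffer finish() {
        ids.flip();
        return ids;
    }
}
```

Per frame, the loop would call begin(), add() for every face list that should be drawn, and then pass finish() to the batched call. Whether this beats thousands of individual glCallList() calls depends on the driver, so it is worth measuring.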


Are there other tricks to improve performance?



VisualVM profiling showed me this (profiling for only 33 frames, with a viewing distance of 70 meters):


[Image: VisualVM profiling screenshot, 70 meters, 33 frames]


Profiling with 40 meters (246 frames):


[Image: VisualVM profiling screenshot, 40 meters, 246 frames]


Note: I'm synchronising a lot of methods and code blocks, because I'm generating chunks in another thread. I think that acquiring a lock for an object is a performance issue when doing this much in a game loop (of course, I'm talking about the time when there is only the game loop and no new chunks are generated). Is this right?
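One common way to reduce locking in this kind of setup is to confine the shared state to a single hand-off queue. The sketch below is an illustration of that alternative, not what the project does: the generator thread publishes finished chunks into a ConcurrentLinkedQueue, and the game loop drains it once per frame, so the per-frame data structures are touched by one thread only. Chunks are represented by plain ids here, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a hand-off between a generator thread and the game loop.
final class ChunkHandOff {
    // The only structure shared between the two threads.
    private final ConcurrentLinkedQueue<Integer> finished = new ConcurrentLinkedQueue<>();
    // Touched only by the game-loop thread, so it needs no synchronisation.
    private final List<Integer> loaded = new ArrayList<>();

    // Called from the generator thread when a chunk is ready.
    void publish(int chunkId) {
        finished.offer(chunkId);
    }

    // Called from the game loop once per frame; returns how many chunks arrived.
    int drain() {
        int count = 0;
        Integer id;
        while ((id = finished.poll()) != null) {
            loaded.add(id);
            count++;
        }
        return count;
    }

    int loadedCount() {
        return loaded.size();
    }
}
```

When no chunks are being generated, drain() does a single poll() on an empty queue per frame, and the rest of the game loop needs no synchronized blocks.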


Edit: After removing some synchronised blocks and making some other small improvements, the performance is already much better. Here are my new profiling results with 70 meters:


[Image: VisualVM profiling screenshot, 70 meters, after the changes]


I think it is pretty clear that selectVisibleBlocks is the issue here.


Thanks in advance!
Martijn



Update: After some extra improvements (like using for loops instead of for-each, buffering variables outside loops, etc.), I can now run a viewing distance of 60 pretty well.


I think I'm going to implement VBOs as soon as possible.


PS: All source code is available on GitHub:
https://github.com/mcourteaux/CraftMania



Answer



You mention doing frustum culling on individual blocks — try throwing that out. Most rendering chunks should be either entirely visible or entirely invisible.
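A chunk-level test can be as cheap as one box-versus-plane check per frustum plane. The following is a minimal sketch under stated assumptions: each plane is given as {a, b, c, d} with its normal pointing into the frustum, and the chunk is an axis-aligned box. The class and method names are made up for illustration.

```java
// Hypothetical sketch: conservative axis-aligned box vs. frustum test.
final class ChunkCulling {
    // Each plane is {a, b, c, d} with the normal (a, b, c) pointing into the
    // frustum: a point is on the inner side when a*x + b*y + c*z + d >= 0.
    static boolean boxIntersectsFrustum(float[][] planes,
                                        float minX, float minY, float minZ,
                                        float maxX, float maxY, float maxZ) {
        for (float[] p : planes) {
            // Pick the box corner that lies furthest along the plane normal.
            float x = p[0] >= 0 ? maxX : minX;
            float y = p[1] >= 0 ? maxY : minY;
            float z = p[2] >= 0 ? maxZ : minZ;
            if (p[0] * x + p[1] * y + p[2] * z + p[3] < 0) {
                return false; // even the best corner is outside this plane
            }
        }
        return true; // inside or intersecting (may report a few false positives)
    }
}
```

With a test like this, the per-frame cost is six plane checks per chunk, and every block in a chunk that passes is simply drawn from the chunk's prebuilt list.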


Minecraft only rebuilds a display list/vertex buffer (I don't know which it uses) when a block is modified in a given chunk, and so do I. If you're modifying the display list whenever the view changes, you're not getting the benefit of display lists.
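The rebuild-only-on-modification pattern is essentially a dirty flag. Here is a minimal sketch with hypothetical names; the cached "mesh" is stood in for by a simple count, where a real renderer would recompile a display list or refill a vertex buffer.

```java
// Hypothetical sketch: a chunk that rebuilds its cached geometry only when dirty.
final class CachedChunk {
    private static final int SIZE = 16;
    private final boolean[] blocks = new boolean[SIZE * SIZE * SIZE];
    private boolean dirty = true;     // true until the first build
    private int cachedSolidCount = 0; // stand-in for a display list or vertex buffer
    private int rebuildCount = 0;     // kept only to illustrate how rarely we rebuild

    // Changing a block marks the chunk dirty; writing the same value does not.
    void setBlock(int x, int y, int z, boolean solid) {
        int index = (x * SIZE + y) * SIZE + z;
        if (blocks[index] != solid) {
            blocks[index] = solid;
            dirty = true;
        }
    }

    // Called every frame; camera movement alone never triggers a rebuild.
    int render() {
        if (dirty) {
            rebuild();
            dirty = false;
        }
        return cachedSolidCount; // a real renderer would call the cached list here
    }

    private void rebuild() {
        rebuildCount++;
        int count = 0;
        for (boolean block : blocks) {
            if (block) {
                count++;
            }
        }
        cachedSolidCount = count;
    }

    int rebuildCount() {
        return rebuildCount;
    }
}
```

The expensive work is in rebuild(), which runs once per edit rather than once per frame or per view change.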


Also, you appear to be using world-height chunks. Note that Minecraft uses cubical 16×16×16 chunks for its display lists, unlike for load and save. If you do that, there's even less reason to frustum cull individual chunks.
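Splitting a world-height column into cubical render sections mostly comes down to coordinate arithmetic. A small sketch, assuming 16-block sections as described above; the class name and the example height are hypothetical.

```java
// Hypothetical sketch: mapping block coordinates to 16x16x16 render sections.
final class Sections {
    static final int SIZE = 16;

    // Index of the section that contains the given block coordinate.
    // floorDiv keeps negative coordinates in the correct section.
    static int sectionCoord(int blockCoord) {
        return Math.floorDiv(blockCoord, SIZE);
    }

    // Position of the block inside its section, always in 0..15.
    static int localCoord(int blockCoord) {
        return Math.floorMod(blockCoord, SIZE);
    }

    // Number of render sections stacked in one column of the given height.
    static int sectionsPerColumn(int worldHeight) {
        return (worldHeight + SIZE - 1) / SIZE;
    }
}
```

For example, a column 128 blocks high would be drawn as 8 independent sections, so an edit near the surface only rebuilds the one section it touches, and sections above or below the view can be culled separately.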


(Note: I have not examined the code of Minecraft. All of this information is either hearsay or my own conclusions from observing Minecraft's rendering as I play.)





More general advice:


Remember that your rendering executes on two processors: CPU and GPU. When your frame rate is insufficient, then one or the other is the limiting resource — your program is either CPU-bound or GPU-bound (assuming it isn't swapping or having scheduling problems).


If your program is running at 100% CPU (and doesn't have an unbounded other task to complete), then your CPU is doing too much work. You should try to simplify its task (e.g. do less culling) in exchange for having the GPU do more. I strongly suspect this is your problem, given your description.


On the other hand, if the GPU is the limit (sadly, there aren't usually convenient 0%-100% load monitors) then you should think about how to send it less data, or require it to fill fewer pixels.

