So I'm doing some DirectX development, using SharpDX under .NET to be exact (but DirectX/C++ API solutions are equally applicable). I'm looking for the fastest way to render lines in an orthographic projection (e.g. simulating 2D line drawing for scientific apps) using DirectX.
A screenshot of the sorts of plots I'm trying to render follows:
It's not uncommon for these sorts of plots to have lines with millions of segments, of variable thickness, with or without antialiasing per-line (or full screen AA on/off). I need to update the vertices for the lines very frequently (e.g. 20 times/second) and offload as much to the GPU as possible.
So far I have tried:
- Software rendering, e.g. GDI+: actually not bad performance, but obviously heavy on the CPU
- The Direct2D API: slower than GDI, especially with antialiasing on
- Direct3D10, using this method to emulate AA via vertex colours and tessellation on the CPU side. Also slow (I profiled it, and 80% of the time is spent computing vertex positions)
For the third method I'm using vertex buffers to send a triangle strip to the GPU, updating it every 200ms with new vertices. I'm getting a refresh rate of around 5 FPS for 100,000 line segments. Ideally I need millions!
Now I'm thinking that the fastest way would be to do the tessellation on the GPU, e.g. in a geometry shader. I could send the vertices as a line list, or pack them in a texture and unpack them in a geometry shader to create the quads. Alternatively, I could send raw points to a pixel shader and implement Bresenham line drawing entirely there. My HLSL is rusty (shader model 2, from 2006), so I don't know about the crazy stuff modern GPUs can do.
So the questions are:
- Has anyone done this before, and do you have any suggestions to try?
- Do you have any suggestions to improve performance with rapidly updating geometry (e.g. a new vertex list every 20ms)?
UPDATE 21st Jan
I have since implemented method (3) above using a geometry shader fed with a LineStrip topology and dynamic vertex buffers. Now I'm getting 100 FPS at 100k points and 10 FPS at 1,000,000 points. This is a huge improvement, but now I'm fill-rate and compute limited, so I got thinking about other techniques/ideas (listed after the sketch below).
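For reference, the expansion step looks roughly like the following. This is a minimal sketch with illustrative names, not my exact shader; it assumes an orthographic projection, so clip space coincides with NDC and w is 1.

    // Expands each segment of the strip into a screen-aligned quad.
    cbuffer LineParams : register(b0)
    {
        float2 ViewportSize; // render target size in pixels
        float  HalfWidth;    // half the line thickness, in pixels
    };

    struct VSOut
    {
        float4 pos : SV_Position;
    };

    [maxvertexcount(4)]
    void GS(line VSOut input[2], inout TriangleStream<VSOut> stream)
    {
        // segment direction in pixel space, so thickness is isotropic on screen
        float2 dPix = (input[1].pos.xy - input[0].pos.xy) * ViewportSize * 0.5;
        float2 n    = normalize(float2(-dPix.y, dPix.x));

        // perpendicular offset of HalfWidth pixels, converted back to NDC
        float2 off = n * HalfWidth * 2.0 / ViewportSize;

        VSOut v;
        v.pos = float4(input[0].pos.xy + off, input[0].pos.zw); stream.Append(v);
        v.pos = float4(input[0].pos.xy - off, input[0].pos.zw); stream.Append(v);
        v.pos = float4(input[1].pos.xy + off, input[1].pos.zw); stream.Append(v);
        v.pos = float4(input[1].pos.xy - off, input[1].pos.zw); stream.Append(v);
    }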
- What about Hardware Instancing of a Line Segment geometry? (See the sketch after this list.)
- What about Sprite Batch?
- What about other (Pixel shader) oriented methods?
- Can I efficiently cull on the GPU or CPU?
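To make the instancing idea concrete: I imagine binding a static 4-vertex unit quad plus a per-instance buffer holding the segment endpoints, and expanding in the vertex shader, skipping the geometry-shader stage entirely. An untested sketch with illustrative names:

    cbuffer LineParams : register(b0)
    {
        float2 ViewportSize; // render target size in pixels
        float  HalfWidth;    // half the line thickness, in pixels
    };

    struct VSIn
    {
        float2 corner : POSITION;  // per-vertex: (t, s), t in {0,1}, s in {-1,+1}
        float2 a      : TEXCOORD0; // per-instance: segment start (NDC)
        float2 b      : TEXCOORD1; // per-instance: segment end (NDC)
    };

    float4 VS(VSIn v) : SV_Position
    {
        // pick the segment end, then push sideways by HalfWidth pixels
        float2 p    = lerp(v.a, v.b, v.corner.x);
        float2 dPix = (v.b - v.a) * ViewportSize * 0.5;
        float2 n    = normalize(float2(-dPix.y, dPix.x));
        return float4(p + v.corner.y * n * HalfWidth * 2.0 / ViewportSize, 0.0, 1.0);
    }

Only the small per-instance endpoint buffer would need updating each frame.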
Your comments & suggestions much appreciated!
Answer
If you are going to render Y = f(X) graphs only, then I suggest trying the following method.
The curve data is passed as texture data, making it persistent and allowing for partial updates, through glTexSubImage2D for instance (UpdateSubresource would be the Direct3D counterpart). If you need scrolling, you could even implement a circular buffer and only update a few values per frame. Each curve is rendered as a fullscreen quad, and all the work is done by the pixel shader.
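The read side of such a circular buffer is cheap to do in the shader. A small sketch, in HLSL since that is your target (WriteHead and TexWidth are illustrative application-supplied values):

    Texture2D<float> CurveData : register(t0); // one row per curve

    cbuffer ScrollParams : register(b0)
    {
        int WriteHead; // physical texel index of the oldest sample
        int TexWidth;  // width of the data texture, in texels
    };

    // Un-rotates a logical sample index into its physical texel, so the
    // application only ever uploads the newly written samples.
    float FetchSample(int logicalIndex, int curveRow)
    {
        int texel = (WriteHead + logicalIndex) % TexWidth;
        return CurveData.Load(int3(texel, curveRow, 0));
    }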
The one-component texture contents could look like this:
+----+----+----+----+
| 12 | 10 | 5 | .. | values for curve #1
+----+----+----+----+
| 55 | 83 | 87 | .. | values for curve #2
+----+----+----+----+
The work of the pixel shader is as follows:
- find the X coordinate of the current fragment in the dataset space
- take e.g. the 4 closest data points that have data; for instance if the X value is 41.3, it would choose 40, 41, 42 and 43
- query the texture for the 4 Y values (make sure the sampler does no interpolation of any kind)
- convert the X,Y pairs to screen space
- compute the distance from the current fragment to each of the three segments and four points
- use the distance as an alpha value for the current fragment
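A rough HLSL sketch of such a pixel shader follows. It is untested, and all names, as well as the dataset-to-screen mapping, are illustrative; note that the clamped segment distance already accounts for the distance to the endpoints:

    Texture2D<float> CurveData : register(t0); // one row per curve, one texel per sample

    cbuffer PlotParams : register(b0)
    {
        float FirstSample;     // dataset X at the left edge of the viewport
        float SamplesPerPixel; // horizontal zoom factor
        int   CurveRow;        // which texture row (curve) to draw
        float YScale;          // dataset Y -> pixel Y scale
        float YOffset;         // dataset Y -> pixel Y offset
        float HalfWidth;       // half the line thickness, in pixels
    };

    // distance in pixels from point p to segment ab (endpoints included)
    float DistToSegment(float2 p, float2 a, float2 b)
    {
        float2 ab = b - a;
        float  t  = saturate(dot(p - a, ab) / max(dot(ab, ab), 1e-6));
        return length(p - (a + t * ab));
    }

    float4 PS(float4 pos : SV_Position) : SV_Target
    {
        // X coordinate of this fragment in dataset space
        float dataX = FirstSample + pos.x * SamplesPerPixel;
        int   i0    = (int)floor(dataX) - 1; // leftmost of the 4 samples

        // fetch the 4 nearest samples with Load (no filtering) and
        // convert each (index, value) pair to screen space
        float2 pts[4];
        [unroll]
        for (int i = 0; i < 4; ++i)
        {
            float y = CurveData.Load(int3(i0 + i, CurveRow, 0));
            float x = (i0 + i - FirstSample) / SamplesPerPixel;
            pts[i] = float2(x, y * YScale + YOffset);
        }

        // distance to the three segments joining the 4 points
        float d = 1e9;
        [unroll]
        for (int s = 0; s < 3; ++s)
            d = min(d, DistToSegment(pos.xy, pts[s], pts[s + 1]));

        // distance -> coverage: opaque inside the line, ~1px antialiased edge
        return float4(1.0, 1.0, 1.0, saturate(HalfWidth + 0.5 - d));
    }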
You may wish to substitute 4 with larger values depending on the potential zoom level.
I have written a very quick and dirty GLSL shader implementing this feature. I may add the HLSL version later, but you should be able to convert it without too much effort. The result can be seen below, with different line sizes and data densities:
One clear advantage is that the amount of data transferred is very low, and only a single draw call is needed.