So I'm doing some DirectX development, using SharpDX under .NET to be exact (but DirectX/C++ API solutions are equally applicable). I'm looking for the fastest way to render lines in an orthographic projection (e.g. simulating 2D line drawing for scientific apps) using DirectX.
A screenshot of the sorts of plots I'm trying to render follows:
It's not uncommon for these sorts of plots to have lines with millions of segments, of variable thickness, with or without antialiasing per-line (or full screen AA on/off). I need to update the vertices for the lines very frequently (e.g. 20 times/second) and offload as much to the GPU as possible.
So far I have tried:
- Software rendering, e.g. GDI+: actually not bad performance, but obviously heavy on the CPU
- The Direct2D API: slower than GDI, especially with antialiasing on
- Direct3D10, using this method to emulate AA via vertex colours and tessellation on the CPU side. Also slow (I profiled it, and 80% of the time is spent computing vertex positions)
For the third method I'm using vertex buffers to send a triangle strip to the GPU, updating it every 200ms with new vertices. I'm getting a refresh rate of around 5 FPS for 100,000 line segments. Ideally I need millions!
Now I'm thinking that the fastest way would be to do the tessellation on the GPU, e.g. in a geometry shader. I could send the vertices as a line list, or pack them in a texture and unpack them in a geometry shader to create the quads. Alternatively, I could send raw points to a pixel shader and implement Bresenham line drawing entirely there. My HLSL is rusty (shader model 2, from 2006), so I don't know about the crazy stuff modern GPUs can do.
So the questions are:
- Has anyone done this before, and do you have any suggestions to try?
- Do you have any suggestions to improve performance with rapidly updating geometry (e.g. a new vertex list every 20ms)?
UPDATE 21st Jan
I have since implemented method (3) above using a geometry shader fed with a LineStrip topology and dynamic vertex buffers. Now I'm getting 100 FPS at 100k points and 10 FPS at 1,000,000 points. This is a huge improvement, but now I'm fill-rate and compute limited, so I got thinking about other techniques/ideas (listed after the sketch below).
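For reference, the expansion step looks roughly like the following. This is a minimal sketch with illustrative names, not my exact shader; it assumes an orthographic projection, so clip space coincides with NDC and w is 1.

    // Expands each segment of the strip into a screen-aligned quad.
    cbuffer LineParams : register(b0)
    {
        float2 ViewportSize; // render target size in pixels
        float  HalfWidth;    // half the line thickness, in pixels
    };

    struct VSOut
    {
        float4 pos : SV_Position;
    };

    [maxvertexcount(4)]
    void GS(line VSOut input[2], inout TriangleStream<VSOut> stream)
    {
        // segment direction in pixel space, so thickness is isotropic on screen
        float2 dPix = (input[1].pos.xy - input[0].pos.xy) * ViewportSize * 0.5;
        float2 n    = normalize(float2(-dPix.y, dPix.x));

        // perpendicular offset of HalfWidth pixels, converted back to NDC
        float2 off = n * HalfWidth * 2.0 / ViewportSize;

        VSOut v;
        v.pos = float4(input[0].pos.xy + off, input[0].pos.zw); stream.Append(v);
        v.pos = float4(input[0].pos.xy - off, input[0].pos.zw); stream.Append(v);
        v.pos = float4(input[1].pos.xy + off, input[1].pos.zw); stream.Append(v);
        v.pos = float4(input[1].pos.xy - off, input[1].pos.zw); stream.Append(v);
    }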
- What about Hardware Instancing of a Line Segment geometry? (See the sketch after this list.)
- What about Sprite Batch?
- What about other (Pixel shader) oriented methods?
- Can I efficiently cull on the GPU or CPU?
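To make the instancing idea concrete: I imagine binding a static 4-vertex unit quad plus a per-instance buffer holding the segment endpoints, and expanding in the vertex shader, skipping the geometry-shader stage entirely. An untested sketch with illustrative names:

    cbuffer LineParams : register(b0)
    {
        float2 ViewportSize; // render target size in pixels
        float  HalfWidth;    // half the line thickness, in pixels
    };

    struct VSIn
    {
        float2 corner : POSITION;  // per-vertex: (t, s), t in {0,1}, s in {-1,+1}
        float2 a      : TEXCOORD0; // per-instance: segment start (NDC)
        float2 b      : TEXCOORD1; // per-instance: segment end (NDC)
    };

    float4 VS(VSIn v) : SV_Position
    {
        // pick the segment end, then push sideways by HalfWidth pixels
        float2 p    = lerp(v.a, v.b, v.corner.x);
        float2 dPix = (v.b - v.a) * ViewportSize * 0.5;
        float2 n    = normalize(float2(-dPix.y, dPix.x));
        return float4(p + v.corner.y * n * HalfWidth * 2.0 / ViewportSize, 0.0, 1.0);
    }

Only the small per-instance endpoint buffer would need updating each frame.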
Your comments & suggestions much appreciated!
Answer
If you are going to render Y = f(X) graphs only, then I suggest trying the following method.
The curve data is passed as texture data, making it persistent and allowing for partial updates, through glTexSubImage2D for instance (UpdateSubresource would be the Direct3D counterpart). If you need scrolling, you could even implement a circular buffer and only update a few values per frame. Each curve is rendered as a fullscreen quad, and all the work is done by the pixel shader.
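The read side of such a circular buffer is cheap to do in the shader. A small sketch, in HLSL since that is your target (WriteHead and TexWidth are illustrative application-supplied values):

    Texture2D<float> CurveData : register(t0); // one row per curve

    cbuffer ScrollParams : register(b0)
    {
        int WriteHead; // physical texel index of the oldest sample
        int TexWidth;  // width of the data texture, in texels
    };

    // Un-rotates a logical sample index into its physical texel, so the
    // application only ever uploads the newly written samples.
    float FetchSample(int logicalIndex, int curveRow)
    {
        int texel = (WriteHead + logicalIndex) % TexWidth;
        return CurveData.Load(int3(texel, curveRow, 0));
    }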
The one-component texture contents could look like this:
+----+----+----+----+
| 12 | 10 | 5 | .. | values for curve #1
+----+----+----+----+
| 55 | 83 | 87 | .. | values for curve #2
+----+----+----+----+
The work of the pixel shader is as follows:
- find the X coordinate of the current fragment in the dataset space
- take e.g. the 4 closest data points that have data; for instance if the X value is 41.3, it would choose 40, 41, 42 and 43
- query the texture for the 4 Y values (make sure the sampler does no interpolation of any kind)
- convert the X,Y pairs to screen space
- compute the distance from the current fragment to each of the three segments and four points
- use the distance as an alpha value for the current fragment
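A rough HLSL sketch of such a pixel shader follows. It is untested, and all names, as well as the dataset-to-screen mapping, are illustrative; note that the clamped segment distance already accounts for the distance to the endpoints:

    Texture2D<float> CurveData : register(t0); // one row per curve, one texel per sample

    cbuffer PlotParams : register(b0)
    {
        float FirstSample;     // dataset X at the left edge of the viewport
        float SamplesPerPixel; // horizontal zoom factor
        int   CurveRow;        // which texture row (curve) to draw
        float YScale;          // dataset Y -> pixel Y scale
        float YOffset;         // dataset Y -> pixel Y offset
        float HalfWidth;       // half the line thickness, in pixels
    };

    // distance in pixels from point p to segment ab (endpoints included)
    float DistToSegment(float2 p, float2 a, float2 b)
    {
        float2 ab = b - a;
        float  t  = saturate(dot(p - a, ab) / max(dot(ab, ab), 1e-6));
        return length(p - (a + t * ab));
    }

    float4 PS(float4 pos : SV_Position) : SV_Target
    {
        // X coordinate of this fragment in dataset space
        float dataX = FirstSample + pos.x * SamplesPerPixel;
        int   i0    = (int)floor(dataX) - 1; // leftmost of the 4 samples

        // fetch the 4 nearest samples with Load (no filtering) and
        // convert each (index, value) pair to screen space
        float2 pts[4];
        [unroll]
        for (int i = 0; i < 4; ++i)
        {
            float y = CurveData.Load(int3(i0 + i, CurveRow, 0));
            float x = (i0 + i - FirstSample) / SamplesPerPixel;
            pts[i] = float2(x, y * YScale + YOffset);
        }

        // distance to the three segments joining the 4 points
        float d = 1e9;
        [unroll]
        for (int s = 0; s < 3; ++s)
            d = min(d, DistToSegment(pos.xy, pts[s], pts[s + 1]));

        // distance -> coverage: opaque inside the line, ~1px antialiased edge
        return float4(1.0, 1.0, 1.0, saturate(HalfWidth + 0.5 - d));
    }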
You may wish to substitute 4 with larger values depending on the potential zoom level.
I have written a very quick and dirty GLSL shader implementing this feature. I may add the HLSL version later, but you should be able to convert it without too much effort. The result can be seen below, with different line sizes and data densities:
One clear advantage is that the amount of data transferred is very low, and only a single draw call is needed.