Compute Shaders

Introduced in OpenGL 4.3

Compute Shaders differ from all other shaders in that they are not triggered by a rendering call such as glDrawArrays. Instead, they are launched by a call to glDispatchCompute or glDispatchComputeIndirect, and they execute completely outside the graphics pipeline we have been studying. They are typically run between display updates to perform intensive computation, the results of which are generally used to update the display on the next display callback. For example, a computationally expensive physics-based simulation may be implemented this way so that the underlying physical process can be viewed as its state evolves.

The functionality is very similar to (and was no doubt modeled after) GPGPU languages like CUDA and OpenCL. Compute Shaders can access and modify the Image Data and Shader Storage Buffers we just studied, and this is the main way they are used in an OpenGL program like that suggested in the previous paragraph.

While the specifics of how to write a Compute Shader are beyond the scope of this extremely brief introduction, a few general observations about how they work can be made. When writing other types of shaders, the work assigned to the shader is implicit. A vertex shader is given the per-vertex attributes for the one vertex it is assigned; a fragment shader writes a color to its assigned pixel. Compute Shaders, on the other hand, have no directly assigned object on which to operate. Instead, a 1D, 2D, or 3D "computational grid" is created, and there will be one shader invocation created and assigned to a GPU core for each position in the grid. As with the other shaders we have studied, hundreds, perhaps thousands, of these will be executing simultaneously. Each such invocation "knows" its 1D, 2D, or 3D index in this computational grid (in GLSL, through built-in variables such as gl_GlobalInvocationID), and it uses that index to determine which piece of the overall work it is responsible for.
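To make the idea of a computational grid concrete, here is a minimal CPU-side sketch in C++. It is not OpenGL code: GridIndex, dispatchGrid, and fillByIndex are hypothetical names invented for illustration, and the loops run sequentially where a GPU would run the invocations in parallel and in no guaranteed order. The point is only that each invocation receives nothing but its grid index and derives its work assignment from it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the index each invocation "knows"
// (roughly the role gl_GlobalInvocationID plays in a real Compute Shader).
struct GridIndex {
    std::size_t x, y, z;
};

// CPU stand-in for launching a grid: calls `kernel` once per grid position.
// A 1D grid uses ny = nz = 1; a 2D grid uses nz = 1.
void dispatchGrid(std::size_t nx, std::size_t ny, std::size_t nz,
                  const std::function<void(GridIndex)>& kernel) {
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x)
                kernel(GridIndex{x, y, z});
}

// Example "shader": a 2D grid laid over a width x height buffer.
// Each invocation writes only the one element selected by its own index.
std::vector<int> fillByIndex(std::size_t width, std::size_t height) {
    std::vector<int> buffer(width * height, -1);
    dispatchGrid(width, height, 1, [&](GridIndex id) {
        std::size_t flat = id.y * width + id.x;  // work assignment from index
        assert(flat < buffer.size());
        buffer[flat] = static_cast<int>(flat);
    });
    return buffer;
}
```

Because every invocation writes to a distinct element, the result does not depend on the order in which the invocations run, which is what makes this pattern safe to execute in parallel.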

For example, suppose a Compute Shader is created to multiply two N x N matrices. I could create a 1D computational grid of size N in which the i-th shader invocation is responsible for computing column i of the result matrix. Alternatively, I could create a 2D computational grid of size N x N in which shader invocation (i, j) is responsible for computing the element of the result matrix in row i and column j.
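The two decompositions can be sketched on the CPU as follows. This is a C++ illustration under stated assumptions, not shader code: the matrices are assumed to be stored row-major in flat arrays, the names Matrix, computeElement, computeColumn, multiply1D, and multiply2D are hypothetical, and the outer loops stand in for invocations that a GPU would execute in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// N x N matrix stored row-major: element (i, j) lives at index i * n + j.
using Matrix = std::vector<double>;

// Work of one invocation in the 2D grid: compute element (row, col) of C = A * B.
void computeElement(const Matrix& a, const Matrix& b, Matrix& c,
                    std::size_t n, std::size_t row, std::size_t col) {
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

// Work of one invocation in the 1D grid: compute every element of column `col`.
void computeColumn(const Matrix& a, const Matrix& b, Matrix& c,
                   std::size_t n, std::size_t col) {
    for (std::size_t row = 0; row < n; ++row)
        computeElement(a, b, c, n, row, col);
}

// 1D grid of size N: invocation i handles column i.
Matrix multiply1D(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)  // one iteration per invocation
        computeColumn(a, b, c, n, i);
    return c;
}

// 2D grid of size N x N: invocation (i, j) handles the element in row i, column j.
Matrix multiply2D(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)      // one iteration of the inner
        for (std::size_t j = 0; j < n; ++j)  // body per invocation
            computeElement(a, b, c, n, i, j);
    return c;
}
```

Both versions produce the same result; they differ only in how the work is divided. The 1D grid creates N invocations that each perform N dot products, while the 2D grid creates N x N invocations that each perform one, exposing more parallelism at the cost of more invocations.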