@hackage gl-block 1.0

OpenGL standard memory layouts

gl-block

Using

Primitive types need a Block instance. With those in place, you can build structures out of them and get Storable instances derived generically, according to the intended usage.

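{-# LANGUAGE DeriveAnyClass, DeriveGeneric, DerivingStrategies, DerivingVia #-}

import Foreign.Storable (Storable)
import GHC.Generics (Generic)
import Graphics.Gl.Block (Block, Packed(..), Std140(..), Std430(..))

-- Vec2, Vec3, Vec4 and Mat4 are assumed to come from your vector library,
-- together with their Block instances.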

-- | Attribute streams can be packed tightly together
data VertexAttrs = VertexAttrs
  { color :: Vec4
  , texCoords :: Vec2
  }
  deriving stock (Eq, Ord, Show, Generic) -- The regular stuff and Generic
  deriving anyclass Block -- The layout class, with Generic defaults
  deriving Storable via (Packed VertexAttrs) -- Free goodies!

-- | Uniform data requires jumping through the flaming hoops of padding and alignment.
-- You can use derive-storable-plugin or hsc2hs instead, but there are gotchas.
data SceneUniform = SceneUniform
  { projection :: Mat4
  , viewPosition :: Vec3 -- Here comes the jazz
  , viewDirection :: Vec3
  }
  deriving stock (Eq, Ord, Show, Generic)
  deriving anyclass Block
  deriving Storable via (Std140 SceneUniform) -- With comfy padding

-- | Shader storage buffers waste less space on padding, but the rules are still specific to the domain.
data Material = Material
  { baseColor :: Vec4
  , metallicRoughness :: Vec2
  , emission :: Vec4
  }
  deriving stock (Eq, Ord, Show, Generic)
  deriving anyclass Block
  deriving Storable via (Std430 Material) -- Less alignment, fewer calculations
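
To make the padding rules concrete, here is a small hand-rolled sketch that computes field offsets for SceneUniform under std140 and under a tightly packed layout (assumed here to mean no padding at all). It uses only base and is not how gl-block is implemented; the Field type and the alignUp and layout helpers are hypothetical names made up for this illustration. The sizes and alignments are the std140 ones for 4-byte floats: a mat4 takes 64 bytes aligned to 16, a vec3 takes 12 bytes aligned to 16.

```haskell
module Main (main) where

import Data.List (mapAccumL)

-- | Hypothetical description of a field: name, size and base alignment in bytes.
data Field = Field
  { fieldName :: String
  , fieldSize :: Int
  , fieldAlign :: Int
  }

-- | Round an offset up to the next multiple of the alignment.
alignUp :: Int -> Int -> Int
alignUp align offset = ((offset + align - 1) `div` align) * align

-- | Place fields one after another, padding each to its own alignment.
-- Returns the total size (rounded up to the struct alignment)
-- and the offset of every field.
layout :: Int -> [Field] -> (Int, [(String, Int)])
layout structAlign fields = (alignUp structAlign end, offsets)
  where
    (end, offsets) = mapAccumL place 0 fields
    place cursor (Field name size align) =
      let offset = alignUp align cursor
       in (offset + size, (name, offset))

-- std140 sizes and alignments, assuming 4-byte floats.
mat4, vec3 :: String -> Field
mat4 name = Field name 64 16
vec3 name = Field name 12 16

sceneUniform :: [Field]
sceneUniform =
  [ mat4 "projection"
  , vec3 "viewPosition"
  , vec3 "viewDirection"
  ]

main :: IO ()
main = do
  -- std140: structures are aligned to 16 bytes.
  print (layout 16 sceneUniform)
  -- Packed (assumption): every alignment is 1, so no padding is inserted.
  print (layout 1 [f {fieldAlign = 1} | f <- sceneUniform])
```

Under these assumptions std140 puts the fields at offsets 0, 64 and 80 for a total of 96 bytes, while the packed variant uses offsets 0, 64 and 76 and takes 88 bytes.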

Benchmarks

The benchmark consists of filling a Storable vector.

  • Packed layout is on par with manual instances.
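  • Std140 is slower, roughly 2× to 4× the manual time in the runs below, but not catastrophically so.
  • Std430 seems to regain some performance on the larger vectors due to being a tad simpler; at 10 elements it is no faster than Std140.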

There's only one "manual" case standing in for all the layouts, since the instances would differ only in pointer offsets. And no way in hell I'm going to calculate them by hand!
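
For a sense of what a "manual" instance involves, here is a minimal sketch of a hand-written Storable instance with hard-coded offsets, using only base. The Point type and the roundTrip helper are hypothetical and are not taken from the benchmark suite; the offsets describe a tightly packed layout of two Floats and an Int32.

```haskell
module Main (main) where

import Data.Int (Int32)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Storable (Storable (..))

-- | Hypothetical record: a 2D position and a 4-byte tag.
data Point = Point
  { pointX :: Float
  , pointY :: Float
  , pointTag :: Int32
  }
  deriving (Eq, Show)

-- | Every offset below is written out by hand.
-- A different layout would change only these numbers.
instance Storable Point where
  sizeOf _ = 12
  alignment _ = 4
  peek ptr =
    Point
      <$> peekByteOff ptr 0
      <*> peekByteOff ptr 4
      <*> peekByteOff ptr 8
  poke ptr (Point x y tag) = do
    pokeByteOff ptr 0 x
    pokeByteOff ptr 4 y
    pokeByteOff ptr 8 tag

-- | Write the points into a temporary buffer and read them back.
roundTrip :: [Point] -> IO [Point]
roundTrip points =
  allocaArray count $ \ptr -> do
    pokeArray ptr points
    peekArray count ptr
  where
    count = length points

main :: IO ()
main = do
  let points = [Point (fromIntegral i) 0.5 i | i <- [0 .. 9]]
  copied <- roundTrip points
  print (copied == points)
```

Moving such an instance to another layout means recomputing sizeOf, alignment and each byte offset, which is the bookkeeping the derived instances take over.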

  struct
    10
      manual: OK (2.26s)
        60.6 ns ± 5.4 ns
      packed: OK (1.21s)
        63.7 ns ± 5.8 ns
      std140: OK (1.19s)
        256  ns ±  26 ns
      std430: OK (0.13s)
        269  ns ±  24 ns
    1000
      manual: OK (1.95s)
        3.34 μs ± 244 ns
      packed: OK (1.96s)
        3.34 μs ± 235 ns
      std140: OK (1.97s)
        6.70 μs ± 425 ns
      std430: OK (1.44s)
        4.89 μs ±  93 ns
    1000000
      manual: OK (2.59s)
        2.22 ms ± 176 μs
      packed: OK (1.50s)
        2.47 ms ±  24 μs
      std140: OK (2.70s)
        4.54 ms ± 431 μs
      std430: OK (1.08s)
        3.28 ms ± 244 μs

Caveat: nested structures have degraded performance.