@hackage inline-asm0.4.0.2

Inline some Assembly in ur Haskell!

inline-asm

Build Status

When inline C is too safe.

Did you try inline-c, but it's not enough? You need more? Nothing seems to satisfy? inline-asm to the rescue!

And, since the inline assembly is just a usual Haskell value (even if manipulated at compile-time), there's a lot of pretty cool stuff one can do, like, for instance, explicit compile-time loop unrolling.

Examples

Swapping two Ints and also incrementing one of them by two:

defineAsmFun "swap2p1"
  [asmTy| (a : Int) (b : Int) | (_ : Int) (_ : Int)]
  [asm|
  xchg {a}, {b}
  add $2, {b}
  |]

(note the {a}, {b} antiquoters)

Getting the last character of a ByteString, or a default character if it's empty:

defineAsmFun "lastChar"
  [asmTy| (bs : ByteString) (def : Word) | (w : Word) |]
  [asm|
  test {bs:len}, {bs:len}
  jz is_zero
  movzbq -1({bs:ptr},{bs:len}), {w}
  RET_HASK
is_zero:
  mov {def}, {w}
  |]

(note the special {bs:ptr} and {bs:len} antiquoters, as well as RET_HASK command to return early)

SIMD-accelerated character occurrences count in a string:

defineAsmFun "countCharSSE42"
  [asmTy| (ch : Word8) (ptr : Ptr Word8) (len : Int) | (cnt : Int) |] $
  unroll "i" [12..15]
  [asm|
  push %r{i}|] <> [asm|
  vmovd {ch}, %xmm15
  vpxor %xmm0, %xmm0, %xmm0
  vpshufb %xmm0, %xmm15, %xmm15

  shr $7, {len}

  mov $16, %eax
  mov $16, %edx

  xor {cnt}, {cnt}

  {move ptr rdi}
loop: |] <> unrolls "i" [1..8] [
  [asm|
  vmovdqa {(i - 1) * 0x10}({ptr}), %xmm{i}
  |], [asm|
  vpcmpestrm $10, %xmm15, %xmm{i}
  vmovdqa %xmm0, %xmm{i}
  |], [asm|
  vmovq %xmm{i}, %r{i + 7}
  |], [asm|
  popcnt %r{i + 7}, %r{i + 7}
  |], [asm|
  add %r{i + 7}, {cnt}
  |]
  ] <>
  [asm|
  add $128, {ptr}
  dec {len}
  jnz loop|] <> unroll "i" [15,14..12] [asm|
  pop %r{i} |]

(note the unroll/unrolls Haskell function for compile-time code generation and loop unrolling with arithmetic expressions in the templates)

Basic usage

The entry point is the defineAsmFun function from Language.Asm.Inline as well as the asm and asmTy quasiquoters from Language.Asm.Inline.QQ.

First, enable some extensions required for Template Haskell and for the unlifted FFI marshalling stuff:

{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
{-# LANGUAGE GHCForeignImportPrim, UnliftedFFITypes, UnboxedTuples #-}

then one can just do

defineAsmFun "funName"
  [asmTy| (someInt : Int) (somePtr : Ptr Word) (someStr : BS.ByteString) | (len : Int) (count : Int) |]
  [asm|
  ; your asm code follows
  |]

to define a function funName of the type Int -> Ptr Word -> BS.ByteString -> (Int, Int).

Antiquotation

Good news: it's not necessary to memorize the GHC calling convention to access the arguments and the output values slots! Instead, one can use the {someInt} syntax to refer to the argument or the return slot named someInt.

NB: despite that, be careful to not accidentally overwrite an input parameter for now by picking a wrong register for temporary computations. We might introduce some syntax to pick unused registers in a future version, but for now care must be taken.

ByteString parameters are supported, but, being composite objects, they are a bit special: an input parameter of type ByteString actually takes two registers: one for the address of the string, and one for its length.

Sometimes it might be handy to reassociate an input parameter with another register. For this, the {move argName newReg} antiquoter can be used (for instance, {move someInt rdi}). This will both update the mapping from argument names to register names as well as issue an assembly mov command.

In case you need to return early to Haskell-land, just write RET_HASK, which gets substituted by the actual command to return to Haskell.

Explicit loop unrolling

The asm quasiquoter basically produces a string, so it can be manipulated at compile-time with arbitrary Haskell functions. In particular, a string template can be replicated at compile time.

The unroll function unrolls a single asm code block, calculating arithmetic expressions involving the unroll variable, so

unroll "i" [1..8] [asm|vmovdqa {(i - 1) * 0x10}({ptr}), %xmm{i}|]

is equivalent to

vmovdqa 0x0({ptr}), %xmm1
vmovdqa 0x10({ptr}), %xmm2
vmovdqa 0x20({ptr}), %xmm3
vmovdqa 0x30({ptr}), %xmm4
vmovdqa 0x40({ptr}), %xmm5
vmovdqa 0x50({ptr}), %xmm6
vmovdqa 0x60({ptr}), %xmm7
vmovdqa 0x70({ptr}), %xmm8

unrolls works analogously, but it takes a list of asm code blocks (instead of a single block), unrolls each of them and then concatenates the results. Equationally,

unrolls var ints codes = foldMap (unroll var ints) codes

The countCharSSE42 function above might be a pretty good example.

Safety and notes

  • Firstly, all of this is utterly unsafe.
  • The compiler sees the generated functions as pure, so if a function calls, say, RDRAND and is itself called more than once to get several random numbers, care must be taken to ensure the compiler doesn't elide extra calls. We might introduce some shortcuts to allow wrapping such impure functions in an IO or PrimMonad or soemthing similar.
  • Each function is compiled in its own .S file, so one can freely pick arbitrary naming for the labels and so on, but, on the other hand, one cannot access labels in other functions. This can be remedied somewhat easily — consider throwing up an issue if that's actually desired.
  • Finally, all of this is utterly unsafe.