@hackage streamly-lmdb 0.5.0

Stream data to or from LMDB databases using the streamly library.

streamly-lmdb

Stream data to or from LMDB databases using the Haskell streamly library.

Requirements

Install LMDB on your system:

  • Debian Linux: sudo apt-get install liblmdb-dev
  • macOS: brew install lmdb

Quick start

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Streamly.External.LMDB
  ( Limits (mapSize),
    WriteOptions (writeTransactionSize),
    defaultLimits,
    defaultReadOptions,
    defaultWriteOptions,
    getDatabase,
    openEnvironment,
    readLMDB,
    tebibyte,
    writeLMDB,
  )
import qualified Streamly.Prelude as S

main :: IO ()
main = do
  -- Open an environment. There should already exist a file or
  -- directory at the given path. (Empty for a new environment.)
  env <-
    openEnvironment "/path/to/lmdb-database" $
      defaultLimits {mapSize = tebibyte}

  -- Get the main database.
  -- Note: It is common practice with LMDB to create the database
  -- once and reuse it for the remainder of the program’s execution.
  db <- getDatabase env Nothing

  -- Stream key-value pairs into the database.
  let fold' = writeLMDB db defaultWriteOptions {writeTransactionSize = 1}
  let writeStream = S.fromList [("baz", "a"), ("foo", "b"), ("bar", "c")]
  _ <- S.fold fold' writeStream

  -- Stream key-value pairs out of the
  -- database, printing them along the way.
  -- Output:
  --     ("bar","c")
  --     ("baz","a")
  --     ("foo","b")
  let unfold' = readLMDB db Nothing defaultReadOptions
  let readStream = S.unfold unfold' undefined
  S.mapM_ print readStream

Benchmarks

See bench/README.md. Summary (with rough figures from our machine):

  • Reading. For a fully cached LMDB database, this library (with unsafeReadLMDB used instead of readLMDB) adds roughly 15 ns/pair of overhead over plain Haskell IO code, which in turn adds roughly 10 ns/pair over C. (That the first two are so close is what streamly and stream fusion promise.) It follows that if your total workload per pair takes longer than around 25 ns, using this library instead of C will not be your bottleneck.
  • Writing. Compared with C, writing is around 30% slower with plain Haskell IO code and around 50% slower with this library. We have not investigated these differences further because this write performance is currently good enough for our purposes.
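
The read figures above can be turned into a quick back-of-the-envelope check. The following is only a sketch under a simple assumed model: the per-pair cost is your own workload plus the combined overhead relative to C. The names (libraryOverheadNs, haskellOverheadNs, overheadFraction) and the sample workloads are hypothetical and not part of this library; only the 15 ns and 10 ns figures come from the summary above.

```haskell
module Main where

-- Rough per-pair read overheads quoted in the summary (nanoseconds).
-- This library (with unsafeReadLMDB) compared to plain Haskell IO code.
libraryOverheadNs :: Double
libraryOverheadNs = 15

-- Plain Haskell IO code compared to C.
haskellOverheadNs :: Double
haskellOverheadNs = 10

-- Combined overhead of this library relative to C.
totalOverheadNs :: Double
totalOverheadNs = libraryOverheadNs + haskellOverheadNs

-- Assumed model: fraction of the per-pair time spent on overhead,
-- given the time (ns) your own code spends on each pair.
overheadFraction :: Double -> Double
overheadFraction workloadNs =
  totalOverheadNs / (workloadNs + totalOverheadNs)

main :: IO ()
main = mapM_ report [25, 100, 1000]
  where
    -- The sample workloads are arbitrary illustrative values.
    report w =
      putStrLn $
        show w ++ " ns/pair workload: overhead fraction "
          ++ show (overheadFraction w)
```

Under this model, a 25 ns/pair workload spends half its time on overhead, and the share shrinks as the workload grows. Measure on your own machine before relying on these rough figures.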

Benchmark machine: Linode Dedicated 32GB (16 CPUs, 640 GB storage, 32 GB RAM) running Debian 10.