gambler 0.0.0.0

Composable, streaming, and efficient left folds

This package defines the Fold, NonemptyFold, and EffectfulFold types and provides an assortment of ways to construct, combine, and use them.

Every gambler knows that the secret to surviving
Is knowing what to throw away and knowing what to keep

You got to know when to hold 'em, know when to fold 'em
Know when to walk away, and know when to run

The Gambler by Don Schlitz, popularized by Kenny Rogers

Intro to Fold

The foldl' function in the base package is used when we want a strictly evaluated result from traversing a list.

foldl' :: Foldable t => (b -> a -> b) -> b -> t a -> b

For example, to sum a list of numbers:

λ> import qualified Data.List as List

λ> List.foldl' (+) 0 [1..100]
5050

What if we put the first two parameters to List.foldl' into a datatype?

data Fold a b = Fold
    { initial :: b
    , step :: b -> a -> b }

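Better yet, we can use a trick to make the datatype a Functor, which will become important when we discuss the Applicative instance later. The trick is to hide the accumulator's type x behind an existential quantifier (this requires GHC's ExistentialQuantification extension) and to add an extract function that turns the final accumulator into the result: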

data Fold a b = forall x. Fold
    { initial :: x
    , step :: x -> a -> x
    , extract :: x -> b }
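Because x is hidden, the Functor instance only has to post-compose a function onto extract. The initial state and the step function never mention b, so they are left alone. The following is a minimal standalone sketch that assumes positional fields instead of the record syntax above. The names runFold, lengthFold and hasEvenLength are hypothetical illustrations, not necessarily gambler's API.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- A Fold whose accumulator type x is hidden. Positional fields:
-- initial state, step function, and extract function.
data Fold a b = forall x. Fold x (x -> a -> x) (x -> b)

-- fmap post-composes f onto extract; the initial state and the step
-- function are untouched, so the hidden type x never has to change.
instance Functor (Fold a) where
  fmap f (Fold initial step extract) = Fold initial step (f . extract)

-- Hypothetical runner: a strict left fold followed by a single extract.
runFold :: Foldable t => Fold a b -> t a -> b
runFold (Fold initial step extract) xs = extract (foldl' step initial xs)

-- Hypothetical example fold: count the items.
lengthFold :: Fold a Int
lengthFold = Fold 0 (\n _ -> n + 1) id

-- Turn the count into a Bool without touching the folding logic.
hasEvenLength :: Fold a Bool
hasEvenLength = fmap even lengthFold
```

With the first datatype (no extract field), the same change would have forced us to rewrite the step function. Here fmap just adjusts the final conversion.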

We can then express the concept of numeric summation as:

sum :: Num a => Fold a a
sum = Fold{ initial = 0, step = (+), extract = id }
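To run sum, we need something that drives a Fold over a container. The sketch below assumes a hypothetical runFold built on foldl' from base. It restates sum with positional fields, renamed to sumFold so it does not clash with Prelude's sum. The package's own runner may differ in name and detail.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')
import Data.List.NonEmpty (NonEmpty ((:|)))

-- Same shape as the existential Fold above, with positional fields.
data Fold a b = forall x. Fold x (x -> a -> x) (x -> b)

-- The summation fold from the text.
sumFold :: Num a => Fold a a
sumFold = Fold 0 (+) id

-- Hypothetical runner: accepts any Foldable container, not just lists.
runFold :: Foldable t => Fold a b -> t a -> b
runFold (Fold initial step extract) xs = extract (foldl' step initial xs)

-- The same Fold applied to three different Foldable containers.
listTotal, maybeTotal, nonEmptyTotal :: Integer
listTotal = runFold sumFold [1 .. 100]        -- 5050
maybeTotal = runFold sumFold (Just 7)         -- 7
nonEmptyTotal = runFold sumFold (1 :| [2, 3]) -- 6
```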

This Fold can be used to sum lists and other Foldable collections, but it can also be used to sum effectful streams. So even without any further mechanism, just having this datatype gives us some useful expressive power. There is no need for each streaming library to duplicate all the work of defining its own copies of sum, product, all, any, and, or, minimum, maximum, etc.; a library that provides some kind of Stream type needs only to define a function that applies a fold to a stream ...

foldStream :: Monad m => Fold a b -> Stream m a -> m b

... and then users can make use of any library of folds that they may find or concoct. gambler itself contains much of the functionality of the standard Data.List module, but there are more things in heaven and earth than are dreamt of in this package.
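Here is what such a function might look like for a toy stream type. Stream, fromList and foldStream are hypothetical; they are not the types of any particular streaming library. The Fold is the positional-field version of the existential Fold above. The only requirement on the stream is that it can be pulled one item at a time inside some monad.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Fold a b = forall x. Fold x (x -> a -> x) (x -> b)

-- Hypothetical stream type: running the action yields either Nothing
-- (end of stream) or the next item paired with the rest of the stream.
newtype Stream m a = Stream (m (Maybe (a, Stream m a)))

-- Apply a Fold to a Stream. The accumulator is forced before each step
-- so that a long stream does not build up a chain of thunks.
foldStream :: Monad m => Fold a b -> Stream m a -> m b
foldStream (Fold initial step extract) = go initial
  where
    go acc (Stream next) = acc `seq` do
      item <- next
      case item of
        Nothing -> pure (extract acc)
        Just (a, rest) -> go (step acc a) rest

-- Hypothetical helper: a stream that emits the items of a list.
fromList :: Monad m => [a] -> Stream m a
fromList [] = Stream (pure Nothing)
fromList (x : xs) = Stream (pure (Just (x, fromList xs)))

-- The summation fold from the text, with positional fields.
sumFold :: Num a => Fold a a
sumFold = Fold 0 (+) id
```

Nothing in foldStream depends on what the Fold computes. Any Fold, including one from a third-party library, works unchanged.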

Intro to NonemptyFold

There are some kinds of folding that only work if the input is nonempty. Suppose, for example, we want the greatest of all the items. If there are no items, there is no greatest item. We express this sort of thing with a slight modification to Fold:

data NonemptyFold a b = forall x. NonemptyFold
    { initial :: a -> x
    , step :: x -> a -> x
    , extract :: x -> b }

The only difference is that the type of the initial field has changed from x to a -> x: the initial state is now computed from the first item.

The notion of selecting the greatest item can now be expressed as:

maximum :: Ord a => NonemptyFold a a
maximum = NonemptyFold{ initial = id, step = max, extract = id }
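A NonemptyFold pairs naturally with Data.List.NonEmpty from base, whose values always carry a first item. The sketch below uses a hypothetical runner, runNonempty, with positional fields. It also renames maximum to maximumFold so it does not clash with Prelude. The first item seeds the accumulator through initial, and the remaining items go through step.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')
import Data.List.NonEmpty (NonEmpty ((:|)))

-- Positional fields: initial (from the first item), step, extract.
data NonemptyFold a b = forall x. NonemptyFold (a -> x) (x -> a -> x) (x -> b)

-- The "greatest item" fold from the text.
maximumFold :: Ord a => NonemptyFold a a
maximumFold = NonemptyFold id max id

-- Hypothetical runner: the first item seeds the accumulator and the
-- remaining items are folded strictly with step.
runNonempty :: NonemptyFold a b -> NonEmpty a -> b
runNonempty (NonemptyFold initial step extract) (first :| rest) =
  extract (foldl' step (initial first) rest)
```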

A NonemptyFold can be converted to a Fold using Fold.Pure.nonemptyFold. The conversion changes the fold's return type from b to Maybe b to accommodate the possibility of empty input.
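One way such a conversion could work is to wrap the accumulator in Maybe. The state is Nothing until the first item arrives, and an empty input therefore extracts to Nothing. The following is a sketch of that idea using a hypothetical toFold. It is not the source of Fold.Pure.nonemptyFold.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

data Fold a b = forall x. Fold x (x -> a -> x) (x -> b)
data NonemptyFold a b = forall x. NonemptyFold (a -> x) (x -> a -> x) (x -> b)

-- Hypothetical conversion: the accumulator is Nothing until the first
-- item arrives; an empty input therefore extracts to Nothing.
toFold :: NonemptyFold a b -> Fold a (Maybe b)
toFold (NonemptyFold initial step extract) = Fold Nothing step' (fmap extract)
  where
    -- Force the new state before wrapping it in Just; foldl' alone would
    -- only evaluate the Just constructor and leave a thunk inside it.
    step' Nothing a = strictJust (initial a)
    step' (Just x) a = strictJust (step x a)
    strictJust x = x `seq` Just x

-- Hypothetical runner for the resulting Fold.
runFold :: Foldable t => Fold a b -> t a -> b
runFold (Fold initial step extract) xs = extract (foldl' step initial xs)

maximumFold :: Ord a => NonemptyFold a a
maximumFold = NonemptyFold id max id
```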

Intro to EffectfulFold

There is a related function in base that does the same thing as foldl' but in a monadic context:

foldM :: Foldable t => Monad m => (b -> a -> m b) -> b -> t a -> m b

This allows us to perform effects as we fold.

λ> import qualified Control.Monad as Monad
λ> import Data.Functor (($>))

λ> Monad.foldM (\x a -> putStrLn ("* " <> show a) $> (x + a)) 0 [1..5]
* 1
* 2
* 3
* 4
* 5
15

The type we define corresponding to the arguments of Monad.foldM is:

data EffectfulFold m a b = forall x. EffectfulFold
    { initial :: m x
    , step :: x -> a -> m x
    , extract :: x -> m b }
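Running an EffectfulFold amounts to three steps. First run the initial action, then thread the state through foldM, and finally run extract. The sketch below assumes a hypothetical runner named runEffectfulFold and restates the printing sum from the foldM example as an EffectfulFold over IO. Both use positional fields.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Control.Monad (foldM)

data EffectfulFold m a b = forall x. EffectfulFold (m x) (x -> a -> m x) (x -> m b)

-- Hypothetical runner: run the initial action, thread the state through
-- foldM, then run the extract action on the final state.
runEffectfulFold :: (Foldable t, Monad m) => EffectfulFold m a b -> t a -> m b
runEffectfulFold (EffectfulFold initial step extract) xs = do
  x0 <- initial
  x <- foldM step x0 xs
  extract x

-- The printing sum from the foldM example, as an EffectfulFold.
printingSum :: EffectfulFold IO Integer Integer
printingSum = EffectfulFold (pure 0) step pure
  where
    step total a = do
      putStrLn ("* " <> show a)
      pure $! total + a
```

In GHCi, runEffectfulFold printingSum [1..5] should print the same five lines as the Monad.foldM example and return 15.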

A regular Fold can be converted to an EffectfulFold of any monad using Fold.Effectful.fold.
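The conversion works for any monad because each pure component only needs to be wrapped in pure. Here is a minimal sketch with a hypothetical fromFold, which may differ from Fold.Effectful.fold in detail. Building the EffectfulFold only needs Applicative, while running it with the hypothetical runEffectfulFold needs Monad.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Control.Monad (foldM)

data Fold a b = forall x. Fold x (x -> a -> x) (x -> b)
data EffectfulFold m a b = forall x. EffectfulFold (m x) (x -> a -> m x) (x -> m b)

-- Hypothetical conversion: wrap each pure component in pure. The new
-- state is forced with $! before it is returned.
fromFold :: Applicative m => Fold a b -> EffectfulFold m a b
fromFold (Fold initial step extract) =
  EffectfulFold (pure initial) (\x a -> pure $! step x a) (pure . extract)

-- Hypothetical runner, as in the foldM-based sketch.
runEffectfulFold :: (Foldable t, Monad m) => EffectfulFold m a b -> t a -> m b
runEffectfulFold (EffectfulFold initial step extract) xs = do
  x0 <- initial
  x <- foldM step x0 xs
  extract x

sumFold :: Num a => Fold a a
sumFold = Fold 0 (+) id
```

The same pure fold can then be run in Maybe, IO, or any other monad without changes.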

The Applicative instances

The Applicative instances of Fold and EffectfulFold make it easy to compute multiple folds over a collection in a single pass over the data. For example, suppose that you want to compute both the sum and the length of a list. The following approach works, but it uses space inefficiently:

import qualified Data.List as List
import Numeric.Natural (Natural)

sumAndLength :: Num a => [a] -> (a, Natural)
sumAndLength xs = (List.sum xs, List.genericLength xs)

The problem is that this traverses the list twice. Demanding the result of the sum forces the runtime to materialize the entire list, but the list cannot be garbage collected because it is still needed for the call to genericLength. The space requirement of sumAndLength is therefore linear in the length of the list. We can do much better.

With gambler, we can instead write:

import qualified Fold
import Numeric.Natural (Natural)

sumAndLength :: Num a => [a] -> (a, Natural)
sumAndLength = Fold.runFold $ (,) <$> Fold.sum <*> Fold.length

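This achieves the same result in a single pass using constant space, because each list element can be garbage collected as soon as both folds have consumed it.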
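The single-pass behavior comes from the Applicative instance. The <*> operator pairs up the two hidden accumulators and feeds every item to both. The following is a minimal standalone sketch of that idea with positional fields. Pair, runFold, sumFold and lengthFold are illustrative names, and gambler's actual instance may differ in detail.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')
import Numeric.Natural (Natural)

data Fold a b = forall x. Fold x (x -> a -> x) (x -> b)

instance Functor (Fold a) where
  fmap f (Fold initial step extract) = Fold initial step (f . extract)

-- A pair with strict fields: when foldl' evaluates the pair, both
-- accumulators are evaluated too, so neither one builds up thunks.
data Pair x y = Pair !x !y

instance Applicative (Fold a) where
  -- A fold that ignores its input and always returns b.
  pure b = Fold () (\_ _ -> ()) (const b)
  -- Run both folds side by side, feeding every item to each accumulator.
  Fold initF stepF extractF <*> Fold initX stepX extractX =
    Fold
      (Pair initF initX)
      (\(Pair f x) a -> Pair (stepF f a) (stepX x a))
      (\(Pair f x) -> extractF f (extractX x))

runFold :: Foldable t => Fold a b -> t a -> b
runFold (Fold initial step extract) xs = extract (foldl' step initial xs)

sumFold :: Num a => Fold a a
sumFold = Fold 0 (+) id

lengthFold :: Fold a Natural
lengthFold = Fold 0 (\n _ -> n + 1) id

-- One traversal computes both results.
sumAndLength :: Num a => [a] -> (a, Natural)
sumAndLength = runFold ((,) <$> sumFold <*> lengthFold)
```

The strict fields matter. With an ordinary lazy tuple, foldl' would only evaluate the outer constructor, and the two accumulators would grow as chains of unevaluated additions.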

Quick start

To start playing around with gambler quickly, launch GHCi using cabal:

cabal repl --build-depends gambler

The example from the previous section can be run as follows:

λ> import qualified Fold
λ> Fold.runFold ((,) <$> Fold.sum <*> Fold.length) [1..1000000]
(500000500000,1000000)

The gambler package is mostly a copy of foldl, with some features removed to minimize its dependency set. What remains in gambler is essentially the same as what can be found in foldl version 1.4.13, subject only to reorganization, renaming, and minor modifications.

Future plans

Once the Foldable1 class has been added to base, the type of Fold.Nonempty.run may be generalized to accommodate it.