@hackage hsprocess0.3

The Haskell Stream Processor command line utility

Haskell Stream Processor

Haskell Stream Processor is a command line utility to process streams using Haskell code.

There are many reasons why Haskell is suitable for stream processing from the command line. Code written in Haskell is concise thanks to a clean syntax and the type inference which allows code without type decoration. Also it is very easy to define one-line transformations by combining functions.

For example:

hsp "L.map (L.head . words) . lines"

prints the first word of each line of the input stream.

Installation

From the project directory

cabal install

This will compile and install the executable hsp and the library HSProcess.Representable.

Usage

hsp supports different modes:

Evaluate an expression

It is possible to use hsp to evaluate a user expression without input using the option -e:

hsp -e "1"

Work on the stream

The standard mode of hsp process the whole stream. It accepts a string representing a transformation from the stream, that has type Data.ByteString.Lazy.ByteString, to some value with type that is an instance of Rows:

ByteString -> Rows a

Rows is a special case of Show for representing data on the command line . For example, to print on stdout what it gets from stdin:

hsp "id"

Split stream in chunks and process them

Many times, stream processing is about splitting the stream on some delimiter, like '\n', and process each chunk of data. With the standard mode of hsp this can be achieved using the split function of ByteString:

hsp "L.filter (not . null) . split '\n'"

This happens so often that hsp has a mode to split automatically the stream on a delimiter using -d [<delimiter>]. If <delimiter> is omitted, then it is set to \n. With -d, the function provided must have type:

[ByteString] -> Rows a

The command before can be rewritten as:

hsp -d "L.filter (not . null)"

Map a function on each chunk of data

A specific case of hsp -d <delimiter> is hsp -d <delimiter> -m that is equivalent of mapping the supplied function to the input list. In this case the function must have type:

ByteString -> Row a

For example, to take the first word of each line:

hsp -m "L.head . words"

When -m is specified, -d can be omitted and the delimiter is automatically set to \n.

Configuration

Haskell Stream Processor is a command line utility and for this reason it needs informations, like which modules should be loaded, that cannot be easily passed as arguments. There are two configuration files located under $HOME/.hsp, one to import modules and one to import user defined functions.

Modules

Haskell Stream Processor reads a list of modules to load from the file $HOME/.hsp/modules. Each line of this file is composed by the name of a module eventually followed by a space and it's qualified name. An example could be:

Control.Monad
Data.List L

which means that all the functions from Control.Monad and Data.List will be available to the user, but for Data.List functions you must qualify them with L..

There are some modules that are loaded automatically without qualification. In particular, the module Data.ByteString.Lazy.Char8 is automatically loaded because hsp works on lazy bytestrings. This means functions like that in Prelude work on list, like map, in hsp work on ByteStrings. Same for function that work on String.

Note that Prelude is loaded with the qualified name P, so its functions are not directly visible.

An example of module file can be found in the example directory.

User defined functions

It is possible to define new function to be used in Haskell Stream Processor inside the file $HOME/.hsp/toolkit.hs.

An example of toolkit can be found in the example directory.

Differences with the Glasgow Haskell Compiler

It is already possible to evaluate an function using the Glasgow Haskell Compiler using the option -e and by passing the custom function to interact:

ghc -e "interact id"

The main differences are that Haskell Stream Processor works on (lazy) ByteString instead of the slower String, it can load modules automatically from the module file and can load user defined functions from the toolkit.hs file. Also, Haskell Stream Processor supports different modes from working on the entire stream, like working on each line.

Examples

In all the examples, Data.ByteString is loaded without qualification whereas Data.List is qualified as L. The function match is an alias for Text.Regex.Posix.=~.

Evaluate 2^100:

hsp -e "2^100"

Print numbers from 1 to 100:

hsp -e "[1 .. 100]"

Take the first line of a stream:

... | hsp -d "L.take 1"

Take the last two lines of a stream:

... | hsp -d "L.reverse . L.take 2 . L.reverse"

Print the 10th element of each line:

... | hsp -m "(L.!! 10) . words"

Print the elements from the 2nd to the 20th of each line:

... | hsp -m "L.take 20 . L.drop 1 . words"

Get the number of words:

... | hsp -d "L.length . L.concatMap words"

Get the number of lines:

... | hsp -d "L.length"

Sort integers and remove duplicates:

... | hsp -d "L.nub . L.sort . L.map asInt"

Sum the 2nd elements of every line:

... | hsp -d "P.sum . L.map (asFloat . (L.!! 1) . words)"

Split each line on a delimiter ':' and print the second element:

... | hsp -m "(L.!! 1) . split ':'"

Remove empty lines:

... | hsp -d "L.filter (not . null)"

Filter lines that match a pattern:

... | hsp -d "L.filter (`match` "t\\w\\wt")"