@hackage concurrency-benchmarks 0.1.1

Benchmarks to compare concurrency APIs

concurrency-benchmarks

Benchmarks to compare the pure concurrency overhead of various flavors of concurrent streamly streams and the async package.

Run the run.sh script to run the benchmarks and generate the charts. Alternatively, run the benchmarks directly with cabal new-bench or stack bench. To generate the charts yourself, run the benchmarks with the --csv-raw=results.csv option and then run makecharts results.csv. The charts are written to the charts directory.

Methodology

A total of 10,000 tasks are run for each concurrency mechanism being compared. Two independent experiments are performed:

  1. In the first experiment, each task is a no-op, i.e. it takes almost no time to execute.
  2. In the second experiment, each task introduces a 5-second delay.

The first case demonstrates streamly's smart scheduling, which automatically runs the tasks on fewer threads than there are tasks. When the tasks do not block and have very low latency, streamly may run multiple tasks per thread, which is why streamly is much faster on this benchmark.
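To make the idea of running several tasks per thread concrete, here is a minimal sketch that uses only `base`. It is not streamly's scheduler, which decides dynamically how many threads to use. The names `chunksOf` and `runChunked` and the fixed batch size of 100 are assumptions made for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- | Split a list into batches of at most k elements (hypothetical helper).
chunksOf :: Int -> [a] -> [[a]]
chunksOf k xs
  | k <= 0    = error "chunksOf: batch size must be positive"
  | null xs   = []
  | otherwise = let (batch, rest) = splitAt k xs in batch : chunksOf k rest

-- | Run the tasks in batches of k: one thread per batch, with the tasks
-- inside a batch executed sequentially. Results come back in input order.
-- Simplification: if a task throws, its batch never reports a result.
runChunked :: Int -> [IO a] -> IO [a]
runChunked k tasks = do
  vars <- forM (chunksOf k tasks) $ \batch -> do
    var <- newEmptyMVar
    _ <- forkIO (sequence batch >>= putMVar var)
    return var
  concat <$> mapM takeMVar vars

main :: IO ()
main = do
  -- 10,000 no-op tasks on 100 threads instead of 10,000 threads.
  results <- runChunked 100 (map return [1 .. 10000 :: Int])
  print (length results, sum results)
```

With no-op tasks, the cost of creating and synchronising threads dominates, so batching the tasks onto fewer threads reduces the total work. With blocking tasks, the same batching would serialise the delays, which is why the second experiment needs one thread per task.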

In the second case, a 5-second delay is introduced so that streamly uses one thread per task. This is similar to what async does, which makes the comparison fair. For the async package, mapConcurrently is used, which is comparable to streamly's ahead style stream.

For streamly, the following code is benchmarked. By default streamly limits the buffer size and the number of threads; we set both limits to -1, which means there is no limit:

    let work = (\i -> threadDelay 5000000 >> return i)
    in runStream
        $ aheadly
        $ maxBuffer (-1)
        $ maxThreads (-1)
        $ S.fromFoldableM $ map work [1..10000]

For async this is the code that is benchmarked:

    let work = (\i -> threadDelay 5000000 >> return i)
    mapConcurrently work [1..10000]

Results

These charts compare streamly-0.5.1 and async-2.2.1 on a MacBook Pro with a 2.2 GHz Intel Core i7 processor.

The benchmarks were compiled with the GHC options -threaded -with-rtsopts "-N" to enable the use of multiple processor cores in parallel.

For streamly, results for both async and ahead style streams are shown.

Zero delay case

Peak Memory Consumed

Comparison of maxrss

Time Taken

Comparison of time

5 second delay case

Peak Memory Consumed

Comparison of maxrss

Time Taken

Note that this chart shows only the overhead, not the full time taken by the benchmark. For example, the async benchmark actually takes 5.135 seconds, but 5 seconds of that is the delay introduced by each parallel task. We compute the concurrency overhead by deducting those 5 seconds from the actual time taken, so the overhead for async is 135 ms.

Comparison of time
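The overhead calculation described in the note above can be sketched as follows. This is an illustration that uses only `base` (plain `forkIO` and `MVar`), not the benchmark harness from this package and not streamly or async. The names `overheadMillis` and `timeTasks` and the scaled-down task count and delay are assumptions.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import GHC.Clock (getMonotonicTime)

-- | Concurrency overhead in whole milliseconds: the wall-clock time minus
-- the delay that every task sleeps for. Both arguments are in seconds.
overheadMillis :: Double -> Double -> Int
overheadMillis elapsed delay = round ((elapsed - delay) * 1000)

-- | Run n tasks, one thread per task, each sleeping for delayMicros
-- microseconds, and return the elapsed wall-clock time in seconds.
timeTasks :: Int -> Int -> IO Double
timeTasks n delayMicros = do
  start <- getMonotonicTime
  dones <- forM [1 .. n] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO (threadDelay delayMicros >> putMVar done ())
    return done
  forM_ dones takeMVar          -- wait for every task to finish
  end <- getMonotonicTime
  return (end - start)

main :: IO ()
main = do
  -- The worked example from the note: 5.135 s measured, 5 s delay.
  print (overheadMillis 5.135 5)   -- prints 135
  -- A scaled-down run: 1,000 tasks with a 0.1 s delay each.
  elapsed <- timeTasks 1000 100000
  putStrLn ("overhead (ms): " ++ show (overheadMillis elapsed 0.1))
```

Because all tasks sleep in parallel, the delay is counted once rather than once per task. The overhead printed by the scaled-down run depends on the machine and on the RTS options used.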

Feedback

Feedback is welcome. Please raise an issue, send a PR or send an email to the author.