@hackage text-builder-linear0.1

Builder for Text based on linear types

linear-builder Hackage Stackage LTS Stackage Nightly

Linear types for linear times!

Builder for strict Text, based on linear types. It's consistently outperforms lazy Builder from text as well as a strict builder from text-builder, and scales better.

Example

> :set -XOverloadedStrings
> import Data.Text.Builder.Linear
> fromText "foo" <> fromChar '_' <> fromDec (42 :: Int)
"foo_42"

Design

String builders in Haskell serve the same purpose as StringBuilder in Java to prevent quadratic slow down in concatenation.

Classic builders such as Data.Text.Lazy.Builder are lazy and fundamentally are dlist with bells and whistles: instead of actually concatenating substrings we compose actions, which implement concatenation, building a tree of thunks. The tree can be forced partially, left-to-right, producing chunks of strict Text, combined into a lazy one. Neither input, nor output need to be materialized in full, which potentially allows for fusion. Such builders allow linear time complexity, but constant factors are relatively high, because thunks are expensive. To a certain degree this is mitigated by inlining, which massively reduces number of nodes.

Strict builders such as text-builder offer another design: they first inspect their input in full to determine output length, then allocate a buffer of required size and fill it in one go. If everything inlines nicely, the length may be known in compile time, which gives blazingly fast runtime. In more complex cases it still builds a tree of thunks and forces all inputs to be materialized.

This package offers two interfaces. One is a mutable Buffer with linear API, which operates very similar to StringBuilder in Java. It allocates a buffer with extra space at the ends to append new strings. If there is not enough free space to insert new data, it allocates a twice larger buffer and copies itself there. The dispatch happens in runtime, so we do not need to inspect and materialize all inputs beforehand; and inlining is mostly irrelevant. Exponential growth provides for amortized linear time. Such structure can be implemented without linear types, but that would greatly affect user experience by polluting everything with ST monad. Users are encouraged to use Buffer API, and built-in benchmarks refer to it.

The second interface is more traditional newtype Builder = Builder (Buffer ⊸ Buffer) with Monoid instance. This type provides easy migration from other builders, but may suffer from insufficient inlining, allocating a tree of thunks. It is still significantly faster than Data.Text.Lazy.Builder, as witnessed by benchmarks for blaze-builder below.

Benchmarks

Group / size text text-builder Ratio This package Ratio
Text
1 69.2 ns 37.0 ns 0.53x 36.8 ns 0.53x
10 736 ns 344 ns 0.47x 190 ns 0.26x
100 7.07 μs 3.42 μs 0.48x 1.81 μs 0.26x
1000 74.2 μs 38.5 μs 0.52x 14.4 μs 0.19x
10000 1.10 ms 477 μs 0.43x 163 μs 0.15x
100000 23.1 ms 11.6 ms 0.50x 4.17 ms 0.18x
1000000 282 ms 166 ms 0.59x 40.4 ms 0.14x
Char
1 83.2 ns 34.8 ns 0.42x 34.8 ns 0.42x
10 378 ns 302 ns 0.80x 123 ns 0.33x
100 3.14 μs 2.46 μs 0.78x 922 ns 0.29x
1000 34.9 μs 31.3 μs 0.90x 9.37 μs 0.27x
10000 494 μs 454 μs 0.92x 101 μs 0.20x
100000 15.9 ms 13.8 ms 0.87x 1.64 ms 0.10x
1000000 212 ms 227 ms 1.07x 14.5 ms 0.07x
Decimal
1 147 ns 993 ns 6.76x 106 ns 0.72x
10 1.36 μs 10.1 μs 7.43x 845 ns 0.62x
100 13.5 μs 108 μs 7.97x 8.44 μs 0.62x
1000 136 μs 1.34 ms 9.84x 83.0 μs 0.61x
10000 1.85 ms 22.0 ms 11.86x 822 μs 0.44x
100000 33.9 ms 237 ms 7.00x 10.4 ms 0.31x
1000000 399 ms 2.504 s 6.28x 89.8 ms 0.23x
Hexadecimal
1 599 ns 940 ns 1.57x 98.9 ns 0.17x
10 6.05 μs 9.89 μs 1.64x 916 ns 0.15x
100 66.4 μs 121 μs 1.82x 9.61 μs 0.14x
1000 807 μs 1.47 ms 1.82x 96.7 μs 0.12x
10000 13.0 ms 20.8 ms 1.60x 980 μs 0.08x
100000 152 ms 223 ms 1.47x 11.7 ms 0.08x
1000000 1.657 s 2.228 s 1.34x 104 ms 0.06x
Double
1 11.9 μs 26.6 μs 2.23x 632 ns 0.05x
10 117 μs 270 μs 2.30x 6.32 μs 0.05x
100 1.20 ms 3.68 ms 3.06x 64.5 μs 0.05x
1000 12.8 ms 43.9 ms 3.44x 638 μs 0.05x
10000 126 ms 457 ms 3.63x 7.38 ms 0.06x
100000 1.266 s 4.717 s 3.73x 65.9 ms 0.05x
1000000 12.599 s 65.467 s 5.20x 653 ms 0.05x

If you are not convinced by synthetic data, here are benchmarks for blaze-markup after migration to Data.Text.Builder.Linear:

bigTable
  992  μs ±  80 μs, 49% less than baseline
basic
  4.35 μs ± 376 ns, 47% less than baseline
wideTree
  1.26 ms ±  85 μs, 53% less than baseline
wideTreeEscaping
  217  μs ± 7.8 μs, 58% less than baseline
deepTree
  242  μs ±  23 μs, 48% less than baseline
manyAttributes
  811  μs ±  79 μs, 58% less than baseline
customAttribute
  1.68 ms ± 135 μs, 56% less than baseline