@haskell bytestring0.9.1.6

Fast, packed, strict and lazy byte arrays with a list interface


           ByteString : Fast, packed strings of bytes

This library provides the Data.ByteString library -- strict and lazy byte arrays manipulable as strings -- providing very time and space efficient string and IO operations.

For very large data requirements, or constraints on heap size, Data.ByteString.Lazy is provided, a lazy list of bytestring chunks. Efficient processing of multi-gigabyte data can be achieved this way.

Requirements: > Cabal > GHC 6.4 or greater, or hugs

Building: > runhaskell Setup.lhs configure --prefix=/f/g > runhaskell Setup.lhs build > runhaskell Setup.lhs install

After installation, you can run the testsuite as follows:

    > cd tests ; make
or
    > cd tests ; make hugs

For the full test and benchmark suite, you need GHC and Hugs:

    > cd tests ; make everything

Authors: ByteString was derived from the GHC PackedString library, originally written by Bryan O'Sullivan, and then by Simon Marlow. It was adapted, and greatly extended for darcs by David Roundy, and others. Don Stewart cleaned up and further extended the implementation. Duncan Coutts wrote much of the .Lazy code. Don, Duncan and Roman Leshchinskiy wrote the fusion system.


Performance, some random numbers (with GHC):

This table compares the performance of common operations ByteString, from various string libraries.

Size of test data: 21256k, Linux 3.2Ghz P4

                      FPS7       SPS     PS      [a]    

++ 0.028 ! ! 1.288
length 0.000 0.000 0.000 0.131
pack 0.303 0.502 0.337 -
unpack 3.319* 1.630 7.445 -
compare 0.000 0.000 0.000 0.000
index 0.000 0.000 0.000 0.000
map 2.762* 2.917 4.813 7.286
filter 0.304 2.805 0.954 0.305
take 0.000 0.000 0.024 0.005
drop 0.000 0.000 11.768 0.130
takeWhile 0.000 1.498 0.000 0.000
dropWhile 0.000 1.985 8.447 0.130
span 0.000 9.289 11.144 0.131
break 0.000 9.383 11.268 0.133
lines 0.052 1.114 1.367 2.790
unlines 0.048 ! ! 10.950
words 1.344 2.128 5.644 4.184
unwords 0.016 ! ! 1.305
reverse 0.024 12.997 13.018 1.622
concat 0.000 12.701 11.459 1.163
cons 0.016 2.064 8.358 0.131
empty 0.000 0.000 0.000 0.000
head 0.000 0.000 0.000 0.000
tail 0.000 0.000 14.490 0.130
elem 0.000 1.490 0.001 0.000
last 0.000 - - 0.143
init 0.000 - - 1.147
inits 0.414 - - !
tails 0.460 - - 1.136
intersperse 0.040 - - 10.517
any 0.000 - - 0.000
all 0.000 - - 0.000
sort 0.168 - - ! maximum 0.024 - - 0.183 minimum 0.025 - - 0.185 replicate 0.000 - - 0.053
findIndex 0.096 find 0.120 - - 0.000
elemIndex 0.000 - - 0.000
elemIndicies 0.008 - - 0.314
foldl 0.148 spanEnd 0.000 snoc 0.016 filterChar 0.031
filterNotChar 0.124 join 0.016
split 0.032
findIndices 0.408
splitAt 0.000
lineIndices 0.029
breakOn 0.000
breakSpace 0.000 splitWith 0.329 dropSpace 0.000 dropSpaceEnd 0.000 joinWithChar 0.017 join / 0.016 zip 0.960 zipWith 0.892 isSubstringOf 0.039 isPrefixOf 0.000 isSuffixOf 0.000 count 0.021

Key: FPS6 = FPS 0.6 SPS = Simon Marlow's packedstring prototype PS = Data.PackedString [a] = [Char]

 -    = no function exists
 !    = stack or memory exhaustion

== Stress testing really big strings

Doing some stress testing of FPS, here are some results for 0.5G strings.

3.2Ghz box, 2G physical mem.

Size of test data: 524288k Size of test data: 524288k Char8 Word8

Effectively O(1) or O(m) where m < n all 0.000 0.000
any 0.000 0.004
break 0.000 0.000
breakChar 0.000 0.000
breakSpace 0.000
compare 0.000
concat 0.000
drop 0.000
dropSpace 0.000
dropSpaceEnd 0.000
dropWhile 0.000 0.000
elem 0.000 0.000
elemIndex 0.000 0.000
elemIndexLast 0.000 0.000
empty 0.000
head 0.000 0.000
index 0.000 0.000
init 0.000
last 0.000 0.000
length 0.000
notElem 0.000 0.000
span 0.000 0.000
spanChar 0.000 0.000
spanEnd 0.000 0.000
splitAt 0.000
tail 0.000
take 0.000
takeWhile 0.000 0.000
isPrefixOf 0.000
isSuffixOf 0.000
addr1 0.000
addr2 0.000

O(n) ++ 0.676
map 6.080 5.868
cons 0.396 0.396
snoc 0.400 0.400
find 3.240
split 1.204 1.200
lines 2.000
foldl 3.804
unwords 0.552
reverse 0.884
findIndex 3.128
filterChar 0.756 0.732
filter/='f' 8.265 7.012
filterNotChar 4.456 3.388
join 0.400
sort 4.344
maximum 0.776 0.764
minimum 0.772 0.776
replicate 0.008 0.000
elemIndices 0.240 0.240
lineIndices 1.092
joinWithChar 0.400 0.400
isSubstringOf 0.052
count 0.748

slow O(n) words 38.722
group 77.261
groupBy 96.226
inits 32.430
tails 23.225
findIndices 13.841 15.825
splitWith 18.445 19.225
zip 33.926
zipWith 33.562