# @hackage data-dispersal1.0.0.2

Space-efficient and privacy-preserving data dispersal algorithms.

### Categories

### License

LGPL-2.1-only

### Maintainer

peter.robinson@monoid.at

### Links

- Homepage
- Documentation
- No source repository

### Versions

Installation

### Dependencies (0)

### Dependents (1)

@hackage/acme-everything

Given a ByteString of length `D`

, we encode the ByteString as a list of `n`

`Fragment`

s, each containing a ByteString
of length `O(D/m)`

. Then, each fragment could be stored on a separate
machine to obtain fault-tolerance:
Even if all but `m`

of these machines crash, we can still reconstruct the original
ByteString out of the remaining `m`

fragments.
Note that the total space requirement of the `m`

fragments is `m * O(D/m)=O(D),`

which is clearly space-optimal.
The total space required for the n fragments is `O((n/m)*D)`

.
Note that `m`

and `n`

can be chosen to be of the same order, so the
asymptotic storage overhead for getting good fault-tolerance increases only by
a constant factor.

*GHCi Example:*

> :m + Data.IDA > let msg = Data.ByteString.Char8.pack "my really important data" > let fragments = encode 5 15 msg -- Now we could distributed the fragments on different sites to add some -- fault-tolerance. > let frags' = drop 5 $ take 10 fragments -- let's pretend that 10 machines crashed -- Let's look at the 5 fragments that we have left: > mapM_ (Prelude.putStrLn . show) frags' (6,[273,771,899,737,285]) (7,[289,939,612,285,936]) (8,[424,781,1001,322,788]) (9,[143,657,790,157,423]) (10,[314,674,418,888,423]) -- Space-efficiency: Note that the length of each of the 5 fragments is 5 -- and our original message has length 24. > decode frags' "my really important data"

*Encrypted Fragments:*

The module `Data.IDA`

contains an information dispersal algorithm that produces
space-optimal fragments. However, the knowledge of 1 or more fragments might
allow an adversary to deduce some information about the original data.
The module `Crypto.IDA`

combines information dispersal with
secret sharing: the knowledge of up to `m-1`

fragments does not leak any
information about the original data.

This could be useful in scenarios where we need to store data at untrusted
storage sites: To this end, we store one encrypted fragment at each site.
If at most `m-1`

of these untrusted sites collude, they will still
be unable to obtain any information about the original data.
The added security comes at the price of a slightly
increased fragment size (by an additional constant 32 bytes) and an
additional overhead in the running time of the encoding/decoding process.
The algorithm is fully described in module `Crypto.IDA`

.

*Fault-Tolerance:*

Suppose that we have `N`

machines and encode our data as `2log(N)`

fragments
with reconstruction threshold m = `log(N)`

.
Let's assume that we store each fragment on a separate machine and each
machine fails (independently) with probability at most 0.5.

What is the probability of our data being safe?

`Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).`

What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size

`O(D/m)`

, so the total space is`O(n D/ m) = 2D.`

In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.

This library is based on the following works:

"Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.

"How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.

"Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146