@hackage dvv0.1.2.1

Dotted Version Vectors (DVV)

Dotted Version Vectors (DVV)

License: MIT

A GHC-Haskell implementation of Dotted Version Vectors (DVV), a data structure for tracking causality and resolving conflicts in distributed systems updating a single piece of data.

This library is inspired by the canonical Erlang DVVSet example implementation.

Table of Contents

Introduction

In distributed systems, multiple actors can concurrently update a piece of data. Traditional version vectors can track which updates happened after others, but they struggle to represent multiple concurrent "siblings" when conflicts occur.

Dotted Version Vectors solve this by combining:

  1. History (Context): A summary of all events seen by the system.
  2. Dots: Discrete events (actor + sequence number) that updated specific values
  3. Siblings: A set of values that are concurrent (i.e., none of them has "seen" the others).

Key Concepts

  • Dot: A pair Actor x Count representing a single write.
  • History (Version Vector): A mapping from actors to their latest sequence numbers. Map Actor Count
  • DVV: A structure containing a history and a set of active siblings. Map Actor Count x Map Dot Value

How it Works

DVV provides a mechanism for causal ordering. When an actor writes a value, it creates a "Dot" – a unique identifier for that specific version of the data. Usually the "server", i.e. the thing doing the tracking of state is the "actor".

graph TD
    A[DVV State] --> B(Causal History)
    A --> C(Active Siblings)
    B --> B1[Actor A: 5]
    B --> B2[Actor B: 3]
    C --> C1["Dot(A, 6): Value 'X'"]
    C --> C2["Dot(B, 4): Value 'Y'"]
  1. Event Creation: When actor A updates a value, it increments its counter in the DVV's history. The new value is stored associated with the new Dot (A, next_counter).
  2. Causality: If version V2 is created knowing about version V1 (e.g., V2 was created using V1 as context), V2 supersedes V1.
  3. Synchronization: When two DVVs are merged (sync), the resulting history is the point-wise maximum of both. Any sibling from one DVV is retained in the merged result only if it hasn't been superseded by the other DVV's history.

Usage

1. Construction and Initialization

You can start with an empty DVV or a singleton, i.e. no causal history, without a value or with the first value (count implicitly = 1):

import Data.DVV
import qualified Data.HashMap.Strict as Map

-- An empty DVV without any causal history
empty :: DVV String Int
empty = EmptyDVV

-- A singleton DVV (initial write with no prior causal history)
initial :: DVV String Int
initial = SingletonDVV "actor1" 42

2. Recording Events

Use event to record new updates. You can optionally provide a VersionVector as context to indicate which versions the new update supersedes.

-- Recording a new event on top of existing state
-- This will increment actor1's counter.
v1 = event initial Nothing "actor1" 43

-- Providing context to prune siblings
-- If 'v1' had multiple siblings, passing its 'context' here would replace them
v2 = event v1 (Just (context v1)) "actor2" 44

3. Synchronization (Merging)

Merge two DVVs to resolve their histories and collect concurrent siblings.

let d1 = SingletonDVV "A" 10
    d2 = SingletonDVV "B" 20
    merged = sync d1 d2
-- 'merged' now contains both values as siblings because they are concurrent.
-- values merged == [10, 20]

4. Conflict Resolution

When a sync results in multiple siblings (a conflict), you can reconcile them.

Manual Reconciliation

-- Pick the maximum value from siblings, reconcile just takes a deterministic "decider" function over a pair of `concurrent values`.
let resolved = reconcile max "resolver-actor" merged

Last-Write-Wins (LWW)

If your values have timestamps or you just want a deterministic winner:

-- Assuming values are (Timestamp, ActualValue)
let lwwResolved = lww (\(t1, _) (t2, _) -> compare t1 t2) "actor" conflictDvv

Partial Ordering and Causality

DVVs implement a Partial Order to represent the causal relationship between versions. This is exposed via the PartialOrd typeclass (from lattices or Algebra.PartialOrd).

  • leq (Less than or Equal): A <= B means that A is a causal ancestor of (or equal to) B. In other words, B "knows" everything A knows.
    • Implementation: For every actor in A's history, B's history must have a counter greater than or equal to A's.
  • Strict Inequality: A < B means A <= B AND A /= B. A happened strictly before B.
  • Concurrent (Incomparable): If neither A <= B nor B <= A holds, then A and B are concurrent (||). This indicates a conflict that needs to be resolved.

The sync operation computes the Least Upper Bound (LUB) of two DVVs, effectively merging their histories and preserving all concurrent values (siblings) until they are reconciled.

Type Safety

This library uses HashMap internally for efficiency and requires actor IDs to be instances of Hashable.

Performance Considerations

  • Singleton Optimization: SingletonDVV is a specialized constructor for the common case of a single value with no complex history, reducing memory overhead.
  • Pruning: Use the prune function to truncate old causal history if the number of actors grows too large, though this should be used with caution as it affects future synchronization accuracy.

License

This project is licensed under the MIT License - see the LICENSE file for details.