@hackage HaskRel0.1.0.2

HaskRel, Haskell as a DBMS with support for the relational algebra

HaskRel, Haskell as a DBMS with support for the relational algebra

(C) 2015, Thor Michael Støre

HaskRel aims to define those elements of the relational theory of database management that Haskell can accommodate, thus enabling Haskell (or more precisely GHC) in its own right as a DBMS with first-class support for those parts of the relational model. It does not qualify as a proper RDBMS since it as-is only defines the relational algebra, base relational variables and relational assignment. It does not define the relational calculus, views, constraints and transactions (beyond the fundamental requirement that the tuples of relations are to be unique), certain operators like relation valued aggregate operators, nor a few minor or even deprecated operators such as DIVIDE. The implemented parts are decently complete even if there are major implementation shortcomings that prevent this from being practically usable as an actual DBMS.

I refer to it as first-class since the types of the relational model are first-class types to Haskell, and the Haskell type system is able to induce the type resulting of relational expressions (for instance that a natural join of two relations results in a relation with a heading that is the setwise union of the headings of the original relations).

As such it isn't as much just an implementation of the relational model built using Haskell (the way, say, PostgreSQL is an implementation of SQL using C), as a Haskell library that accommodates a subset of the relational model (unlike PostgreSQL, which doesn't turn C into an RDBMS), where GHCi is employed as a database terminal front-end.

Model

HaskRel is based on the relational model of database management, as defined by Chris Date and Hugh Darwen today. For an understanding of the foundation HaskRel builds on see for instance SQL and Relational Theory, 2nd ed., which has been the principal guide in this endeavor (I have plenty of SQL in my background so a book on SQL and the relational model has been a good introduction to the latter). See also The Third Manifesto (TTM), and the additional material on http://www.thethirdmanifesto.com/ (this might be better to start with for those mathematically inclined but not steeped in SQL). Regarding The Third Manifesto it is important to note that all parts of TTM that overlap with something preexisting in Haskell or which Haskell cannot support have been disregarded when implementing HaskRel (it's about determining where we stand, after all).

HaskRel is not based on SQL, there is no support for anything that it defines beyond what the relational model does, nor plans for that. Tutorial D has been the reference when implementing the functions of the relational algebra, although HaskRel does not satisfy the criteria of D. For a project of this kind it is sensible to follow developed theory as it stands rather than an industry standard that deviates from it. More specifically, HaskRel is intended as an exploration of the possibilities to directly employ a general purpose programming language and its associated REPL as a relational database management system, or even as a bit of an academic exercise, and a great deal of SQL is not relevant towards this end. (SQL defines for instance its own types, while HaskRel is explicitly intended to build upon Haskell's, and SQL's type system is weaker than Haskell's, in that it employs a wide range of coercions between types).

Features and aspects of SQL that go beyond or deviate from relational theory are by and large irrelevant for the purpose of HaskRel, and may even overlap or conflict with Haskell idioms that are more relevant to adhere to for a "make an RDBMS out of a general purpose programming language" project. (Parts of Tutorial D also overlap, such as the THE_ operator and selectors vs. show and read, but all in all a fraction of what SQL does.) Even if there were plans to go beyond this and support SQL it would be a good first step to first support the fundamentals of the relational model both as completely as it is reasonable to do, and to do so properly.

In addition to naming certain other aspects of Tutorial D has been adopted, such as argument order. Such aspects may be changed in the future if HaskRel is found to be solid enough to stand on its own, particularly if there are ways to express this that are more idiomatic vis-a-vis Haskell (Tutorial D is, after all, not a definition of the relational model, but a vehicle for understanding it).

Of the shortcomings of model that are mentioned above the relational calculus, database constraints, and transactions would be possible to add in a reasonably complete manner (I hope I am able to work on this). Views would be the most difficult, as far as I able to tell it is not possible to properly implement this fully within Haskell, although I don't know if some part of it can be. (This is something SQL products have had difficulties with, after all, and an area where the SQL standard is obtuse.) One further particular issue is that data definition is performed by writing, compiling and loading Haskell, which is very static compared to DDL in a proper DBMS.

Implementation

HaskRel is developed and tested with GHC 7.10.2. It builds upon HList (at least version 0.4.0.0). Separately from this library there is also an optional variant that builds on TIPs instead of records, which additionally relies on Lens (tested with version 4.12.3).

There are many missing implementation level features that database products rely on, enough so that HaskRel neither is nor will be usable as a production system, even if the shortcomings mentioned regarding the model was in place. Query optimization and indexes are the major ones, just for starters, which would require a monumental effort that there are no plans for. There is also the issue that even slightly complex queries have compile times running into several seconds. HaskRel also builds on Data.Set to manage relation bodies, although this is too limited to support a practically usable RDBMS, at least by itself.

In any case, HaskRel is very, very far from being a usable database management system, and there is as such no plan to bring it there, but I hope it's useful from an academic perspective.

Trying it out

A Haskell version of the ubiquitous suppliers-and-parts database is included as a sample database, with GHC installed this can be loaded by running suppliersPartsDB.sh in examples from the command line. This loads SuppliersPartsExample.hs, which contains Haskell renditions of Tutorial D expressions used as examples in SQL and Relational Theory, 2nd ed chapters 6 and 7. In the documentation see particularly Database.HaskRel.Relational.Expression, which "pulls together" both the algebra and assignment functions and generalizes them to operate as they should in a DBMS (with caveats), on both base variables and values.

Alternatively, cabal repl will start a GHCi session with the required modules loaded, although one will have to either load a file with definitions, or define some elements to play around with. In the example below most everything is ad-hoc, predefining a few values would make certain expressions look less messy. Here are what relation constants look like:

*Database.HaskRel.RDBMS> let foo = relation' [( 10, "foo" ),( 20, "bar" ) ] :: Relation '[Attr "asdf" Int, Attr "qwer" String]
*Database.HaskRel.RDBMS> let bar = relation' [( 10, "one" ),( 30, "two" ) ] :: Relation '[Attr "asdf" Int, Attr "zxcv" String]
*Database.HaskRel.RDBMS> pt foo
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar            │
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> pt bar
┌─────────────┬────────────────┐
│ asdf :: Int │ zxcv :: String │
╞═════════════╪════════════════╡
│ 10          │ one            │
│ 30          │ two            │
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> pt$ foo `naturalJoin` bar
┌─────────────┬────────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │ zxcv :: String │
╞═════════════╪════════════════╪════════════════╡
│ 10          │ foo            │ one            │
└─────────────┴────────────────┴────────────────┘

(Note that certain operating systems and browsers have difficulties displaying the horizontal lines of tables, either single or double. On OS X Firefox has worked better than Chrome.)

As advertised, Haskell infers the type of relational expressions (mind the type synonyms though):

*Database.HaskRel.RDBMS> :t foo
foo :: Relation '[Attr "asdf" Int, Attr "qwer" String]
*Database.HaskRel.RDBMS> :t bar
bar :: Relation '[Attr "asdf" Int, Attr "zxcv" String]
*Database.HaskRel.RDBMS> :t (foo `naturalJoin` bar)
(foo `nJoin` bar)
  :: containers-0.5.6.2:Data.Set.Base.Set
       (RTuple
          '[Tagged "asdf" Int, Tagged "qwer" String, Tagged "zxcv" [Char]])

Ad-hoc relvars can be defined thusly:

*Database.HaskRel.RDBMS> let baz = Relvar "baz.rv" :: Relvar '[Attr "asdf" Int, Attr "qwer" String]
*Database.HaskRel.RDBMS> assign baz (empty :: Relation '[Attr "asdf" Int, Attr "qwer" String] )
Value assigned to baz.rv

See Definition.hs of the SuppliersPartsDB for a way to do this properly.

assign against a newly defined relvar initializes it in the file system, and will create a file relative to the directory the interpreter is run from.

*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> insert baz foo
Inserted 2 of 2 tuples into baz.rv
*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar            │
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> insert baz ( relation [rTuple ((Label::Label "asdf") .=. 30, (Label::Label "qwer") .=. "frotz")] )
Inserted 1 of 1 tuples into baz.rv
*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar            │
│ 30          │ frotz          │
└─────────────┴────────────────┘

Concise expression of updates require a set of language extensions (this in addition to DataKinds, which this module enables by default):

*Database.HaskRel.RDBMS> :set -XQuasiQuotes -XKindSignatures -XViewPatterns
*Database.HaskRel.RDBMS> update baz (\ [pun|asdf|] -> asdf > 15 ) (\ [pun|qwer|] -> case (qwer ++ "-new") of (qwer) -> [pun|qwer|])
Updated 2 of 3 tuples in baz.rv
*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar-new        │
│ 30          │ frotz-new      │
└─────────────┴────────────────┘

All functions of the relational algebra that HaskRel implement work against both relation constants and relvars, as do the print functions p and pt:

*Database.HaskRel.RDBMS> p$ baz `nJoin` bar
┌──────┬───────────┬──────┐
│ asdf │ qwer      │ zxcv │
╞══════╪═══════════╪══════╡
│ 10   │ foo       │ one  │
│ 30   │ frotz-new │ two  │
└──────┴───────────┴──────┘

Contributions

While contributions to HaskRel itself would be great, the limiting factors for HaskRel to develop further, let alone be usable, is in its foundations. The most interesting work that could be done with HaskRel is to go through it and find points where Haskell or its ecosystem could be improved to the benefit of both HaskRel and other projects, and focus development efforts on that. The two most important libraries that HaskRel builds on are HList and Data.Set; efforts towards improving them or arguments towards replacing them would be beneficial.

Making HaskRel usable for practical purposes is not on the horizon, but it may be possible to improve it further towards being a demonstrator of the relational model. A list of "things Haskell/GHC needs in order to qualify as a first class RDBMS" would also be a fun list to have, even if there isn't a chance of it happening.