@hackage hs-conllu 0.1.5

Conllu validating parser and utils.

# -*- mode: org -*-

#+TITLE: hs-conllu

[[https://travis-ci.org/odanoburu/hs-conllu][file:https://travis-ci.org/odanoburu/hs-conllu.svg?branch=master]] [[http://hackage.haskell.org/package/hs-conllu][file:https://img.shields.io/hackage/v/hs-conllu.svg?style=flt]]

this package provides a validating[fn:1] parser of the [[http://universaldependencies.org/format.html][CoNLL-U format]], along with a data model for its constituents. reading, pretty-printing, and diffing functions are also provided.

further processing utilities are being developed and will be placed in a separate package.

* installation
=hs-conllu= is available on [[http://hackage.haskell.org/package/hs-conllu][Hackage]], but if you prefer to install from source:
#+BEGIN_SRC sh
cd /path/of/choice/
git clone $REPO_URL
#+END_SRC

    - using =cabal=:
      #+BEGIN_SRC sh
      cabal install
      #+END_SRC
    - using =stack=:
      #+BEGIN_SRC sh
      stack setup
      stack build
      stack install --system-ghc
      #+END_SRC

    the library is tested with multiple GHC versions, on Linux and on OSX (thanks Travis!).

    if you have problems with the dependency version bounds, you can try changing them in the cabal file to match the versions you have. the version bounds were generated automatically by cabal and are probably conservative -- the library will probably still work if you have the same major version. (if it does, make a PR!)

    if you don't want to have this kind of problem anymore, try [[https://docs.haskellstack.org/en/stable/README/][stack]] (see why [[https://www.fpcomplete.com/blog/2015/06/why-is-stack-not-cabal][here]]).

* usage
if you would like to request features, please open an issue.

** hs-conllu, the executable
this executable can be called using stack with:
: stack exec hs-conllu [subcommand] [args]
it currently has two subcommands:

  - validate :: read and pretty-print the file given as argument.
  - diff :: diff the two CoNLL-U files provided as arguments, and print them. this assumes changes have only been made to word fields, not to sentence ordering, etc. if you'd like finer-grained diffing, you will have to use the library.
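to illustrate what diffing "word fields only" can mean, here is a minimal sketch. it is not the code used by the =diff= subcommand or the =Diff= module: the =Token=, =Sentence=, =FieldDiff=, =diffToken= and =diffSentence= names are hypothetical, and a word is simplified to a plain list of column strings. it assumes both sentences have the same words in the same order, as described above.

#+BEGIN_SRC haskell
-- NOTE: hypothetical names and types, not hs-conllu's API.

-- | A word, simplified to its CoNLL-U columns as raw strings.
type Token = [String]

-- | A sentence is a list of words in file order.
type Sentence = [Token]

-- | (column number, old value, new value)
type FieldDiff = (Int, String, String)

-- | Compare two versions of the same word column by column.
-- Columns are numbered from 1; extra columns on either side are ignored.
diffToken :: Token -> Token -> [FieldDiff]
diffToken old new = [ (i, x, y) | (i, x, y) <- zip3 [1 ..] old new, x /= y ]

-- | Compare two versions of a sentence word by word, assuming the words
-- are aligned by position. Only words with at least one changed column
-- are reported, paired with their position in the sentence.
diffSentence :: Sentence -> Sentence -> [(Int, [FieldDiff])]
diffSentence old new =
  [ (n, ds)
  | (n, a, b) <- zip3 [1 ..] old new
  , let ds = diffToken a b
  , not (null ds)
  ]
#+END_SRC

for instance, comparing a word whose fourth column changed from =NOUN= to =PROPN= yields a single =FieldDiff= for column 4. because the alignment is purely positional, an inserted or deleted word would shift every later comparison, which is why this kind of diff only makes sense when sentence and word ordering are unchanged.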

** Reading CoNLL-U files
the reading functions are in the =IO= module.
#+BEGIN_SRC sh
$ ghci

import Conllu.IO
d <- readConllu "path/to/conllu"
#+END_SRC
this will read the file at the specified path or, if the path is a directory, all the =*.conllu= files in it.

if your CoNLL-U files don't strictly follow the specification or I got the parser wrong, please open an issue! additionally, you may be able to solve your problem by taking a look at the =Parse= module.
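as a rough illustration of the difference between a syntax error (which the parser reports) and a semantic error (which it currently does not, see the footnote), here is a minimal sketch. it is not part of hs-conllu: =RawTok=, =syntaxOk= and =danglingHeads= are hypothetical names, and a word is reduced to its raw ID and HEAD columns.

#+BEGIN_SRC haskell
import Data.Char (isDigit)

-- NOTE: hypothetical names and types, not hs-conllu's API.

-- | A word reduced to its raw ID and HEAD columns.
data RawTok = RawTok { rawId :: String, rawHead :: String }
  deriving (Show, Eq)

-- | Syntactic check: both columns must be non-empty digit strings.
-- (Real CoNLL-U also allows ranges such as "1-2", empty nodes such as
-- "1.1", and "_" for HEAD; they are left out of this sketch.)
syntaxOk :: RawTok -> Bool
syntaxOk (RawTok i h) = isNat i && isNat h
  where
    isNat s = not (null s) && all isDigit s

-- | Semantic check: every HEAD must be "0" (the root) or the ID of some
-- word in the same sentence. Returns the words whose HEAD points nowhere.
danglingHeads :: [RawTok] -> [RawTok]
danglingHeads toks = filter dangling toks
  where
    ids = map rawId toks
    dangling t = rawHead t /= "0" && rawHead t `notElem` ids
#+END_SRC

a word with ID =a= fails =syntaxOk=, which is the kind of problem the parser reports. a sentence with two words where the first has HEAD =5= passes =syntaxOk= for every word, but =danglingHeads= returns that first word, which is the kind of problem that currently goes unnoticed.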

** Customizable parsers
if you just want to tweak how a few fields of the CoNLL-U format are parsed, you may write a parser for that field and then customize the standard parser with it. see the Haddock documentation for the =Parse= module.

I didn't make the parser as customizable as it could be, so if that bothers you, please create an issue or file a PR!
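the general idea of swapping in a parser for a single field can be sketched as follows. this is only an illustration of the pattern under simplifying assumptions, not the interface of the =Parse= module: every name here (=FieldParsers=, =defaultParsers=, =lenientParsers=, =parseLine=, =splitTabs=, =knownUpos=) is hypothetical, lines have just two columns (FORM and UPOS), and parsers are plain functions on strings. see the Haddock documentation for the real types and functions.

#+BEGIN_SRC haskell
-- NOTE: hypothetical names throughout; this is not hs-conllu's API.

-- | One parser per field; each returns the parsed value or an error message.
data FieldParsers = FieldParsers
  { parseForm :: String -> Either String String
  , parseUpos :: String -> Either String String
  }

-- | A small subset of the universal POS tags, enough for the sketch.
knownUpos :: [String]
knownUpos = ["ADJ", "NOUN", "PROPN", "VERB"]

-- | The "standard" parsers: FORM must be non-empty, UPOS must be known.
defaultParsers :: FieldParsers
defaultParsers = FieldParsers
  { parseForm = \s -> if null s then Left "empty FORM" else Right s
  , parseUpos = \s ->
      if s `elem` knownUpos then Right s else Left ("unknown UPOS: " ++ s)
  }

-- | Split a line on tab characters.
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- | Parse a two-column "FORM<TAB>UPOS" line with the given field parsers.
parseLine :: FieldParsers -> String -> Either String (String, String)
parseLine ps line = case splitTabs line of
  [form, upos] -> (,) <$> parseForm ps form <*> parseUpos ps upos
  _            -> Left "expected exactly two tab-separated fields"

-- | Customized parsers: keep the default FORM parser, accept any UPOS.
lenientParsers :: FieldParsers
lenientParsers = defaultParsers { parseUpos = Right }
#+END_SRC

with these definitions, =parseLine defaultParsers= rejects a line whose second column is not in =knownUpos=, while =parseLine lenientParsers= accepts it. only the one field parser was replaced, and the rest of the line parser is reused unchanged.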

** Pretty-Printing
the printing functions are in the =Print= module. see the Haddock documentation!

** Diffing
see the =Diff= module Haddock documentation.

* contributing
I'm a new haskeller, so any help will probably be useful -- even if it's just a few pointers and comments on how I can improve the library or my code.

    if you want to contribute code, let me know, and go right on. you may want to look at the =TODO.org= file.

* Footnotes

[fn:1] it currently only validates the CoNLL-U syntax, not its semantics (i.e., it will report an error if it finds a letter in the ID field, but won't complain if you specify a nonexistent word as the HEAD of another word).