@hackage delta-h0.0.3

Online entropy-based model of lexical category acquisition.

Categories
- Natural Language Processing
License
BSD-3-Clause
Maintainer
pitekus@gmail.com
Links
- Homepage
- Documentation
- No source repository
- Security
Versions
- 0.0.3 Wed, 29 Feb 2012
- 0.0.2 Tue, 10 Jan 2012
- 0.0.1 Sun, 27 Nov 2011

Installation
In your cabal file:
Dependencies (7)
- base >=3 && <5
- binary
- bytestring
- containers
- text
- monad-atom
Dependents (1)
@hackage/acme-everything

= DELTA-H

Online entropy-based model of lexical category acquisition.

By Grzegorz Chrupala and Afra Alishahi.
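To give a rough idea of what an online, entropy-based clustering model does, here is a small Haskell sketch. It is not the algorithm in src/Entropy/Algorithm.hs. The objective (H(Y|X) + alpha * H(X), with X the cluster and Y a feature value), the `alpha` trade-off parameter and all names are assumptions made for illustration. Each incoming word occurrence, represented by its feature values, goes to whichever existing cluster, or a fresh one, gives the lowest objective, and is never revisited.

```haskell
import Data.List (foldl', minimumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical encoding (not delta-h's): feature values are Strings,
-- clusters are numbered 0, 1, 2, ...
type Feature = String
type Cluster = Int

-- Joint counts n(cluster, feature value) seen so far.
type Counts = M.Map (Cluster, Feature) Int

-- Shannon entropy (in nats) of the distribution given by a list of counts.
entropy :: [Int] -> Double
entropy ns = negate (sum [p * log p | n <- ns, n > 0, let p = fromIntegral n / total])
  where
    total = fromIntegral (sum ns) :: Double

-- Illustrative objective (an assumption): H(Y|X) + alpha * H(X),
-- computed as H(X,Y) - (1 - alpha) * H(X).
-- With 0 < alpha < 1, identical items are pulled together while
-- unrelated items are pushed into separate clusters.
objective :: Double -> Counts -> Double
objective alpha cs = entropy (M.elems cs) - (1 - alpha) * entropy (M.elems sizes)
  where
    sizes = M.fromListWith (+) [(k, n) | ((k, _), n) <- M.toList cs]

-- Add one item (its feature values) to cluster k.
addItem :: Cluster -> [Feature] -> Counts -> Counts
addItem k fs cs = foldl' (\m f -> M.insertWith (+) (k, f) 1 m) cs fs

-- Try every existing cluster plus one fresh cluster and keep the cheapest.
-- minimumBy keeps the first of several equal minima, so exact ties favour
-- existing clusters over the fresh one.
assign :: Double -> Counts -> [Feature] -> (Cluster, Counts)
assign alpha cs fs = minimumBy (comparing (objective alpha . snd)) candidates
  where
    used = S.toAscList (S.map fst (M.keysSet cs))
    fresh = if null used then 0 else maximum used + 1
    candidates = [(k, addItem k fs cs) | k <- used ++ [fresh]]

-- Online pass: items are assigned one at a time, in input order.
learn :: Double -> [[Feature]] -> ([Cluster], Counts)
learn alpha = go M.empty []
  where
    go cs acc [] = (reverse acc, cs)
    go cs acc (fs : rest) =
      let (k, cs') = assign alpha cs fs
       in go cs' (k : acc) rest
```

For example, `learn 0.5 [["a","b","c"], ["a","b","c"], ["x","y","z"]]` labels the items `[0, 0, 1]`: the two identical items share a cluster and the unrelated one opens a new cluster. Keeping `alpha` below 1 matters here. With alpha = 1 the objective reduces to the joint entropy H(X,Y), which in this toy case gives merging and splitting the third item the same score.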

= INSTALL

Install the Haskell Platform: http://hackage.haskell.org/platform/

On Linux, the following command installs the delta-h executable in the bin subdirectory of the current directory:

cabal install --prefix=`pwd`

= USAGE

The data directory contains an example input file, data/goat.txt. The other files come from the CHILDES corpus.

To induce a model (i.e. a set of clusters), execute the following:

./bin/delta-h learn '[-12,0,12]' data/goat.txt

The argument '[-12,0,12]' specifies the features to use (in this case, the preceding bigram, the focus word, and the following bigram). Feature IDs are listed in the source file src/Entropy/Features.hs.
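The sketch below shows what extracting those three features for one token might look like. The `FeatureType` constructors, the `<s>` padding marker and the space-joined bigram strings are assumptions; the actual feature IDs and their encoding are the ones defined in src/Entropy/Features.hs.

```haskell
-- Hypothetical stand-ins for the feature ids -12, 0 and 12.
data FeatureType = PrevBigram | Focus | NextBigram
  deriving (Show, Eq)

-- Padding used when the context window runs past a sentence edge (assumption).
pad :: String
pad = "<s>"

-- Feature values for the word at position i (0-based) of a tokenized sentence.
features :: [FeatureType] -> [String] -> Int -> [(FeatureType, String)]
features types ws i = [(t, value t) | t <- types]
  where
    -- Word at position j, or the padding marker outside the sentence.
    at j
      | j >= 0 && j < length ws = ws !! j
      | otherwise = pad
    value PrevBigram = at (i - 2) ++ " " ++ at (i - 1)
    value Focus = at i
    value NextBigram = at (i + 1) ++ " " ++ at (i + 2)
```

For instance, `features [PrevBigram, Focus, NextBigram] (words "the goat ate the hat") 1` yields `[(PrevBigram,"<s> the"),(Focus,"goat"),(NextBigram,"ate the")]`.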

The model will be stored in data/goat.txt.[-12,0,12].learn.model
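If you script several runs, the model path can be derived from the input path and the feature list. The helper below is hypothetical and assumes the naming pattern shown above holds for other inputs and feature lists. Because `[` and `]` are shell glob characters, quoting such paths on the command line (for example `'data/goat.txt.[-12,0,12].learn.model'`) is the safer choice, particularly in shells such as zsh that reject unmatched patterns.

```haskell
-- Hypothetical helper: input path, then the feature list as written on
-- the command line, then ".learn.model".
-- Haskell's show renders [-12,0,12] :: [Int] as "[-12,0,12]".
modelPath :: FilePath -> [Int] -> FilePath
modelPath input feats = input ++ "." ++ show feats ++ ".learn.model"
```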

You can display the model in a human-readable format with:

./bin/delta-h display 'data/goat.txt.[-12,0,12].learn.model'

The learned model can also be used to label input data, without further learning:

./bin/delta-h label True True 'data/goat.txt.[-12,0,12].learn.model' < data/goat.txt

The first argument specifies whether to use the focus word for labeling; the second specifies whether to avoid outputting new cluster IDs (IDs that are not in the model).
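One way to picture the effect of the two flags is the following hypothetical sketch. The `Candidate` type, the per-candidate costs and both function names are invented for illustration and do not come from the delta-h source. The first flag filters out the focus-word feature (ID 0 in the feature list); the second removes the option of a cluster that is not in the model.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical label candidates: a cluster already in the model, or a new one.
data Candidate = Existing Int | Fresh
  deriving (Show, Eq)

-- First flag: keep or drop the focus-word feature (feature id 0).
usableFeatures :: Bool -> [(Int, String)] -> [(Int, String)]
usableFeatures useFocus = filter (\(fid, _) -> useFocus || fid /= 0)

-- Second flag: when True, never emit a cluster that is not in the model.
-- Each candidate carries a cost (lower is better); Nothing if none remain.
chooseLabel :: Bool -> [(Candidate, Double)] -> Maybe Candidate
chooseLabel noNew scored =
  case filter allowed scored of
    [] -> Nothing
    xs -> Just (fst (minimumBy (comparing snd) xs))
  where
    allowed (Fresh, _) = not noNew
    allowed _ = True
```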

There is also a command that tests the learned model on the word prediction task:

./bin/delta-h eval-mrr True True 'data/goat.txt.[-12,0,12].learn.model' < data/goat.txt

The first argument specifies whether to marginalize over all cluster assignments, the second whether to output detailed information.
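The command name suggests that the reported score is the mean reciprocal rank (MRR), the standard metric for ranked prediction: each test item scores 1/r if the correct word appears at rank r in the model's predictions and 0 otherwise, and the scores are averaged. A minimal sketch of that computation, assuming the ranked predictions are already available (how delta-h produces them, including the marginalization option, is not shown):

```haskell
import Data.List (elemIndex)

-- Reciprocal rank of the correct word in a ranked list (best first);
-- 0 if the word does not appear among the predictions.
reciprocalRank :: Eq a => a -> [a] -> Double
reciprocalRank w ranked = maybe 0 (\i -> 1 / fromIntegral (i + 1)) (elemIndex w ranked)

-- Mean reciprocal rank over (correct word, ranked predictions) pairs.
meanReciprocalRank :: Eq a => [(a, [a])] -> Double
meanReciprocalRank [] = 0
meanReciprocalRank xs =
  sum [reciprocalRank w r | (w, r) <- xs] / fromIntegral (length xs)
```

For example, `meanReciprocalRank [("goat", ["goat", "hat"]), ("hat", ["goat", "hat"]), ("cat", ["goat"])]` evaluates to 0.5, the average of 1, 1/2 and 0.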

The semantic property prediction task can be run with the eval-sem command:

./bin/delta-h eval-sem False data/lexicon TRAIN.pos TRAIN.cluster TEST.pos TEST.cluster

The arguments to this command are:
- False: do not produce verbose output
- data/lexicon: semantic property lexicon file (generated from WordNet)
- TRAIN.pos: POS-tagged training data
- TRAIN.cluster: training data labeled with cluster IDs (use the label command to generate it)
- TEST.pos: POS-tagged test data
- TEST.cluster: test data labeled with cluster IDs (use the label command to generate it)

= SOURCES

There are some other commands that are currently undocumented; see src/Main.hs for the full list.

The main part of the model is implemented in src/Entropy/Algorithm.hs.
