dephd-0.1

Analyze 'phred' output (.phd files)

Synopsis

dephd - A simple tool for base calling and quality appraisal

Reads files in phd format (Phred output), specified either individually or as a directory (use the --dir option to read directories).
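Reading a directory of phd files can be sketched as below. This is a minimal illustration, not dephd's code. The file-name test is an assumption: Phred typically names its output '<trace>.phd.1', and isPhdFile and phdFilesIn are hypothetical names. The sketch uses listDirectory from the directory package and (</>) from filepath, both of which ship with GHC.

```haskell
import Data.List (isSuffixOf, sort)
import System.Directory (listDirectory)
import System.FilePath ((</>))

-- Assumption: phd files end in ".phd.1" (Phred's usual naming) or ".phd";
-- dephd's own test may differ.
isPhdFile :: FilePath -> Bool
isPhdFile f = ".phd.1" `isSuffixOf` f || ".phd" `isSuffixOf` f

-- List the phd files in one directory, sorted by name, as full paths.
phdFilesIn :: FilePath -> IO [FilePath]
phdFilesIn dir = do
  entries <- listDirectory dir
  return [dir </> f | f <- sort entries, isPhdFile f]

main :: IO ()
main = phdFilesIn "." >>= mapM_ putStrLn
```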

Installation

You need the GHC compiler or, if you know what you are doing, another Haskell compiler or interpreter with Cabal. You also need to install the 'bio' library (darcs get http://malde.org/~ketil/bio).

With those in place, you should be able to run:

 runhaskell Setup configure
 runhaskell Setup build
 sudo runhaskell Setup install

Optionally, add "--prefix $HOME" (without the quotes) after 'configure' to install into your home directory; in that case you don't need 'sudo'.

Usage

Running 'dephd -h' prints a brief usage summary. In somewhat more detail:

Input is specified in one of four ways: as a list of phd files (typically generated by Phred), as a list of directories containing phd files (using the --input-dirs option), as a file containing a list of phd file names (--input-list), or as a Fasta file with its associated quality file (-i foo.fasta foo.qual).
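A phd file stores the called bases between BEGIN_DNA and END_DNA, one base per line followed by its quality score and trace position. The sketch below parses that block into (base, quality) pairs. It is a hypothetical illustration (parsePhdDna, parseDnaLine and isTag are made-up names), not the parser that dephd or the 'bio' library actually uses, and it ignores the comment and other sections of the file.

```haskell
import Text.Read (readMaybe)

-- True if a line consists of exactly the given tag (surrounding spaces ignored).
isTag :: String -> String -> Bool
isTag tag l = words l == [tag]

-- Assumption: each DNA line is "<base> <quality> <trace position>".
parseDnaLine :: String -> Maybe (Char, Int)
parseDnaLine l = case words l of
  ([b] : q : _) -> (,) b <$> readMaybe q
  _             -> Nothing

-- Take the lines between BEGIN_DNA and END_DNA and parse each one;
-- the result is Nothing if any line in the block is malformed.
parsePhdDna :: String -> Maybe [(Char, Int)]
parsePhdDna =
  traverse parseDnaLine
    . takeWhile (not . isTag "END_DNA")
    . drop 1
    . dropWhile (not . isTag "BEGIN_DNA")
    . lines

sample :: String
sample = unlines
  [ "BEGIN_SEQUENCE read1"
  , "BEGIN_DNA"
  , "a 9 6"
  , "c 30 17"
  , "g 25 29"
  , "END_DNA"
  , "END_SEQUENCE"
  ]

main :: IO ()
main = print (parsePhdDna sample)
```

Running it prints Just [('a',9),('c',30),('g',25)]. Returning Maybe rather than calling read keeps a single corrupt file from crashing a run over many inputs.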

Output is selected with -J, -X, -P, -R foo.ranks, -F foo.fasta, and/or -Q foo.qual. The first three plot sequence quality to JPEG files, an X window, or PostScript files, respectively. If you use -X on multiple files, press q to close one window and move on to the next.

The remaining three options (-R, -F, and -Q) write different aspects of the sequence information to files. Specify '-' as the file name to print to standard output instead; this gets messy if you do it for more than one option. -F and -Q generate standard Fasta and quality files, while -R produces a file with one line per sequence containing various quality measures, including a verdict ranging from Excellent, through Good and Poor, to Junk.
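The Fasta and quality outputs are simple line-oriented formats: a '>' header line with the sequence name, followed by the bases or by whitespace-separated Phred scores. Here is a minimal sketch of writing both, using a hypothetical PhdRead record and line-width parameters chosen for illustration. The verdict function is also hypothetical. Its Q20-based thresholds are invented to show the shape of a ranking and are not dephd's actual criteria.

```haskell
-- Hypothetical record for one read; not a type from dephd or 'bio'.
data PhdRead = PhdRead
  { readName  :: String
  , readBases :: String
  , readQuals :: [Int]
  }

-- Split a list into chunks of at most n elements (n <= 0 gives one chunk).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs   = []
  | n <= 0    = [xs]
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Fasta record: ">name" header, then the bases wrapped at `width` columns.
toFasta :: Int -> PhdRead -> String
toFasta width r = unlines (('>' : readName r) : chunksOf width (readBases r))

-- Quality record: same header, then space-separated scores, `perLine` per line.
toQual :: Int -> PhdRead -> String
toQual perLine r =
  unlines (('>' : readName r) : map (unwords . map show) (chunksOf perLine (readQuals r)))

-- Hypothetical verdict, ordered from worst to best.
data Verdict = Junk | Poor | Good | Excellent
  deriving (Show, Eq, Ord)

-- Invented thresholds on the number of bases with quality >= 20 (Q20);
-- dephd's real rules are not described in this README.
verdict :: [Int] -> Verdict
verdict qs
  | q20 >= 500 = Excellent
  | q20 >= 300 = Good
  | q20 >= 100 = Poor
  | otherwise  = Junk
  where
    q20 = length (filter (>= 20) qs)

main :: IO ()
main = do
  let r = PhdRead "read1" "ACGTACGT" [10, 20, 30, 40, 30, 20, 10, 5]
  putStr (toFasta 60 r)
  putStr (toQual 20 r)
  print (verdict (readQuals r))
```

Deriving Ord for Verdict lets a ranking step compare or sort reads by verdict directly.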

Filtering is specified with the -t option, which interprets trimming information from Phred or Lucy and chops off the offending parts, or with the -q option, which masks poor-quality parts of sequences to lower case and really poor-quality parts to 'n' characters.
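The two filtering modes can be sketched as plain list functions. This illustration rests on stated assumptions and is not dephd's implementation. The cutoffs passed to maskQuality are hypothetical parameters, since dephd's own thresholds are not given here. Well-called bases are shown in upper case so the lower-case masking stands out. trimTo assumes the Phred or Lucy trimming information has already been converted to a 0-based start and an exclusive end.

```haskell
import Data.Char (toLower, toUpper)

-- Mask one base given two hypothetical cutoffs, lo < hi:
-- quality below lo becomes 'n', below hi becomes lower case, otherwise upper case.
maskBase :: Int -> Int -> Char -> Int -> Char
maskBase lo hi b q
  | q < lo    = 'n'
  | q < hi    = toLower b
  | otherwise = toUpper b

-- Apply maskBase along a sequence and its matching quality scores.
maskQuality :: Int -> Int -> String -> [Int] -> String
maskQuality lo hi = zipWith (maskBase lo hi)

-- Keep only the region from start (0-based) up to, but not including, end.
-- Assumption: trim points from Phred/Lucy are already in this form.
trimTo :: Int -> Int -> [a] -> [a]
trimTo start end = take (end - start) . drop start

main :: IO ()
main = do
  let bases = "acgtacgt"
      quals = [30, 15, 5, 40, 40, 25, 12, 8]
  putStrLn (maskQuality 10 20 bases quals)
  putStrLn (trimTo 1 6 bases)
```

Because trimTo is polymorphic, the same call trims the quality list, which keeps bases and scores aligned after chopping.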

Bugs

Not many, I hope. The program should run in (approximately) constant space and handle large numbers of sequences.
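Constant-space processing in Haskell usually comes from consuming input lazily and accumulating results with a strict fold, so that no read is retained after it has been counted. Here is a small sketch of that pattern. It computes hypothetical summary totals (reads, bases, and bases with quality of at least 20); Totals, addRead and summarize are illustrative names, not part of dephd.

```haskell
import Data.List (foldl')

-- Running totals: reads seen, bases seen, and bases with quality >= 20.
-- Strict fields keep the accumulator evaluated, so no thunks pile up.
data Totals = Totals !Int !Int !Int
  deriving (Show, Eq)

addRead :: Totals -> [Int] -> Totals
addRead (Totals n b q20) qs =
  Totals (n + 1) (b + length qs) (q20 + length (filter (>= 20) qs))

-- foldl' forces the accumulator at each step; with a lazily produced
-- input list, each read can be garbage-collected once it is counted.
summarize :: [[Int]] -> Totals
summarize = foldl' addRead (Totals 0 0 0)

main :: IO ()
main = print (summarize (replicate 100000 [30, 10, 25]))
```

This prints Totals 100000 300000 200000. With a lazy foldl and lazy fields instead, the same computation would build a chain of unevaluated additions proportional to the input size.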

For further questions, email me at ketil@malde.org