clustertools 0.1.5

Tools for manipulating sequence clusters

Categories
- Bioinformatics
License
LicenseRef-GPL
Maintainer
Ketil Malde <ketil@malde.org>
Links
- Homepage
- Documentation
- No source repository
Versions
- 0.1.5 Mon, 6 Jun 2011
- 0.1.2 Fri, 30 Jan 2009
- 0.1.1 Thu, 31 Jul 2008
- 0.1 Sat, 8 Mar 2008

Dependencies
- QuickCheck
- base >=4 && <5
- bio >=0.3.3.4
- bio >=0.4
- bytestring
- containers
Dependents (1)
@hackage/acme-everything

To build this package, you will need a Haskell compiler (most likely GHC), along with my bioinformatics library and the SimpleArgs module (downloadable from http://malde.org/~ketil/biohaskell/).

This package contains the following tools:

filter - remove unwanted sequences from a clustering.

     Usage: filter seq.list < cluster.L > cluster2.L
     cluster2.L will contain only the sequence labels found in seq.list
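The "label" format is not spelled out here, so the sketch below assumes one cluster per line with whitespace-separated sequence labels (consistent with add_single writing singletons one per line). It shows the core of filter as a pure function. The names `parseClusters`, `filterClusters`, `renderClusters` and `filterMain` are hypothetical, not the package's API, and dropping clusters that end up empty is a choice made for this sketch.

```haskell
import qualified Data.Set as Set

-- Assumption: a "label" clustering has one cluster per line,
-- with the cluster's sequence labels separated by whitespace.
type Cluster = [String]

parseClusters :: String -> [Cluster]
parseClusters = filter (not . null) . map words . lines

-- Keep only labels in the wanted set; clusters left empty are dropped
-- (a choice made for this sketch, not documented behaviour).
filterClusters :: Set.Set String -> [Cluster] -> [Cluster]
filterClusters wanted = filter (not . null) . map (filter (`Set.member` wanted))

renderClusters :: [Cluster] -> String
renderClusters = unlines . map unwords

-- Mirrors "filter seq.list < cluster.L > cluster2.L": wanted labels come
-- from a file, the clustering from standard input, output to standard output.
filterMain :: FilePath -> IO ()
filterMain listFile = do
  wanted <- Set.fromList . words <$> readFile listFile
  interact (renderClusters . filterClusters wanted . parseClusters)
```

Using a `Set` keeps each membership test logarithmic, which matters when seq.list holds many thousands of labels.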

hist - produce a histogram of cluster sizes from a "label"-formatted clustering.
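A minimal sketch of the histogram computation, under the same assumption of one cluster per line with whitespace-separated labels. The function names and the tab-separated output layout are hypothetical; the actual output format of hist is not described here.

```haskell
import qualified Data.Map.Strict as Map

-- Assumption: one cluster per line, labels separated by whitespace.
-- The result maps each cluster size to the number of clusters of that size.
sizeHistogram :: String -> Map.Map Int Int
sizeHistogram =
  Map.fromListWith (+) . map (\c -> (length c, 1)) . filter (not . null) . map words . lines

-- Render as "size<TAB>count" lines, smallest size first.
renderHistogram :: Map.Map Int Int -> String
renderHistogram h = unlines [show size ++ "\t" ++ show n | (size, n) <- Map.toAscList h]
```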

clusc - compare clusterings, calculating numerous pair-based and entropy-based indices.
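Which indices clusc reports is not listed here. As an illustration, the sketch below computes the standard pair counts (pairs together in both clusterings, in only one, or in neither) from a contingency table, derives the Rand and Jaccard indices from them, and computes the Shannon entropy of a clustering's size distribution. All names are hypothetical, and only labels present in both clusterings are compared.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)

type Cluster = [String]

-- Map each sequence label to the index of its cluster.
membership :: [Cluster] -> Map.Map String Int
membership cs = Map.fromList [(l, i) | (i, c) <- zip [0 ..] cs, l <- c]

choose2 :: Int -> Int
choose2 k = k * (k - 1) `div` 2

-- Number of unordered pairs of elements that share the same key.
pairsIn :: Ord k => [k] -> Int
pairsIn xs = sum (map choose2 (Map.elems (Map.fromListWith (+) [(x, 1) | x <- xs])))

-- Pair counts (a, b, c, d): together in both clusterings, only in the first,
-- only in the second, and in neither.
pairCounts :: [Cluster] -> [Cluster] -> (Int, Int, Int, Int)
pairCounts ca cb = (both, inA - both, inB - both, total - inA - inB + both)
  where
    mb = membership cb
    -- (cluster in A, cluster in B) for every label found in both clusterings
    joint = mapMaybe (\(l, i) -> (,) i <$> Map.lookup l mb) (Map.toList (membership ca))
    both = pairsIn joint
    inA = pairsIn (map fst joint)
    inB = pairsIn (map snd joint)
    total = choose2 (length joint)

randIndex :: (Int, Int, Int, Int) -> Double
randIndex (a, b, c, d) = fromIntegral (a + d) / fromIntegral (a + b + c + d)

jaccardIndex :: (Int, Int, Int, Int) -> Double
jaccardIndex (a, b, c, _) = fromIntegral a / fromIntegral (a + b + c)

-- Shannon entropy (natural log) of the cluster size distribution.
entropy :: [Cluster] -> Double
entropy cs = negate (sum [p * log p | c <- cs, not (null c), let p = fromIntegral (length c) / n])
  where
    n = fromIntegral (sum (map length cs))
```

Counting pairs through the contingency table avoids enumerating all O(n²) label pairs explicitly.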

xcerpt - given a file containing a list of sequence labels (e.g. a "label" formatted clustering), extract matching sequences from a FASTA file. Like "agrep -d '^>'" without the bugs.

     Usage: xcerpt list.txt fasta.seq
     creates "fasta.seq.match" and "fasta.seq.rest"
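A minimal sketch of xcerpt's splitting step, assuming that a FASTA record's label is the first word after `>` and that the list file is any whitespace-separated set of labels (so a "label" clustering works as well). The names `Record`, `splitRecords`, `recordLabel`, `xcerpt` and `runXcerpt` are hypothetical; only the `.match`/`.rest` output names come from the description above.

```haskell
import Data.List (partition)
import qualified Data.Set as Set

-- A FASTA record: its header line (starting with '>') and its sequence lines.
data Record = Record {header :: String, body :: [String]}

isHeader :: String -> Bool
isHeader ('>' : _) = True
isHeader _ = False

-- Group lines into records; anything before the first header is ignored.
splitRecords :: [String] -> [Record]
splitRecords = go . dropWhile (not . isHeader)
  where
    go [] = []
    go (h : ls) = let (b, rest) = break isHeader ls in Record h b : go rest

-- Assumption: a record's label is the first word after '>'.
recordLabel :: Record -> String
recordLabel r = case words (drop 1 (header r)) of
  (w : _) -> w
  [] -> ""

-- Split FASTA text into (matching records, remaining records).
xcerpt :: Set.Set String -> String -> (String, String)
xcerpt wanted fasta = (render matched, render rest)
  where
    (matched, rest) = partition ((`Set.member` wanted) . recordLabel) (splitRecords (lines fasta))
    render rs = unlines (concatMap (\r -> header r : body r) rs)

-- Write "<fasta>.match" and "<fasta>.rest", as described for the tool.
runXcerpt :: FilePath -> FilePath -> IO ()
runXcerpt listFile fastaFile = do
  wanted <- Set.fromList . words <$> readFile listFile
  (m, r) <- xcerpt wanted <$> readFile fastaFile
  writeFile (fastaFile ++ ".match") m
  writeFile (fastaFile ++ ".rest") r
```

Matching whole labels rather than a regular expression over header lines avoids the partial-match surprises of the agrep approach.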

add_single - add singletons to a clustering.

     Usage: add_single all.L clustering.L
     creates "clustering.L_s", listing all sequences that are in all.L but not in clustering.L, one per line
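A minimal sketch of the set difference add_single performs, assuming both files are whitespace-separated labels. Output keeps the order of all.L; the names `singletons` and `runAddSingle` are hypothetical.

```haskell
import qualified Data.Set as Set

-- Labels from all.L that occur nowhere in clustering.L, one per line,
-- in the order they appear in all.L.
singletons :: String -> String -> String
singletons allL clusteringL = unlines [l | l <- words allL, not (l `Set.member` clustered)]
  where
    clustered = Set.fromList (words clusteringL)

-- Write the result to clustering.L_s, following the tool description.
runAddSingle :: FilePath -> FilePath -> IO ()
runAddSingle allFile clusFile = do
  s <- singletons <$> readFile allFile <*> readFile clusFile
  writeFile (clusFile ++ "_s") s
```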

ace2contigs - parse an ACE assembly file, and output the contigs in a FASTA file (named by tacking on .fasta to the ACE file name), and the corresponding quality information (.qual).
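A minimal sketch of the contig-extraction half of ace2contigs, assuming the usual ACE layout in which each contig starts with a `CO <name> ...` line followed by consensus sequence lines up to a blank line. Pad characters (`*`) are kept as-is, and the quality (BQ) section is not handled; whether ace2contigs removes pads is not stated here. The names `aceContigs`, `toFasta` and `runAce2Contigs` are hypothetical.

```haskell
-- Extract (contig name, consensus sequence) pairs from ACE text.
aceContigs :: String -> [(String, String)]
aceContigs = go . lines
  where
    go [] = []
    go (l : ls) = case words l of
      ("CO" : name : _) ->
        -- consensus lines run until the next blank line
        let (seqLines, rest) = break null ls
         in (name, concat seqLines) : go rest
      _ -> go ls

toFasta :: [(String, String)] -> String
toFasta cs = unlines (concat [['>' : name, s] | (name, s) <- cs])

-- Output file named by tacking ".fasta" onto the ACE file name.
runAce2Contigs :: FilePath -> IO ()
runAce2Contigs aceFile = readFile aceFile >>= writeFile (aceFile ++ ".fasta") . toFasta . aceContigs
```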

ace2fasta - parse an ACE assembly, and output each assembly in a separate FASTA-formatted file, with the necessary gaps inserted to align the sequences (suitable for import into e.g. Seaview).
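The gap insertion can be sketched independently of ACE parsing. The input here is hypothetical: each read is given as (name, padded start, padded sequence), with the start taken to be the 1-based consensus column of the read's first base as on ACE `AF` lines (possibly zero or negative). Leading `-` characters shift each read to its column and trailing ones give all rows the same length, which is what an alignment viewer expects.

```haskell
-- Hypothetical input: (read name, padded start, padded read sequence).
alignRows :: [(String, Int, String)] -> [(String, String)]
alignRows [] = []
alignRows rs = [(name, pad (start - lo) sq) | (name, start, sq) <- rs]
  where
    lo = minimum [s | (_, s, _) <- rs]
    width = maximum [s - lo + length sq | (_, s, sq) <- rs]
    -- leading gaps place the read at its column; trailing gaps even out lengths
    pad off sq = let row = replicate off '-' ++ sq in row ++ replicate (width - length row) '-'

toFasta :: [(String, String)] -> String
toFasta rows = unlines (concat [['>' : name, row] | (name, row) <- rows])
```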

ace2clusters - parse an ACE assembly, and output clusters composed of the sequences used for each contig. The format is similar to TGICL's, with cluster output as one line consisting of a '>' and the contig name, and the next line containing the names of the sequences that comprise the cluster.
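A minimal sketch of ace2clusters, assuming that each contig's reads are listed on `AF <read> <U|C> <start>` lines between its `CO` line and the next one, and that the member names go on a single space-separated line. The names `aceClusters` and `renderClusters` are hypothetical.

```haskell
-- (contig name, names of the reads assembled into it)
aceClusters :: String -> [(String, [String])]
aceClusters = go . map words . lines
  where
    go [] = []
    go (("CO" : name : _) : ws) =
      let (contig, rest) = break isCO ws
       in (name, [r | ("AF" : r : _) <- contig]) : go rest
    go (_ : ws) = go ws
    isCO ("CO" : _) = True
    isCO _ = False

-- TGICL-like output: ">" plus contig name, then the member names on one line.
renderClusters :: [(String, [String])] -> String
renderClusters cs = unlines (concat [['>' : c, unwords rs] | (c, rs) <- cs])
```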

clusterlibs - given a table of regular expressions and library names, along with a clustering (TGICL-format), output a table of clusters with the library name prepended to the sequences.
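A simplified sketch of clusterlibs. To stay within `base`, it swaps the regular expressions for plain label prefixes; the table type, the `library:` tag format and the `unknown` fallback are all hypothetical choices, not the tool's documented behaviour.

```haskell
import Data.List (find, isPrefixOf)

-- Hypothetical table entry: (label prefix, library name). Prefix matching
-- stands in for the regular expressions that clusterlibs uses.
type LibTable = [(String, String)]

-- Library of the first matching entry, or "unknown" if none matches.
libraryOf :: LibTable -> String -> String
libraryOf table name = maybe "unknown" snd (find ((`isPrefixOf` name) . fst) table)

-- Prepend "library:" to every sequence name in a TGICL-style clustering
-- ('>' header line, then a line of member names); headers pass through.
tagClusters :: LibTable -> String -> String
tagClusters table = unlines . map tag . lines
  where
    tag l@('>' : _) = l
    tag l = unwords [libraryOf table n ++ ":" ++ n | n <- words l]
```

With a regex library such as regex-tdfa, `libraryOf` would test each pattern against the name instead of `isPrefixOf`; the rest of the sketch stays the same.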
