@hackage Holumbus-MapReduce0.1.0

a distributed MapReduce framework

This is the Holumbus-MapReduce Framework

Version 0.1.0

Stefan Schmidt sts@holumbus.org

http://holumbus.fh-wedel.de

About

Holumbus is a set of Haskell libraries. This package contains the Holumbus-MapReduce library for building and running distributed MapReduce systems.

This library depends on the Holumbus-Distributed and Holumbus-Storage libraries. If you want to run some of the examples, e.g. the distributed Web-Crawler and Indexer, the the Holumbus-Searchengine library must also be installed.

Contents

Examples Some example applications and utilities. Programs The applications you need to run a distributed MapReduce system. source Source code of the Holumbus-MapReduce library.

Requirements

So far, this library is only tested under Linux, please let us know, if there are any problems under Windows or other OSes. The Holumbus-MapReduce library requires at least GHC 6.10 and the following packages (available via Hackage).

containers hslogger directory network time bytestring binary hxt Holumbus-Distribution

Installation

A Cabal file is provided, therefore Holumbus-MapReduce can be installed using the standard Cabal way:

$ runhaskell Setup.hs configure $ runhaskell Setup.hs build $ runhaskell Setup.hs install # with root privileges

If you prefer to do it the old way there with make:

$ make build $ make install # with root privileges

Steps to make the system running

  1. Compile and install the Holumbus-Distribution library.

  2. Compile and install the Holumbus-MapReduce framework. If this is done with cabal, step 3. can be skipped, the master will be build and installed with cabal

$ make build $ make install # this with root privileges

  1. Compile the Master-Program. It is located in Programs/

$ make programs

  1. Start the PortRegistry from the Holumbus-Distribution library.

  2. Start the Master. If you want to start the master on another machine than the PortRegistry, then you have to copy the file "registry.xml" from the tmp directory to the the tmp directory on the machine the master should run on.

$ cd Programs/Master $ ./Master

Alternatively start the Master program, installed with cabal

Now, the system itself is running, but as you might wonder, you only have started the registry for communication and the master for the MapReduce system. To run your own system, you have to create a MapReduce Client and a MapReduce Worker. Here, we'll show you to run the examples, they'll show you how to create you own system.

  1. Compile the Examples. For some examples (e.g. the Crawler), this requires that you have installed the Holumbus-Searchengine framework. You can get it from the Holumbus homepage (http://holumbus.fh-wedel.de), please read its documentation for further instructions and details. You'll only need the Searchegine library installed for the MapReduce examples, not the library itself.

$ make examples # some examples require the Searchengine lib

  1. Start the Workers. At this time every worker has to run in its own directory, so create for each worker, you want to run, its own subdirectory and copy the executeable to it. If you want to run the workers on different machines, you have to copy the file "registry.xml" from your tmp directory to the tmp directory of the machine the worker will work on.

  2. Start the Client. Now you can start the client and wait the things to come. If you want to run the client on another machine than the PortRegistry, you have to copy the registry.xml file from the tmp directory, too.