Master-Slave Clusters
Typically, a master-slave clustering software stack looks something like this -

Custom-written clustering applications use a message passing layer to interact
with, and take advantage of, the various compute nodes.
The message passing layer abstracts the underlying network type and gives you
an API with which you can write code that dispatches jobs to remote nodes. The
most commonly used message passing layers are -
1. MPICH - http://www-unix.mcs.anl.gov/mpi/mpich/
2. Open-MPI - http://open-mpi.org/ This is a preferred implementation because
of its speed of execution.
More on the differences between MPI and PVM (introduced below) -
www.mcs.anl.gov/~gropp/bib/papers/2002/mpiandpvm.pdf
A high-level MPI program looks very similar to a program using the BSD socket
API, with the primary difference being that MPI abstracts away the lower-level
TCP/IP details and network topology and uses node IDs (ranks) instead. Here is
an example -
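The rank-based addressing described above can also be illustrated without MPI itself. The following is a toy sketch in Python, not an MPI program: ToyComm, worker and master are hypothetical names invented for this illustration, and threads inside one process stand in for compute nodes. It only shows the shape of the idea: the master dispatches jobs by node ID, and no host names, ports or sockets appear in the application code.

```python
import queue
import threading


class ToyComm:
    """Toy stand-in for a message passing layer (hypothetical, not MPI).

    Nodes are addressed by rank only; the transport (here, in-process
    queues) is hidden from the application code.
    """

    def __init__(self, size):
        self.size = size
        self._inboxes = [queue.Queue() for _ in range(size)]

    def send(self, dest, source, data):
        # Deliver (source rank, payload) to the inbox of the destination rank.
        self._inboxes[dest].put((source, data))

    def recv(self, rank):
        # Block until a message for this rank arrives.
        return self._inboxes[rank].get()


def worker(comm, rank):
    """Slave loop: receive a job, square it, send the result back."""
    while True:
        source, job = comm.recv(rank)
        if job is None:  # sentinel: no more work
            break
        comm.send(source, rank, job * job)


def master(comm, jobs):
    """Rank 0: dispatch jobs round-robin to ranks 1..size-1, collect results."""
    workers = list(range(1, comm.size))
    threads = [threading.Thread(target=worker, args=(comm, r)) for r in workers]
    for t in threads:
        t.start()
    for i, job in enumerate(jobs):
        comm.send(workers[i % len(workers)], 0, job)
    results = [comm.recv(0)[1] for _ in jobs]
    for r in workers:
        comm.send(r, 0, None)  # tell every worker to stop
    for t in threads:
        t.join()
    return sorted(results)
```

Calling master(ToyComm(4), [1, 2, 3, 4, 5]) runs one master and three workers. A real MPI program follows the same pattern, with the MPI library supplying the ranks and the transport.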

However, MPI by itself does not recover from faults: by default, the failure
of a single process aborts the whole job.
This is where PVM comes in. It was designed for, and is better suited to,
recovering an application and its data after faults using advanced
check-pointing techniques.
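To make the idea of check-pointing concrete, here is a minimal sketch in Python. It is not PVM code and does not claim to show how PVM does it. The function names, the JSON file format and the fail_at parameter (which simulates a crash) are all assumptions made for this illustration. The job periodically saves its progress, and after a failure it resumes from the last saved state instead of starting over.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def run_job(path, n_items, checkpoint_every=10, fail_at=None):
    """Sum 0..n_items-1, saving progress every checkpoint_every items.

    fail_at is a hypothetical knob that simulates a node failure at that item.
    """
    state = load_checkpoint(path)
    for i in range(state["next_item"], n_items):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += i
        state["next_item"] = i + 1
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If run_job crashes part way through, calling it again with the same path picks up from the last checkpoint, so only the work done since that checkpoint is repeated.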
Job scheduling and launching refers to the way you launch and monitor your
applications on a cluster. It works closely with the clustering management
software, which keeps track of online nodes using a software technique called
heartbeat: each node periodically signals that it is alive, and a node that
stays silent for too long is treated as down. It provides you with methods for
automatically starting jobs on nodes when they recover from a fault.
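The heartbeat technique can be sketched in a few lines of Python. This is an illustration under assumptions, not the code of any particular scheduler: HeartbeatMonitor, its timeout and the on_recover callback are hypothetical names, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Track which nodes are online based on when they last sent a heartbeat."""

    def __init__(self, timeout, on_recover=None):
        self.timeout = timeout        # seconds of silence before a node is down
        self.on_recover = on_recover  # hypothetical hook, e.g. restart a job
        self.last_seen = {}           # node name -> time of last heartbeat
        self.down = set()             # nodes currently considered down

    def beat(self, node, now):
        """Record a heartbeat from node at time now (in seconds)."""
        recovered = node in self.down
        self.last_seen[node] = now
        if recovered:
            self.down.discard(node)
            if self.on_recover is not None:
                self.on_recover(node)

    def check(self, now):
        """Mark silent nodes as down and return the sorted list of online nodes."""
        for node, seen in self.last_seen.items():
            if now - seen > self.timeout:
                self.down.add(node)
        return sorted(n for n in self.last_seen if n not in self.down)
```

A scheduler could pass its own "restart the job on this node" function as on_recover, which is how jobs get started automatically once a failed node starts sending heartbeats again.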
Clustering management software generally provides a CLI or web interface to
monitor the performance and health of individual nodes.
Typically, the distributions come with a database server, which keeps track of
cluster membership, and a daemon, which monitors the nodes. Whenever a node is
added, removed or recovered, the new task (a description of a parallel
computation that is part of the application) is pushed to the node, and its
response is logged in a cluster-wide log. This log can later be viewed and
analyzed using a web interface.
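The membership database and cluster-wide log described above could be sketched like this in Python. It is an illustration only: ClusterManager, the table layout and the send_task callable are assumptions, an in-memory SQLite database stands in for the database server, and the sketch assumes that no task is pushed to a node that has just been removed.

```python
import sqlite3


class ClusterManager:
    """Keep cluster membership and a cluster-wide log in a small database."""

    def __init__(self, send_task):
        # send_task is a hypothetical callable: (node, task) -> response string.
        self.send_task = send_task
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE members (node TEXT PRIMARY KEY, status TEXT)")
        self.db.execute(
            "CREATE TABLE log (node TEXT, event TEXT, response TEXT)")

    def handle_event(self, node, event, task=None):
        """Update membership for 'added', 'removed' or 'recovered' and log it."""
        status = "offline" if event == "removed" else "online"
        self.db.execute(
            "INSERT OR REPLACE INTO members VALUES (?, ?)", (node, status))
        response = None
        if status == "online" and task is not None:
            # Push the task to the node and keep its response for the log.
            response = self.send_task(node, task)
        self.db.execute(
            "INSERT INTO log VALUES (?, ?, ?)", (node, event, response))
        self.db.commit()

    def online_nodes(self):
        rows = self.db.execute(
            "SELECT node FROM members WHERE status = 'online' ORDER BY node")
        return [row[0] for row in rows]

    def cluster_log(self):
        """Return all log entries in the order they were written."""
        return self.db.execute(
            "SELECT node, event, response FROM log ORDER BY rowid").fetchall()
```

A web interface would then only need to read the log table to show what each node was asked to do and how it responded.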
An example of Ganglia Muster Manager from the Rocks distribution.

Most master-slave distributions support different HPC drivers for interconnect and storage. The most commonly shipped drivers are -
Interconnect:
1. Gigabit Ethernet. Comes in handy if you are working with MPI.
2. InfiniBand
Storage:
1. GFS - creates a single file-system image on shared storage for concurrent
access from all the nodes.
2. PVFS - creates a single file-system image on distributed storage. This is
less fault tolerant than GFS, because whenever a single node (with some local
storage) goes down, the overall availability of the file system suffers.
Master-Slave Distributions
The main project which started it all was the Beowulf Project. It is an open
source effort to bring Linux clustering to the mainstream. There are many
distributions which are all based on Beowulf but provide different tools:
compilers, libraries, file systems etc.
These distributions are -
Compilers and libraries for Open-MPI, MPICH, PVM and LAM-MPI (an older cousin
of Open-MPI) are bundled. PVFS and GFS drivers are also provided. A web
interface to monitor memory, disk and CPU stats of all the nodes is included
as well.
One great plus point of Rocks is that it comes with Sun Grid Engine, a
powerful platform for creating and batching parallel applications. I would
strongly recommend using this.
AS, like any other enterprise solution from Red Hat, is commercial but open
source; the source RPMs can be found at -
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/2.1AS/en/os
All these installations are based on Anaconda (the Red Hat distribution
installer). Hence installing them is no different from installing any Fedora
or RHEL version. The same goes for configuring the master node for day-to-day
operations.
Most distributions support central installation, where the installation for
all the nodes is kick-started from the head node. The following is an example
distribution setup.

Notes:
Two very significant documents covering the entire cluster setup (including hardware) are -
http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book/beowulf_book/index.html
http://www.rocksclusters.org/rocksapalooza/2006/lab-building-cluster.pdf