Supercomputer Cluster in 10 Minutes!
OVERVIEW
If you just want to learn about
clusters, only need a cluster occasionally, or can't permanently install a
cluster, you might consider building one of the available CD-ROM-based “Instant”
clusters.
With these Instant Clusters, you
can create a set of bootable “live-filesystem” CDs. When
you need the cluster, you reboot your available systems using the CD-ROMs, do a
few configuration tasks, and start using your cluster.
The cluster software is all
available from the CD-ROM and the computers' hard disks are unchanged. When you
are done, you simply remove the CD-ROM and reboot the system to return to the
operating system installed on the hard disk.
The cluster persists until you
reboot.
There are some difficulties
with this approach, most notably with storage, since nothing is written to the
hard disks. It is possible to work around this problem with a hybrid approach:
set up a dedicated storage system attached to the master node, and use the
CD-ROM-based systems as compute-only nodes. Compute-only nodes should be
connected by a network of gigabit speed or higher.
Several CD-ROM-based systems of
this type are freely available. The three major packages are:
ClusterKnoppix, http://bofh.be/clusterknoppix/
Bootable Cluster CD (BCCD), http://bccd.cs.uni.edu/
ParallelKnoppix, http://idea.uab.es/mcreel/ParallelKnoppix/ParallelKnoppix.html#Download
The next subsection briefly
describes ParallelKnoppix and how it is set up and configured, which
should give you a basic idea of how such a system works.
ParallelKnoppix (PK) is one of the fastest and easiest ways to create an HPC
cluster. It is designed to be easy to use, yet it is also suitable for
serious work. In fact, you may get more serious work done with PK than you
would using alternatives, simply because you spend your
time using the cluster rather than installing and administering
it.
PK is a remaster
of the Knoppix live CD distribution of GNU/Linux that allows you to set
up a cluster of machines for parallel processing using MPI. The Open MPI, LAM/MPI and MPICH implementations are preconfigured
and ready to use.
PVM is also
included. Because PK runs entirely from the CD, you can convert a room full of
machines running Windows into a Linux cluster, and when you shut down the
cluster, the Windows machines are returned to their original state.
The computers in
the cluster can be homogeneous or heterogeneous. Getting the cluster up and
running takes about 5 to 10 minutes if the machines have PXE network cards.
Clusters of 2 to 200 machines are supported. A cluster created with PK
is temporary: if you want to use it another time, you must recreate it (another
5 minutes of work). It is also single-user.
Building a Cluster with ParallelKnoppix
First, download
the ParallelKnoppix ISO image from
http://idea.uab.es/mcreel/ParallelKnoppix/ParallelKnoppix.html#Download
and burn it to a CD. This will be your ParallelKnoppix
live CD. Also download the complete
tutorial from
http://pareto.uab.es/wp/2004/62604.pdf.
The following
section provides a quick setup guide for ParallelKnoppix.
ParallelKnoppix (P-KPX) is designed to work with computers of the IA-32
architecture, which includes Intel Pentium 4 and Xeon processors, as well as
AMD Sempron and Athlon
processors.
The main requirement for a simple setup
process is that the network cards of all slave nodes in the cluster can
boot across the network using PXE. Most newer network cards support this
option, though it may be necessary to configure the BIOS of the slave nodes to
enable it. Since the slave nodes obtain their IP addresses from
the DHCP server running on the master node, it may be
necessary to isolate the slave nodes from any other DHCP
server on the network.
Booting up
The master computer must be booted from
the P-KPX CD. You may need to enter the BIOS setup routine to configure the
computer to boot from CD. Once this is done, place the CD in the computer that
will act as the master node and boot it. You will see something similar to the following:

After you press <ENTER>, the computer
starts to boot.

P-KPX uses Knoppix's
excellent hardware detection to automatically configure the mouse, video card
and network card. This same auto detection will be used for all the computers
in the cluster, which greatly simplifies the creation of a cluster using
heterogeneous computers. Once the master computer has booted, you are in the KDE
desktop environment:

Configuration
All configuration is done
using a script that you can start from the ParallelKnoppix/SetupParallelKnoppix
menu item.
To access the menu, click on the gear icon to the
left of the bottom panel:
The script configures the first network card (the one
that is given the name eth0) to use the IP address 192.168.0.1. If you have
more than one network card, you may need to switch cables so that eth0 is
the card that connects to the cluster. Next, the terminal server is started so
that it can boot your slave computers. First you get some information:

Click on OK.
Next you are asked if you want to configure the
terminal server:

Click OK.
Next you are asked how many nodes are in your
cluster:

Enter the number, including the master node.
Next, you need to choose the network card types that
are in your cluster, so that the initial kernel used to perform the PXE boot is
able to set up networking on the slave nodes. There is a dialog to do this:

Select ALL the network card types in the
cluster, and then click OK.
Next, you may need to pass some special
kernel options to get your nodes to boot (such as acpi=off, pci=biosirq, etc.).
Enter this information in the following dialog box:

Now, you need to select a partition on the master
node's hard drive on which to create a working directory.
This directory will be made writable and exported over NFS
to all machines in the cluster. Any data or programs placed in this
directory will be available to all the nodes.
The working directory is named .parallel_knoppix_working to minimize the
chance that an existing directory has the same name. The script will
only let you use partition types that are safe to write to, which includes all
common partition types except NTFS.
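The partition filter described above can be illustrated with a short Python sketch. This is not the code of the P-KPX setup script, whose internals are not shown here. The function name, the sample data, and the choice to read lines in /proc/mounts layout (device, mount point, filesystem type) are all assumptions made for illustration.

```python
# Hypothetical sketch: keep only disk partitions whose filesystem type
# is not NTFS. Not taken from the actual ParallelKnoppix setup script.

UNSAFE_FSTYPES = {"ntfs"}  # assumption: only NTFS is treated as unsafe


def writable_candidates(mounts_text):
    """Return (device, fstype) pairs for /dev partitions that are not NTFS.

    mounts_text is expected to use the /proc/mounts layout:
    device, mount point, filesystem type, then further fields.
    """
    candidates = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, fstype = fields[0], fields[2]
        # Ignore pseudo filesystems such as proc, which have no /dev device.
        if device.startswith("/dev/") and fstype.lower() not in UNSAFE_FSTYPES:
            candidates.append((device, fstype))
    return candidates


# Made-up sample data for illustration only.
SAMPLE = """\
/dev/hda1 /mnt/hda1 ntfs ro 0 0
/dev/hda2 /mnt/hda2 ext3 rw 0 0
/dev/sda1 /mnt/sda1 vfat rw 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    for device, fstype in writable_candidates(SAMPLE):
        print(device, fstype)
```

In this made-up sample, the NTFS partition and the proc entry are skipped, leaving the ext3 partition and the FAT-formatted device as candidates for the working directory.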
If the hard disk is entirely occupied by NTFS
partitions (not an uncommon case when Microsoft Windows (R) is pre-installed on
the computer), you can use a USB flash drive for the working space.
An example of using a flash drive is shown below:

The following message appears confirming that the
partition has been mounted:

A link that points to the working directory is created
on the desktop to make navigation easy.
Next, the working directory must be mounted on the
slave nodes.
This is only possible after they have been booted, so
a warning message appears:

Once the slave nodes are booted, click OK, and you
will receive a confirmation:

Next, the cluster is lambooted (started with LAM/MPI's lamboot command)
so that LAM/MPI may be used to run MPI-based parallel programs. We can see that
the 2-node cluster of this example is successfully lambooted:

A message now informs you that the cluster was
successfully created:

You are given a chance to start the Ganglia
monitoring daemon, which allows you to observe the activity on the cluster:

If you click YES, you will see the following
information:

The cluster is now ready for use!
The master node has the IP address 192.168.0.1. The
slaves have IP addresses that go up to 192.168.0.x, where x is
the total number of computers in the cluster.
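The addressing scheme can be made concrete with a small Python sketch. It assumes that the slaves are numbered consecutively from 192.168.0.2, which the description implies but does not state outright. The function names are hypothetical and are not part of P-KPX.

```python
import ipaddress

MASTER = ipaddress.IPv4Address("192.168.0.1")  # master address given in the text
MIN_NODES, MAX_NODES = 2, 200                  # supported cluster sizes given in the text


def node_addresses(total_nodes):
    """Return the addresses of all nodes, master first.

    Assumption: slaves receive consecutive addresses after the master,
    so a cluster of x machines ends at 192.168.0.x.
    """
    if not MIN_NODES <= total_nodes <= MAX_NODES:
        raise ValueError(
            f"cluster size must be between {MIN_NODES} and {MAX_NODES}"
        )
    return [str(MASTER + offset) for offset in range(total_nodes)]


def host_list(total_nodes):
    """Format the node addresses as text, one address per line."""
    return "\n".join(node_addresses(total_nodes)) + "\n"


if __name__ == "__main__":
    # A 4-machine cluster: the master plus three slaves.
    print(host_list(4), end="")
```

A one-address-per-line list like this is also the general shape of the host file that LAM/MPI's lamboot reads. P-KPX prepares the cluster for you, so the sketch only helps you check which addresses to expect on a cluster of a given size.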