Supercomputer Cluster in 10 Minutes!

 

OVERVIEW
If you just want to learn about clusters, only need a cluster occasionally, or can't permanently install a cluster, you might consider building one of the available CD-ROM-based “Instant” clusters.

With these Instant Clusters, you can create a set of bootable “live-Filesystem” CDs. When you need the cluster, you reboot your available systems using the CD-ROMs, do a few configuration tasks, and start using your cluster.

The cluster software is all available from the CD-ROM and the computers' hard disks are unchanged. When you are done, you simply remove the CD-ROM and reboot the system to return to the operating system installed on the hard disk.

The cluster persists until you reboot.

There are some difficulties with this approach, most notably storage: because the nodes run from CD-ROM and leave the hard disks unchanged, there is no obvious place to keep data. It is possible to work around this problem with a hybrid approach: set up a dedicated system off of the master node for storage, and use the CD-ROM-based systems as compute-only nodes. Compute-only nodes should be connected to a gigabit network or higher.

Several CD-ROM-based systems of this type are freely available. The three major packages are:

ClusterKnoppix, http://bofh.be/clusterknoppix/

Bootable Cluster CD (BCCD), http://bccd.cs.uni.edu/

ParallelKnoppix, http://idea.uab.es/mcreel/ParallelKnoppix/ParallelKnoppix.html#Download

The next subsection briefly describes ParallelKnoppix and how it is set up and configured, which should give you a basic idea of how these systems work.

 

PARALLEL KNOPPIX

ParallelKnoppix (PK) is among the fastest and easiest ways to create an HPC cluster. It is designed to be easy to use, yet it is also suitable for serious work. In fact, you may get more serious work done with PK than you would using alternatives, simply because you'll have more time free to use your cluster, rather than install and administer it.

 

PK is a remaster of the Knoppix live CD distribution of GNU/Linux that allows you to set up a cluster of machines for parallel processing using MPI. The Open MPI, LAM/MPI and MPICH implementations are pre-configured and ready to use.

 

PVM is also included. Because PK runs entirely from the live CD, you can convert a room full of machines running Windows into a Linux cluster, and when you shut down the cluster, the Windows machines are returned to their original state.

The computers in the cluster can be homogeneous or heterogeneous. Getting the cluster up and running takes about 5 to 10 minutes if the machines have PXE network cards. Clusters of 2 to 200 machines are supported. A cluster created with PK is temporary: if you want to use it again, you must recreate it (another 5 minutes of work). It is also single-user.

 

Building a Cluster with Parallel Knoppix

First, download the ParallelKnoppix ISO image from:

http://idea.uab.es/mcreel/ParallelKnoppix/ParallelKnoppix.html#Download

and burn it to a CD. This will be your ParallelKnoppix live CD. Also download the complete tutorial from http://pareto.uab.es/wp/2004/62604.pdf.

 

The following section provides a quick setup guide for ParallelKnoppix.

 

Prerequisites

P-KPX (ParallelKnoppix) is designed to work with computers of the IA-32 architecture, which includes Intel Pentium 4 and Xeon processors as well as AMD Sempron and Athlon processors.

 

The main requirement for a simple setup process is that the network cards of all slave nodes in the cluster can boot across the network using PXE. Most newer network cards support this option, though it may be necessary to configure the BIOS of the slave nodes to enable it. Since the slave nodes obtain their IP addresses from the DHCP server running on the master node, it may be necessary to isolate the slave nodes from any other DHCP server that is running.

 

Booting up

The master computer must be booted using the P-KPX CD. You may need to enter your BIOS setup routine to configure the computer to boot from CD. Once this is done, place the CD in one of your computers, and boot up. You will see something similar to the following:

 

After you press <ENTER>, the computer starts to boot.

 

 

P-KPX uses Knoppix's excellent hardware detection to automatically configure the mouse, video card and network card. This same auto detection will be used for all the computers in the cluster, which greatly simplifies the creation of a cluster using heterogeneous computers. When the master computer has booted we are in the KDE desktop environment:

 

Configuration

All configuration is done using a script that you can start from the ParallelKnoppix/SetupParallelKnoppix menu item.

 

To access the menu, click on the gear icon to the left of the bottom panel:

The script configures the first network card (the one that is given the name eth0) to use the IP address 192.168.0.1. If you have more than one network card, you may need to switch cables so that eth0 is the card that connects to the cluster. Next, the terminal server is started to boot your slave computers. First you get some information:

 

 

 

Click on OK.

 

Next you are asked if you want to configure the terminal server:

 

Click OK.

Next you are asked how many nodes are in your cluster:

 

 

 

Enter the number, including the master node.

 

Next, you need to choose the network card types that are in your cluster, so that the initial kernel used to perform the PXE boot is able to set up networking on the slave nodes. There is a dialog to do this:

 

Select ALL the network card types in the cluster, and then click OK.

 

Next, you may need to pass some special kernel options to get your nodes to boot (such as acpi=off or pci=biosirq). This information is entered into the following dialog box.

 

 

 

Now, you need to select a partition on the master node's hard drive on which to create a working directory.

 

This directory will be made writable, and NFS exported to all machines in the cluster. Any data or programs placed in this directory will be available to all the nodes.

 

The working directory will be named .parallel_knoppix_working, to minimize the chances that an existing directory has the same name. The script will only let you use partition types that are safe to write to, which includes all common partition types except NTFS.
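As a quick sanity check once the partition has been chosen, a short script along the following lines can confirm that the working directory exists and is writable. This is only a sketch: the mount point /mnt/hda1 and the function names are assumptions made for illustration, not something the setup script provides. Only the directory name .parallel_knoppix_working comes from the text above.

```python
import os
import tempfile

# Directory name as given in the text above.
WORKDIR_NAME = ".parallel_knoppix_working"


def find_working_dir(mount_point):
    """Return the working directory under mount_point, or None if absent."""
    candidate = os.path.join(mount_point, WORKDIR_NAME)
    return candidate if os.path.isdir(candidate) else None


def is_writable(directory):
    """Create and discard a scratch file to confirm write access."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "/mnt/hda1" is a hypothetical mount point; substitute your own.
    workdir = find_working_dir("/mnt/hda1")
    if workdir is None:
        print("working directory not found")
    else:
        print(workdir, "writable:", is_writable(workdir))
```

Running the same check on a slave node after the NFS mount is one way to verify that the directory really is shared and writable there too.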

 

If the hard disk is entirely occupied by NTFS partitions (not an uncommon case when Microsoft Windows (R) is pre-installed on the computer), you can use a USB flash drive for the working space. An example of using a flash drive is shown below:

 

 

 

 

The following message appears confirming that the partition has been mounted:

 

 

A link that points to the working directory is created on the desktop, to make navigation easy.

 

Next, the working directory must be mounted on the slave nodes.

 

This is only possible after they have been booted, so a warning message appears:

 

 

 

 

Once the slave nodes are booted, click OK, and you will receive a confirmation:

 

 

 

Next, the cluster is lambooted so that LAM/MPI may be used to run MPI-based parallel programs. We can see that the 2-node cluster of this example is successfully lambooted:
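To give a rough idea of what lambooting involves: LAM/MPI starts its daemons on the hosts listed in a plain-text boot schema, one host per line. The setup script handles this for you, so the sketch below is purely illustrative. The function name and the file name lamhosts are assumptions, and the addresses are those of the 2-node example.

```python
def boot_schema(hosts, cpus_per_host=1):
    """Build LAM/MPI boot schema text: one host per line, master first."""
    lines = ["# LAM/MPI boot schema (sketch), master node first"]
    for host in hosts:
        if cpus_per_host > 1:
            # LAM accepts an optional cpu=N hint after the host name.
            lines.append(f"{host} cpu={cpus_per_host}")
        else:
            lines.append(host)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Addresses of the 2-node example cluster; "lamhosts" is a
    # hypothetical file name chosen for this illustration.
    with open("lamhosts", "w") as handle:
        handle.write(boot_schema(["192.168.0.1", "192.168.0.2"]))
```

When LAM/MPI is set up by hand, a file like this is typically passed to lamboot (for example, lamboot -v lamhosts), and lamnodes then lists the nodes that joined.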

 

A message now informs you that the cluster was successfully created:

 

 

 

You are given a chance to start the ganglia monitoring daemon, which allows you to observe the activity on the cluster:

 

 

If you click YES, you will see the following information:

 

 

The cluster is now ready for use!
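If you started the ganglia monitoring daemon, the same activity data can also be read from a script. The sketch below assumes that gmond runs on the master node with its stock configuration, in which it sends an XML report to any client that connects to TCP port 8649. ParallelKnoppix may configure it differently, so treat the host, port and function names as assumptions.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="192.168.0.1", port=8649, timeout=5.0):
    """Read the XML report gmond sends on connect (assumed default port)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # gmond closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)


def load_by_host(xml_data):
    """Map each HOST name to its one-minute load average, if reported."""
    root = ET.fromstring(xml_data)
    loads = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                loads[host.get("NAME")] = float(metric.get("VAL"))
    return loads


if __name__ == "__main__":
    try:
        for name, load in sorted(load_by_host(fetch_gmond_xml()).items()):
            print(name, load)
    except OSError as exc:
        print("could not reach gmond:", exc)
```

The parsing step is kept separate from the network call so that it can be tried on a saved report without a running cluster.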

 

The master node has the IP address 192.168.0.1. The slaves have IP addresses that go up to 192.168.0.x, where x is the total number of computers in the cluster.
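The addressing scheme described above can be written down in a few lines, which is handy when you need a host list for your own scripts. This is a minimal sketch: the function name is hypothetical, and the /24 network size is an assumption, since the text only gives the 192.168.0.x pattern.

```python
import ipaddress
from itertools import islice


def node_addresses(total_nodes, network="192.168.0.0/24"):
    """Return node addresses in order: master first, then the slaves."""
    net = ipaddress.ip_network(network)
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    if not 1 <= total_nodes <= usable:
        raise ValueError(f"total_nodes must be between 1 and {usable}")
    # hosts() yields 192.168.0.1, 192.168.0.2, ... for the default network.
    return [str(ip) for ip in islice(net.hosts(), total_nodes)]


if __name__ == "__main__":
    # A 3-machine cluster: the master plus two slaves.
    for address in node_addresses(3):
        print(address)
```

The first entry is the master node and the last entry is 192.168.0.x, with x equal to the number of computers passed in.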