The Tufts High Performance Compute (HPC) cluster delivers 35,845,920 cpu hours and 59,427,840 gpu hours of free compute time per year to the user community.

Teraflops: 60+ (60+ trillion floating point operations per second)
CPU: 4000 cores
GPU: 6784 cores
Interconnect: 40Gb low latency ethernet
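
As a quick sanity check, the yearly totals quoted above are consistent with every core being available around the clock for a 365-day year. The sketch below uses plain bash arithmetic (the variable names are illustrative only) to divide the published hours by the 8,760 hours in a year. The cpu total works out to 4092 cores, which suggests that the 4000 cores listed above is a rounded figure.

```shell
#!/bin/bash
# Hours in a non-leap year.
hours_per_year=$((365 * 24))

# Published yearly totals, taken from the text above.
cpu_hours=35845920
gpu_hours=59427840

# Core counts implied if every core is available all year.
cpu_cores=$((cpu_hours / hours_per_year))
gpu_cores=$((gpu_hours / hours_per_year))

echo "hours per year:    $hours_per_year"
echo "implied cpu cores: $cpu_cores"
echo "implied gpu cores: $gpu_cores"
```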

For additional information, please contact Research Technology Services at tts-research@tufts.edu


Parallel Info

Parallel programming related information

What are some reasons for using the cluster?

  • access to MPI based parallel programs
  • access to larger amounts of memory than 32bit computers offer
  • access to the public domain of scientific computing programs
  • access to multiple compilers
  • access to large amounts of storage
  • access to batch processing for running numerous independent serial jobs
  • access to 64bit versions of programs you may already have on your 32bit desktop

What is MPI?

MPI stands for Message Passing Interface. It is a widely used standard for writing message-passing programs, in which cooperating processes exchange data by explicitly sending and receiving messages.

What installed programs provide a parallel solution?

The following provide MPI based solutions:  Abaqus, Ansys, Fluent, Mathematica, Matlab, Paraview

The following programs also provide thread based parallelism: Comsol, Matlab
Note: by default, Matlab sets its number of threads equal to the number of cores on a compute node. However, it is up to you to request the cores you need from slurm.
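
One way to keep Matlab's thread count in line with what slurm has granted is to request cores with -c and pass that number on to Matlab. The following is a sketch under assumptions: the batch partition, the count of 4 cores, the output file name and the Matlab script name my_analysis are all placeholders, and the job script is written with a here-document only so that the example is self-contained. SLURM_CPUS_PER_TASK is set by slurm when -c is given, and maxNumCompThreads is Matlab's function for capping computational threads.

```shell
#!/bin/bash
# Write a job script. Partition, core count and my_analysis are placeholders.
cat > matlab_threads.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=batch
#SBATCH -c 4
#SBATCH --output=matlab.%j.out
module load matlab
# Cap Matlab's threads at the number of cores slurm allocated to this task.
matlab -nodisplay -nosplash -r "maxNumCompThreads(${SLURM_CPUS_PER_TASK}); my_analysis; exit"
EOF

# Check the job script for shell syntax errors before submitting it.
bash -n matlab_threads.sh && echo "matlab_threads.sh: syntax ok"

# Submit with:  sbatch matlab_threads.sh
```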

When does 64bit computing matter?

When a program needs more memory or storage than 32-bit limits allow. For example, a 32-bit process can address at most 4 GB of memory.

Is it possible to run linux 32-bit executables on the cluster?

There is a good chance that it will succeed, although other issues may still prevent an executable from running. The simplest check is to try it.

Where can I find additional information about MPI?

http://www-unix.mcs.anl.gov/mpi/

http://www.nersc.gov/nusers/resources/software/libs/mpi/

http://www.faqs.org/faqs/mpi-faq/

http://www.redbooks.ibm.com/abstracts/sg245380.html

http://www.nccs.gov/user-support/training-education/hpcparallel-computing-links/

http://software.intel.com/en-us/multi-core/

What are some good web based tutorials for MPI?

http://ci-tutor.ncsa.uiuc.edu/login.php
http://www.slac.stanford.edu/~alfw/Parallel.html
https://computing.llnl.gov/tutorials/parallel_comp/

How do I run a compiled mpi based program across two nodes using 8 cores?

> salloc -N2 -n8 -p mpi

> module load openmpi

> srun   ...srun options...   yourcode

 

This allocates 8 cores across two nodes on the slurm mpi partition, loads openmpi, and launches your executable with srun.  See the slurm section of the wiki for further examples.
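
The same request can also be written as a batch job instead of an interactive allocation. The following is a minimal sketch, assuming the same mpi partition and openmpi module as above; yourcode and the output file name are placeholders, and the here-document is used only to keep the example self-contained.

```shell
#!/bin/bash
# Write a job script equivalent to: salloc -N2 -n8 -p mpi
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 8
#SBATCH -p mpi
#SBATCH --output=mpi.%j.out
module load openmpi
# srun launches the 8 tasks requested with -n across the 2 nodes.
srun ./yourcode
EOF

# Check the job script for shell syntax errors before submitting it.
bash -n mpi_job.sh && echo "mpi_job.sh: syntax ok"

# Submit with:  sbatch mpi_job.sh
```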

 

Is there a slurm partition for testing parallel programs that require a short run time?

No.  The mpi partition is the only option.

Can parallel jobs be sent to any slurm partition?
No. Parallel jobs are supported only on the slurm mpi partition. This partition has a limit of 128 cores/cpus per job request.

What mpi software is available?
OpenMPI is the default MPI supported under slurm.  It provides the following compiler wrappers:
mpic++  mpicxx mpicc mpiCC mpif77 mpif90
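
As an illustration of the wrappers, the sketch below writes a minimal MPI "hello" program and compiles it with mpicc when that wrapper is on the path. The file name hello_mpi.c is hypothetical, and the program uses only standard MPI calls (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize).

```shell
#!/bin/bash
# Write a small MPI test program (hello_mpi.c is a placeholder name).
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile only if the wrapper is available in this shell.
if command -v mpicc >/dev/null 2>&1; then
    mpicc hello_mpi.c -o hello_mpi
else
    echo "mpicc not found; run 'module load openmpi' first"
fi
```

The resulting executable can then be launched with srun inside an allocation on the mpi partition.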

 

How can I use Portland Compilers and MPI?

Try the broadcast example found in the PGI directory tree:
/opt/shared/pgi/linux86-64/7.2-3/EXAMPLES/MPI/mpihello/myname.c

As an example:

> module load pgi

> module load openmpi

> pgcc myname.c -o myname -Mmpi -I/opt/pgi/linux86-64/7.2-3/include/

Note, there may be other versions of pgi and openmpi available via modules:

> module available


Where are the Portland executables?
When you load the module for Portland, all executables will be on your path. To list them:
> ls /opt/shared/pgi/linux86-64/7.2-3/bin

Can you recommend a good text on MPI?
The Tisch Library has:
William Gropp, Ewing Lusk, Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, Second Edition, MIT Press, 1999, ISBN: 0262571323.

Another resource, Designing and Building Parallel Programs by Ian Foster, may be useful.

 

Interesting Intel thread parallelism links and codes
Threading Building Blocks

See attachment for Intel white paper introduction pdf document.

GPU computing and CUDA resources

There are three nVidia GPU types available: 

GPU     quantity   compute node          partition
K20     2          alpha025, omega025    gpu
M2070   2          m4c29, m4c60          m4
M2050   2          m3n45, m3n46          batch

 

GPU processing is an excellent means to achieve shorter run times for many algorithms. There are several ways to use this resource. One is to program in Cuda, nVidia's programming language. Another is to use Matlab or other commercial applications that have GPU support, such as Mathematica, Maple, Abaqus and Ansys.  Note that both Cuda and applications such as Matlab require specific coding to use gpu resources.

Note: The installed versions of Cuda and the sdk change over time. Check the current versions available with the module command.
> module available

Cuda versions are only available on compute nodes, under the directory /usr/local/.  This means that you will need to compile on a compute node and not on the headnode.  Loading the cuda module (version 6.5.12 in this example) modifies your shell environment to provide access.

> module load cuda/6.5.12

to see what the details are:

> module display cuda/6.5.12

To obtain an interactive bash shell on a gpu node in order to compile a cuda program:

> srun --pty --x11=first -p gpu bash

The cuda compiler is nvcc. To print its command-line help:
> nvcc -h
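
As a small end-to-end check of nvcc, the sketch below writes a tiny Cuda program and compiles it when nvcc is on the path, which on this cluster means after loading the cuda module on a compute node. The file name fill.cu and the kernel are illustrative only. The kernel stores each thread's index in an array and the host prints the result, so it does not depend on device-side printf.

```shell
#!/bin/bash
# Write a minimal Cuda program (fill.cu is a placeholder name).
cat > fill.cu <<'EOF'
#include <cstdio>

// Each thread stores its own index in the output array.
__global__ void fill(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

int main(void)
{
    const int n = 4;
    int host[n];
    int *dev;

    cudaMalloc((void **)&dev, n * sizeof(int));
    fill<<<1, n>>>(dev);  // one block of n threads
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; i++)
        printf("thread %d wrote %d\n", i, host[i]);
    return 0;
}
EOF

# Compile only if nvcc is available in this shell.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o fill fill.cu
else
    echo "nvcc not found; load the cuda module on a compute node first"
fi
```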

Where is the html and pdf documentation located?

/usr/local/cuda-6.5/doc

To view pdf docs from the command line (this requires an X server):
> evince /usr/local/cuda-6.5/doc/pdf/CUDA_C_Best_Practices_Guide.pdf

How does one find out gpu specific and performance info?
> deviceQuery

or

> nvidia-smi

Where is the deviceQuery command, and how do I find out more device info?
 

> which deviceQuery

> srun --pty --x11=first -p gpu deviceQuery

> srun --pty --x11=first -p gpu nvidia-smi  -h


How does one make a gpu batch job?

Your compiled cuda program is submitted via a script that slurm's sbatch command can read.  Use a text editor to create a file called device_query.sh.  Here we run the deviceQuery command:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH -c 2
#SBATCH --output=gpu.%N.%j.out
#SBATCH --error=gpu.%N.%j.err
module load cuda/6.5.12
deviceQuery   >  my_device_results.out

To submit this file to the gpu partition,  run:

> sbatch device_query.sh
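
On submission, sbatch prints a line of the form "Submitted batch job <id>", and that id is what replaces %j in the output and error file names of the script. The sketch below shows one way to capture the id for later use with squeue. The helper name job_id_from_sbatch and the sample id 12345 are made up for illustration, and a stand-in string replaces the real sbatch call so that the example runs without slurm.

```shell
#!/bin/bash
# sbatch reports "Submitted batch job <id>"; the id is the fourth word.
job_id_from_sbatch() {
    awk '{ print $4 }'
}

# Stand-in for real sbatch output (12345 is a made-up job id).
sample_output="Submitted batch job 12345"
job_id=$(printf '%s\n' "$sample_output" | job_id_from_sbatch)
echo "job id: $job_id"

# On the cluster one would instead run, for example:
#   job_id=$(sbatch device_query.sh | job_id_from_sbatch)
#   squeue -j "$job_id"
```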

What gpu libraries are available on the cluster for linear algebra methods?
Cuda includes linear algebra support, and additional support can be found in the Cula routines.  Cula addresses dense and sparse matrix methods, and access is via the module environment. To see what is current:
> module available
The Cula install directory is /opt/shared/cula/

Matlab GPU

Desktop Engineering has published a nice introductory article on Matlab's GPU capability.

The GPU demonstration applications in Matlab's Parallel Computing Toolbox are an excellent introduction. Additional applications such as Mathematica, Ansys, Maple and others offer various levels of GPU support within their products.

For example, to run Matlab and access GPU resources:
> module load matlab
> srun ...options...  -p gpu  matlab
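
For a non-interactive run, the same idea can be wrapped in a batch script. This is a sketch under assumptions: the core count and output file name are placeholders, and the Matlab command simply displays the GPU that Matlab selects, using gpuDevice from the Parallel Computing Toolbox, as a check that the job can see a device.

```shell
#!/bin/bash
# Write a job script for the gpu partition (core count and file names are placeholders).
cat > matlab_gpu.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH -c 2
#SBATCH --output=matlab_gpu.%j.out
module load matlab
# Display the properties of the GPU that Matlab selects, then quit.
matlab -nodisplay -nosplash -r "disp(gpuDevice); exit"
EOF

# Check the job script for shell syntax errors before submitting it.
bash -n matlab_gpu.sh && echo "matlab_gpu.sh: syntax ok"

# Submit with:  sbatch matlab_gpu.sh
```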

Additional GPU resources

There are many Cuda programming resources on the web, including the Nvidia Cuda website.

 

HPC & GPU Supercomputing Group of Boston
A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems.

Look around the web as there are many similar GPU resources.

