The Tufts High Performance Compute (HPC) cluster delivers 35,845,920 CPU hours and 59,427,840 GPU hours of free compute time per year to the user community.

  • Performance: 60+ teraflops (more than 60 trillion floating-point operations per second)
  • CPU: 4000 cores
  • GPU: 6784 cores
  • Interconnect: 40 Gb/s low-latency Ethernet

For additional information, please contact Research Technology Services at tts-research@tufts.edu



Parallel programming related information

What are some reasons for using the cluster?

  • access to MPI-based parallel programs
  • access to more memory than 32-bit computers offer
  • access to the large body of public-domain scientific computing programs
  • access to compilers
  • access to large amounts of storage
  • access to batch processing for running numerous independent serial jobs
  • access to 64-bit versions of programs you may already have on your 32-bit desktop

What is MPI?

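MPI stands for Message Passing Interface, a widely used standard for writing message-passing parallel programs. Each process has its own memory, and processes exchange data by explicitly sending and receiving messages, which lets a single program run across many compute nodes.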

What installed programs provide a parallel solution?

The following provide MPI-based solutions: Abaqus, Ansys, Fluent, Mathematica, Matlab, ParaView

The following provide thread-based parallelism: Comsol, Matlab.
The default setting is a single thread.

When does 64bit computing matter?

When a program needs more memory, or works with larger files, than 32-bit limits allow. A 32-bit process can address at most 2^32 bytes (4 GB) of memory.

Is it possible to run Linux 32-bit executables on the cluster?

Usually, yes, but not in every case; for example, the 32-bit versions of the libraries a program depends on may be missing. The simplest check is to try it.

Where can I find additional information about MPI?

http://www-unix.mcs.anl.gov/mpi/

http://www.nersc.gov/nusers/resources/software/libs/mpi/

http://www.faqs.org/faqs/mpi-faq/

http://www.redbooks.ibm.com/abstracts/sg245380.html

http://www.nccs.gov/user-support/training-education/hpcparallel-computing-links/

http://software.intel.com/en-us/multi-core/

What are some good web based tutorials for MPI?

http://ci-tutor.ncsa.uiuc.edu/login.php
http://www.slac.stanford.edu/~alfw/Parallel.html
https://computing.llnl.gov/tutorials/parallel_comp/

How do I run a compiled MPI-based program?

-bash-3.2$ bsub -I -q parallel_public -a mvapich2 -n 8 mpirun.lsf yourcode

This submits your executable (yourcode) to the parallel_public queue as an interactive job (-I), using mvapich2 (-a mvapich2) and requesting 8 CPUs (-n 8).

Is there a queue for testing parallel programs that require a short run time?
There is a queue called paralleltest_public for exactly this purpose. It has a run-time limit of 10 minutes and high priority.

Can parallel jobs be sent to any queue?
No. Most parallel jobs should go to the parallel_public queue or its test counterpart, paralleltest_public. The parallel_public queue has a limit of 64 cores. If you need more cores, we can add you to the Express_public queue, which has a limit of 256 cores.

What MPI software is available?
OpenMPI, Mvapich and Mvapich2 are available on the cluster as loadable modules. Once you load the corresponding module, your environment provides that implementation's MPI compiler wrappers. For example:
> module load openmpi
OpenMPI then provides the following:
mpic++ mpicxx mpicc mpiCC mpif77 mpif90

Mvapich and Mvapich2 work the same way.

How can I use the Portland (PGI) compilers with MPI?

Try the broadcast example found in the PGI directory tree:
/opt/pgi/linux86-64/7.2-3/EXAMPLES/MPI/mpihello/myname.c

For example, to copy it to your directory, compile it, and run it on 8 cores:

> cp /opt/pgi/linux86-64/7.2-3/EXAMPLES/MPI/mpihello/myname.c .
> module load pgi
> module load mvapich2
> pgcc myname.c -o myname -Mmpi -I/opt/pgi/linux86-64/7.2-3/include/
> bsub -I -q parallel_public -a mvapich2 -n 8 mpirun.lsf ./myname

Where are the Portland executables, and what are their names?
When you load the Portland module, all of its executables are on your path. To list them:
> ls /opt/pgi/linux86-64/7.2-3/bin

Can you recommend a good text on MPI?
The Tisch Library has:
William Gropp, Ewing Lusk, Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, Second Edition, MIT Press, 1999, ISBN: 0262571323.

Another resource, Designing and Building Parallel Programs by Ian Foster, may be useful.

Additional material from a parallel computing course can be found on this Tufts Computer Science link.

Interesting Intel thread parallelism links and codes
Threading Building Blocks

See the attached Intel white paper (PDF) for an introduction.

GPU computing and CUDA resources

As part of the summer 2011 research cluster upgrade, one compute node was provisioned with two Nvidia Tesla M2050 GPUs. GPU processing can substantially shorten run times for many algorithms. There are two ways to use this resource: program the GPU directly with Nvidia's CUDA, or use Matlab or another commercial application that has GPU support.
Note that both CUDA and GPU-enabled applications such as Matlab require code written specifically to use the GPU.

You'll find the CUDA toolkit in /opt/shared/cudatoolkit and the GPU Computing SDK in /opt/shared/gpucomputingsdk. The SDK contains a number of CUDA sample C applications in /opt/shared/gpucomputingsdk/4.0.17/C; compiled samples are in /opt/shared/gpucomputingsdk/4.0.17/C/bin/linux/release.

To support GPU access, three new LSF GPU queues have been installed: short_gpu, normal_gpu and long_gpu.

For example, to run Matlab with access to GPU resources:
> module load matlab
> bsub -q short_gpu -Ip -R "rusage [n_gpu_jobs=1|n_gpu_jobs=1 ]" matlab

The GPU demonstration applications in Matlab's Parallel Computing Toolbox are an excellent introduction. Other applications, such as Mathematica, Ansys and Maple, offer varying levels of GPU support, and the bsub usage is similar.

Additional GPU resources

There are many CUDA programming resources on the web, including Nvidia's own CUDA website.

Stanford Seminars on High Performance Computing with CUDA
Stanford has posted videos from the Spring 2011 seminar series held at the Institute for Computational and Mathematical Engineering (ICME). The ICME is directed by Professor Margot Gerritsen.

  • Lecture 1: Intro to HPC with CUDA 1 (Cyril Zeller)
  • Lecture 2: Intro to HPC with CUDA 2 (Justin Luitjens)
  • Lecture 3: Optimizations 1 - Global Memory (Inderaj Bains)
  • Lecture 4: Optimizations 2 - Shared Memory (Steven Rennich)
  • Lecture 5: Finite Difference Stencils on Regular Grids (Paulius Micikevicius)

The Stanford video lectures listed above are available for viewing.

Tufts Parallel Users group

A recently formed group of cluster users with common interests in parallel computing: link
