...

access to MPI based parallel programs
access to larger amounts of memory than 32bit computers offer
access to the large public domain of scientific computing programs
access to multiple compilers
access to large amounts of storage
access to batch processing for running numerous independent serial jobs
access to 64bit versions of programs you may already have on your 32bit desktop

...

The following programs provide thread based parallelism as well: Comsol, Matlab
Note: by default, Matlab sets its number of threads equal to the number of cores on a compute node. However, it is up to you to specify in slurm what is needed.
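As a rough sketch of matching Matlab's thread count to a slurm request, the snippet below writes a small batch script that asks for 4 cores and caps Matlab at the same number. The file name matlab_threads.sh, the core count, and the Matlab script name myscript are illustrative assumptions, not cluster defaults; SLURM_CPUS_PER_TASK is the variable slurm sets when -c is given.

```shell
# Write a batch script that requests 4 cores and limits Matlab to them.
# The quoted 'EOF' keeps ${SLURM_CPUS_PER_TASK} literal so that slurm,
# not this shell, fills it in when the job runs.
cat > matlab_threads.sh <<'EOF'
#!/bin/bash
#SBATCH -c 4
module load matlab
# myscript is a hypothetical Matlab script name used for illustration.
matlab -nodisplay -r "maxNumCompThreads(${SLURM_CPUS_PER_TASK}); myscript; exit"
EOF
```

The script would then be submitted with sbatch matlab_threads.sh, adding a -p option if a particular partition is needed.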

When does 64bit computing matter?

...

How do I run a compiled mpi based program across two nodes using 8 cores?

> salloc -N2 -n8 -p mpi

> module load openmpi

> srun ...srun options... yourcode

This runs your executable on the slurm mpi partition using openmpi, requesting 8 cpus across two nodes. See the slurm section of the wiki for further examples.
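The same request can also be made non-interactively. The sketch below writes a batch script with the node, task, and partition options from the question above; the file name mpi_job.sh is an assumption, and yourcode stands for your own compiled executable.

```shell
# Write a batch script requesting 2 nodes and 8 tasks on the mpi partition.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 8
#SBATCH -p mpi
module load openmpi
# yourcode is a placeholder for your compiled MPI executable.
srun ./yourcode
EOF
```

Submit it with sbatch mpi_job.sh once the executable is in place.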

Is there a slurm partition for testing parallel programs that require a short run time?
No. The mpi partition is the only option.

Can parallel jobs be sent to any slurm partition?
No. The slurm mpi partition is where most parallel jobs are supported. This partition has a limit of 128 cores/cpus per job request.

What mpi software is available?
OpenMPI, Mvapich and Mvapich2 are available on the cluster as loadable modules; OpenMPI is the default slurm supported mpi. Once the corresponding module is loaded, your environment will provide access to the various MPI compilers.
> module load openmpi
For example, OpenMPI provides the following:
mpic++ mpicxx mpicc mpiCC mpif77 mpif90

Likewise for mvapich and mvapich2.
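To illustrate how the compiler wrappers are used, here is a minimal sketch that writes a small MPI test program and builds it with mpicc if the wrapper is on your path. The file name hello_mpi.c and the program itself are illustrative and are not part of the cluster installation.

```shell
# Write a minimal MPI program in which each rank reports itself.
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile only if the MPI wrapper is available (after module load openmpi).
if command -v mpicc >/dev/null 2>&1; then
    mpicc hello_mpi.c -o hello_mpi
else
    echo "mpicc not found; run 'module load openmpi' first" >&2
fi
```

The resulting hello_mpi executable can be launched with srun inside an allocation on the mpi partition.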

How can I use Portland Compilers and MPI?

Try the broadcast example found in the PGI directory tree:
/opt/shared/pgi/linux86-64/7.2-3/EXAMPLES/MPI/mpihello/myname.c

As an example that requests 8 cores:

> module load pgi

> module load openmpi

> pgcc myname.c -o myname -Mmpi -I/opt/pgi/linux86-64/7.2-3/include/

...

Note, there may be other versions of pgi and openmpi available via modules:

> module available

Where are the Portland executables?
When you load the module for Portland, all executables will be on your path.
> ls /opt/shared/pgi/linux86-64/7.2-3/bin

Can you recommend a good text on MPI?
The Tisch Library has:
William Gropp, Ewing Lusk, Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, Second Edition, MIT Press, 1999, ISBN: 0262571323.

Another resource, Designing and Building Parallel Programs may be useful.

*Some additional supporting info on a parallel computing course can be found on this Tufts Computer Science link.

Interesting Intel thread parallelism links and codes
Threading Building Blocks

See attachment for Intel white paper introduction pdf document.

GPU computing and CUDA resources

There are three nVidia GPU types available:

GPU	quantity	compute node	partition
K20	2	alpha025, omega025	gpu
M2070	2	m4c29, m4c60	m4
M2050	2	m3n45, m3n46	batch

GPU processing is an excellent means to achieve shorter run times for many algorithms. There are several approaches to using this resource. One is to program in nVidia's programming language Cuda. Another approach is to use Matlab and other commercial applications that have GPU support, such as Mathematica, Maple, Abaqus and Ansys. Note, nVidia's Cuda language and applications such as Matlab require specific coding to use gpu resources.

You'll find the CUDA toolkit in /opt/shared/cudatoolkit and the GPU computing SDK in /opt/shared/gpucomputingsdk. The SDK contains a number of CUDA sample C applications that can be found at /opt/shared/gpucomputingsdk/4.2.9/C. Compiled samples can be found in /opt/shared/gpucomputingsdk/4.2.9/C/bin/linux/release.

How does one find out gpu specific info?
> bsub -Ip -q short_gpu /opt/shared/gpucomputingsdk/4.2.9/C/bin/linux/release/deviceQuery

To support GPU access new LSF GPU queues have been installed: short_gpu, normal_gpu and long_gpu.
For example to run one of the compiled cuda codes:

> cp /opt/shared/gpucomputingsdk/4.2.9/C/bin/linux/release/simpleStreams .
> module load cuda
> bsub -q short_gpu -Ip -R "rusage [n_gpu_jobs=1 ]" ./simpleStreams

To view a description of the sample cuda codes from the command line:
> lynx file:///opt/shared/gpucomputingsdk/4.2.9/C/Samples.html
or
> firefox file:///opt/shared/gpucomputingsdk/4.0.17/C/Samples.html

Note: Over time different versions of Cuda and sdk will change. Check the current versions available with the module command.
> module available

Cuda versions are only available on compute nodes, under the directory /usr/local/. This means that you will need to compile on a compute node and not on the headnode. Loading the version 6 module of cuda will modify your shell environment, thus providing access.

> module load cuda/6.5.12

To see the details of what the module sets:

> module display cuda/6.5.12

To obtain bash shell access on a gpu node to compile a cuda program:

> srun --pty --x11=first -p gpu bash
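Once you have a shell on a gpu node with the cuda module loaded, a compile might look like the following sketch. The file name hello.cu and the kernel fill_index are made up for illustration; the build step uses nvcc, the cuda compiler, and is skipped if nvcc is not on your path.

```shell
# Write a tiny cuda program: each thread stores its own global index.
cat > hello.cu <<'EOF'
#include <cstdio>

// Each thread writes its global index into the output array.
__global__ void fill_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 256;
    int host[n];
    int *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(int));
    // Launch enough 128-thread blocks to cover all n elements.
    fill_index<<<(n + 127) / 128, 128>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("last element: %d\n", host[n - 1]);
    return 0;
}
EOF

# Compile only where nvcc is available (on a gpu node after module load cuda).
if command -v nvcc >/dev/null 2>&1; then
    nvcc hello.cu -o hello
else
    echo "nvcc not found; load the cuda module on a gpu node first" >&2
fi
```

The sketch omits error checking on the cuda calls to stay short; real code should check their return values.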

The name of the cuda compiler is nvcc, and other tools can be found in:
/opt/shared/cudatoolkit/4.2.9/cuda/bin
The nvcc command-line help is obtained with:
> nvcc -h
Also, you can view local cuda pdf docs on the cluster.

Where is the html and pdf documentation located?

/usr/local/cuda-6.5/doc

To view pdf docs from the command line using an X server:
> evince /usr/local/cuda-6.5/doc/pdf/CUDA_C_Best_Practices_Guide.pdf
Other pdf documents are in:
/opt/shared/cudatoolkit/4.2.9/cuda/doc/ and /opt/shared/gpucomputingsdk/4.2.9/C/doc/

How does one find out gpu specific and performance info?
> deviceQuery

or

> nvidia-smi

Where is the deviceQuery command? And how to find out more device info?

> which deviceQuery

> srun --pty --x11=first -p gpu deviceQuery

> srun --pty --x11=first -p gpu nvidia-smi -h

How does one make a gpu batch job?

Your compiled cuda program is submitted via a script that slurm's sbatch command can read. Use a text editor to create a file called device_query.sh. Here we run the deviceQuery command:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH -c 2
#SBATCH --output=gpu.%N.%j.out
#SBATCH --error=gpu.%N.%j.err
module load cuda/6.5.12
deviceQuery > my_device_results.out

To submit this file to the gpu partition, run:

> sbatch device_query.sh

What gpu libraries are available on the cluster for linear algebra methods?
Cuda has support for linear algebra, and additional support can be found in the Cula routines. Cula addresses dense and sparse matrix related methods, and access is via the module environment. To see what is current:
> module available
Cula install directory is /opt/shared/cula/

...

Matlab's Parallel Toolbox GPU demonstration applications are an excellent introduction. Additional applications such as Mathematica, Ansys, Maple and others offer various levels of support within their product. Job submission for these would be similar.

For example, to run Matlab and access GPU resources:
> module load matlab
> srun ...options... -p gpu matlab

Additional GPU resources

There are many Cuda programming resources on the web and of course the Nvidia Cuda website.

Stanford Seminars on High Performance Computing with CUDA
Stanford has posted videos from the Spring 2011 seminar series held at the Institute for Computational and Mathematical Engineering (ICME). The ICME is directed by Professor Margot Gerritsen.

The following Stanford Univ. video lectures are available for viewing:

Lecture 1: Intro to HPC with CUDA 1 (Cyril Zeller)
Lecture 2: Intro to HPC with CUDA 2 (Justin Luitjens)
Lecture 3: Optimizations 1 - Global Memory (Inderaj Bains)
Lecture 4: Optimizations 2 - Shared Memory (Steven Rennich)
Lecture 5: Finite Difference Stencils on Regular Grids (Paulius Micikevicius)

HPC & GPU Supercomputing Group of Boston
A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems.

Prof. Lorena Barba’s research group at Boston University
She is a computational scientist and fluid dynamicist with research interests including GPU computing.

Look around the web as there are many similar GPU resources.

Tufts Parallel Users group

A recently formed group with some common interests: link

Version	Old Version 39	New Version Current
Changes made by	durwood.marshall	durwood.marshall
Saved on	Aug 19, 2013	Mar 09, 2016
