...

access to MPI-based parallel programs
access to larger amounts of memory than 32-bit computers offer
access to the large public domain of scientific computing programs
access to multiple compilers
access to large amounts of storage
access to batch processing for running numerous independent serial jobs
access to 64-bit versions of programs you may already have on your 32-bit desktop

...

The following programs also provide thread-based parallelism: Comsol, Matlab
Note: by default, Matlab starts one thread per core on the compute node. However, it is up to you to request the matching number of cores in Slurm.
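One way to keep Matlab's thread count and the Slurm allocation in agreement is to request cores with `-c` and pass the granted count to Matlab. This is a minimal sketch, not a site-provided script: the module name `matlab` and the script `my_analysis.m` are hypothetical (check `module available` for the real module name), and it relies on Slurm setting `SLURM_CPUS_PER_TASK` when `-c` is given and on Matlab's `maxNumCompThreads` function. The snippet writes the batch file so you can inspect it before submitting.

```shell
#!/bin/bash
# Write a Slurm batch file for a multithreaded Matlab run.
# Assumptions: the module name "matlab" and the script my_analysis.m are hypothetical.
cat > matlab_job.sh <<'EOF'
#!/bin/bash
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --output=matlab.%j.out
# Slurm sets SLURM_CPUS_PER_TASK from the -c request above; fall back to 1 if unset.
THREADS=${SLURM_CPUS_PER_TASK:-1}
module load matlab
# Cap Matlab's computational threads at the number of cores Slurm allocated,
# then run the (hypothetical) script my_analysis.m and exit.
matlab -nodisplay -nosplash -r "maxNumCompThreads(${THREADS}); my_analysis; exit"
EOF
echo "Wrote matlab_job.sh"
```

Submit it with:

> sbatch matlab_job.sh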

When does 64-bit computing matter?

...

> salloc -N2 -n8 -p mpi

> module load openmpi

> srun ...srun options... yourcode

This allocates 2 nodes and 8 tasks in the mpi partition, then uses srun to launch your executable across them with Open MPI. See the Slurm section of the wiki for further examples.
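The same run can also be made non-interactively with sbatch. This is a minimal sketch, assuming your executable is named `yourcode` as above and sits in the directory you submit from; any extra srun options your code needs are left out. The batch file requests the same 2 nodes and 8 tasks in the mpi partition, and srun picks up those counts from the allocation.

```shell
#!/bin/bash
# Write a batch-mode equivalent of: salloc -N2 -n8 -p mpi; module load openmpi; srun yourcode
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=mpi
#SBATCH -N 2
#SBATCH -n 8
#SBATCH --output=mpi.%j.out
#SBATCH --error=mpi.%j.err
module load openmpi
# srun inherits the node and task counts from the #SBATCH lines above.
srun ./yourcode
EOF
echo "Wrote mpi_job.sh"
```

Submit it with:

> sbatch mpi_job.sh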

...

See the attachment for an introductory Intel white paper (PDF).

GPU computing and CUDA resources

There are three NVIDIA GPU types available:

...

CUDA is installed only on the compute nodes, under /usr/local/. This means you need to compile on a compute node, not on the head node. Loading the CUDA 6.5 module modifies your shell environment to give you access:

> module load cuda/6.5.12

To see exactly what the module changes in your environment:

> module display cuda/6.5.12
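Because the CUDA toolkit is present only on the compute nodes, compilation can also be done inside a batch job rather than an interactive shell. This is a minimal sketch, assuming a hypothetical source file `my_kernel.cu` in the submission directory and assuming the cuda module puts `nvcc` on your PATH (confirm with `module display`). The snippet writes the batch file; the job then logs the compiler version and builds the executable `my_kernel`.

```shell
#!/bin/bash
# Write a batch file that compiles a CUDA source file on a GPU compute node,
# since the CUDA toolkit under /usr/local/ is not installed on the head node.
# my_kernel.cu and my_kernel are hypothetical names.
cat > compile_cuda.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH -c 1
#SBATCH --output=compile.%j.out
#SBATCH --error=compile.%j.err
module load cuda/6.5.12
# Print the compiler version so the log shows which toolkit was used.
nvcc --version
nvcc -o my_kernel my_kernel.cu
EOF
echo "Wrote compile_cuda.sh"
```

Submit it with:

> sbatch compile_cuda.sh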

To obtain a bash shell on a compute node for compiling a CUDA program:

...

To view the PDF documentation from the command line, choose a document and open it using an X server:
> evince /usr/local/cuda-6.5/doc/pdf/CUDA_C_Best_Practices_Guide.pdf

...

Where is the deviceQuery command, and how can I find out more about the devices?

> which deviceQuery

> srun --pty --x11=first -p gpu deviceQuery

> srun --pty --x11=first -p gpu nvidia-smi -h

How does one make a GPU batch job?

Your compiled CUDA program is submitted via a script that Slurm's sbatch command reads. Use a text editor to create a file called device_query.sh. In this example, the script runs the deviceQuery command:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH -c 2
#SBATCH --output=gpu.%N.%j.out
#SBATCH --error=gpu.%N.%j.err
module load cuda/6.5.12
deviceQuery > my_device_results.out

To submit this file to the gpu partition, run:

> sbatch device_query.sh
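A variant of device_query.sh can also record a compact per-GPU summary from nvidia-smi. This is a minimal sketch with two assumptions to check against this cluster's Slurm setup: that GPUs are requested as a generic resource with `--gres=gpu:1` (some clusters hand out GPUs per partition instead, in which case drop that line), and that `nvidia-smi` on these nodes supports the `--query-gpu` option (see `nvidia-smi -h`). The file name gpu_info.sh is hypothetical.

```shell
#!/bin/bash
# Write a GPU batch file that logs device details from both deviceQuery and nvidia-smi.
cat > gpu_info.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH -c 2
# Assumption: GPUs are allocated as a Slurm generic resource on this cluster.
#SBATCH --gres=gpu:1
#SBATCH --output=gpu.%N.%j.out
#SBATCH --error=gpu.%N.%j.err
module load cuda/6.5.12
# When GPUs are allocated as gres, Slurm normally sets CUDA_VISIBLE_DEVICES for the job.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
# One CSV line per visible GPU: index, model name, total memory.
nvidia-smi --query-gpu=index,name,memory.total --format=csv
deviceQuery > my_device_results.out
EOF
echo "Wrote gpu_info.sh"
```

Submit it the same way:

> sbatch gpu_info.sh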

What GPU libraries are available on the cluster for linear algebra?
CUDA itself includes linear algebra libraries (for example, cuBLAS and cuSPARSE), and additional routines are available in CULA. CULA covers dense and sparse matrix methods and is accessed through the module environment. To see which versions are currently installed:
> module available
The CULA install directory is /opt/shared/cula/.
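Before writing compile and link flags for CULA, it can help to see which libraries and headers are actually installed, since the layout under /opt/shared/cula/ may differ between releases. This is a minimal sketch: the function name `list_cula_files` is hypothetical, and the `libcula*` and `cula*.h` patterns assume CULA's usual file naming.

```shell
#!/bin/bash
# List CULA libraries and headers under an install root.
# Defaults to the cluster's CULA directory; pass another path to search elsewhere.
list_cula_files() {
    local root="${1:-/opt/shared/cula}"
    if [ ! -d "$root" ]; then
        echo "CULA directory not found: $root" >&2
        return 1
    fi
    # Match regular files and symlinks (shared libraries are often symlinked),
    # sorted for stable output.
    find "$root" \( -name 'libcula*' -o -name 'cula*.h' \) \( -type f -o -type l \) | sort
}
```

After sourcing the snippet, run `list_cula_files` to search the default location, or `list_cula_files /some/other/root` to search elsewhere.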

...

There are many CUDA programming resources on the web, including NVIDIA's CUDA website.

Stanford Seminars on High Performance Computing with CUDA
Stanford has posted videos from the Spring 2011 seminar series held at the Institute for Computational and Mathematical Engineering (ICME). The ICME is directed by Professor Margot Gerritsen.

Lecture 1: Intro to HPC with CUDA 1 (Cyril Zeller)
Lecture 2: Intro to HPC with CUDA 2 (Justin Luitjens)
Lecture 3: Optimizations 1 - Global Memory (Inderaj Bains)
Lecture 4: Optimizations 2 - Shared Memory (Steven Rennich)
Lecture 5: Finite Difference Stencils on Regular Grids (Paulius Micikevicius)

The Stanford University video lectures listed above are available for viewing.

HPC

HPC & GPU Supercomputing Group of Boston
A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems.
Prof. Lorena Barba’s research group at Boston University
Prof. Barba is a computational scientist and fluid dynamicist with research interests including GPU computing.

Look around the web, as there are many similar GPU resources.

Page history: version 49 saved Aug 04, 2015; current version saved Mar 09, 2016; changes made by durwood.marshall.
