The Tufts High Performance Compute (HPC) cluster delivers 35,845,920 CPU hours and 59,427,840 GPU hours of free compute time per year to the user community.

Teraflops: 60+ (60+ trillion floating point operations per second)
CPU: 4000 cores
GPU: 6784 cores
Interconnect: 40GB low latency Ethernet
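
The yearly compute-hour figures appear to follow from the core counts multiplied by the number of hours in a year. The following is a minimal sketch of that arithmetic, assuming a 365-day year with every core available around the clock (an assumption, not something this page states):

```python
# Assumption: a non-leap year with every core available 24 hours a day.
HOURS_PER_YEAR = 365 * 24

cpu_hours_per_year = 35845920  # CPU hours per year quoted on this page
gpu_hours_per_year = 59427840  # GPU hours per year quoted on this page

# Dividing yearly hours by the hours in a year gives the implied core counts.
implied_cpu_cores = cpu_hours_per_year // HOURS_PER_YEAR
implied_gpu_cores = gpu_hours_per_year // HOURS_PER_YEAR

print("implied CPU cores:", implied_cpu_cores)
print("implied GPU cores:", implied_gpu_cores)
```

Under these assumptions the GPU figure corresponds exactly to the 6784 cores listed, while the CPU figure corresponds to 4092 cores, which suggests the listed 4000 is rounded.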

For additional information, please contact Research Technology Services at tts-research@tufts.edu


Machine and Deep Learning

Machine Learning Software

Research Technology has made tools for Machine Learning (ML) available on the Tufts cluster. They are provided through the module system, which is described below for each product. Where a software package uses GPU technology, as indicated, it must be run on the GPU partition (-p gpu).

R, Matlab, and SAS

These traditional statistical and computational applications are well suited to many aspects of ML. Numerous textbooks and online documents cover using R, Matlab, and SAS to create ML analysis applications.

To load R, Matlab, or SAS, use the corresponding command:

module load R/3.2.2
module load matlab
module load SAS

With R, Matlab, and SAS, you build your own ML applications, so some programming skill is required.

TensorFlow

About TensorFlow (from tensorflow.org)

"TensorFlow  is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well."

Using TensorFlow

TensorFlow is built on Python, and two modules are available that load TensorFlow with either Python 2.7 or Python 3.5. Both modules should be used with the GPU partition (-p gpu) for best performance. To load TensorFlow, type

module load tensorflow/11-python2.7

or

module load tensorflow/11-python3.5

When using the Python 3.5 version, make sure to start Python as python3. The TensorFlow site, http://www.tensorflow.org, has many examples and documentation for using this application.
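
To show how the partition flag and the module load fit together, here is a minimal sketch that assembles the text of a batch job script. It assumes the cluster scheduler is Slurm (the -p flag suggests this, but this page does not say so), and the function name make_gpu_job_script and the script name train.py are hypothetical placeholders:

```python
# Assumption: the scheduler is Slurm, so "#SBATCH -p gpu" selects the GPU partition.
# "train.py" is a hypothetical script name used only for illustration.
def make_gpu_job_script(module_name, python_cmd, script_path):
    """Return the text of a batch script that runs a Python script on the gpu partition."""
    lines = [
        "#!/bin/bash",
        "#SBATCH -p gpu",               # GPU partition named on this page
        "module load " + module_name,   # module name taken from this page
        python_cmd + " " + script_path,
    ]
    return "\n".join(lines) + "\n"


job_text = make_gpu_job_script("tensorflow/11-python3.5", "python3", "train.py")
print(job_text)
```

The printed text could be saved to a file and submitted with sbatch, again assuming Slurm. Options for requesting specific GPU resources vary by site and are left out here; contact Research Technology Services for the settings that apply to this cluster.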

Caffe

About Caffe (from http://caffe.berkeleyvision.org/)

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.

The Caffe site has documentation and examples for Caffe including tutorials.

Using Caffe

Loading the Caffe module also loads python/2.7.6, cuda, and other support software. Caffe uses GPUs, so specify the GPU partition (-p gpu) to speed up processing.

module load caffe

Scikit-Learn

About scikit-learn

scikit-learn is a collection of machine learning packages built on NumPy, SciPy, and matplotlib. It provides tools for data mining and data analysis, including classification, regression, clustering, and dimensionality reduction, as well as modules for data preprocessing and model selection.

See http://scikit-learn.org for documentation and examples.

Using scikit-learn

scikit-learn is installed as a library in python/3.5.0. Load the 3.5.0 version of Python and run your Python script. The scikit-learn library is imported under the name sklearn.

module load python/3.5.0
python3
Python 3.5.0 (default, Nov  4 2015, 11:43:11)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>>

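Once the import succeeds, a short script can confirm that the library works end to end. The following is a minimal sketch that trains a nearest-neighbour classifier on the iris dataset bundled with scikit-learn. The choice of dataset, classifier, and the 120/30 split are illustrative assumptions rather than anything prescribed for this cluster, and the split is done with NumPy so the sketch does not rely on helper modules whose names differ between scikit-learn versions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load the small iris dataset that ships with scikit-learn (150 samples).
iris = load_iris()
X, y = iris.data, iris.target

# Shuffle the sample indices with a fixed seed, then hold out 30 samples.
rng = np.random.RandomState(0)
indices = rng.permutation(len(y))
train_idx, test_idx = indices[:120], indices[120:]

# Fit a k-nearest-neighbours classifier on the training portion.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X[train_idx], y[train_idx])

# Fraction of held-out samples classified correctly.
accuracy = clf.score(X[test_idx], y[test_idx])
print("held-out accuracy:", accuracy)
```

Save the script to a file and run it with python3 after loading the python/3.5.0 module. This classifier runs on the CPU, so the GPU partition is not needed for it.
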