Machine Learning Software
Research Technology has made tools for Machine Learning (ML) available on the Tufts cluster. They are provided through the module system, as described below for each product. Where a software package uses GPU technology, as indicated, run it on the GPU partition (-p gpu).
R, Matlab, and SAS
These traditional statistical and computational applications are well suited to many aspects of ML. Numerous textbooks and online resources describe how to use R, Matlab, and SAS to build ML analysis applications.
To load R, Matlab, or SAS, use
module load R/3.2.2
or
module load matlab
or
module load SAS
With R, Matlab, and SAS, expect to build your ML workflow yourself; doing so requires working knowledge of the application.
TensorFlow
About TensorFlow (from tensorflow.org)
"TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well."
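To make the data flow graph idea in the description above concrete, here is a minimal pure-Python sketch. It does not use TensorFlow or its API: the Node class and the constant, add, and multiply helpers are hypothetical names invented for this illustration, and plain lists stand in for tensors. It only shows the general pattern of first building a graph of operations and then evaluating it.

```python
# Toy data flow graph in plain Python. This is NOT TensorFlow's API;
# Node, constant, add, and multiply are hypothetical names for illustration.


class Node(object):
    """One operation in the graph; its inputs are the incoming edges."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value

    def evaluate(self):
        """Evaluate the input nodes first, then apply this node's operation."""
        if self.op == "const":
            return self.value
        args = [node.evaluate() for node in self.inputs]
        if self.op == "add":
            return [x + y for x, y in zip(args[0], args[1])]
        if self.op == "mul":
            return [x * y for x, y in zip(args[0], args[1])]
        raise ValueError("unknown operation: %s" % self.op)


def constant(values):
    return Node("const", value=list(values))


def add(a, b):
    return Node("add", inputs=(a, b))


def multiply(a, b):
    return Node("mul", inputs=(a, b))


# Build the graph for (a + b) * c; nothing is computed at this point.
a = constant([1.0, 2.0, 3.0])
b = constant([10.0, 20.0, 30.0])
c = constant([2.0, 2.0, 2.0])
result_node = multiply(add(a, b), c)

# The arithmetic happens only when the graph is evaluated.
result = result_node.evaluate()
print(result)
```

Separating graph construction from evaluation is what lets a framework decide where each operation runs, for example on a CPU or a GPU, before any arithmetic is performed.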
Using TensorFlow
TensorFlow is used from Python, and two modules are provided: one loads TensorFlow with Python 2.7, the other with Python 3.5. Either module should be used on the GPU partition (-p gpu) for best performance. To load TensorFlow, type
module load tensorflow/11-python2.7
or
module load tensorflow/11-python3.5
When using the Python 3.5 version, make sure to start Python as python3. The TensorFlow site, http://www.tensorflow.org, has many examples and extensive documentation.
Caffe
About Caffe (from http://caffe.berkeleyvision.org/)
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
The Caffe site has documentation, examples, and tutorials.
Using Caffe
Loading the Caffe module also loads python/2.7.6, cuda, and other supporting software. Caffe uses GPUs, so submit jobs to the GPU partition (-p gpu) to speed up processing.
module load caffe
Scikit-Learn
About scikit-learn
scikit-learn is a Python machine learning library built on NumPy, SciPy, and matplotlib. It provides tools for data mining and data analysis, including classification, regression, clustering, and dimensionality reduction, as well as modules for data preprocessing and model selection.
See http://scikit-learn.org for documentation and examples.
Using scikit-learn
scikit-learn is installed as a library in python/3.5.0. Load the python/3.5.0 module and run your Python script. In Python code, the scikit-learn library is imported as sklearn.
module load python/3.5.0
python3
Python 3.5.0 (default, Nov 4 2015, 11:43:11)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>>
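Once the import succeeds, a short script can confirm that the library works end to end. The following is a minimal sketch, not a recommended workflow: it fits a k-nearest-neighbors classifier on the iris data set bundled with scikit-learn. The choice of classifier, the seed, and the 30-sample hold-out are arbitrary assumptions made for illustration. Because the installed scikit-learn version is not stated here, the split is done by hand with NumPy rather than with a version-specific helper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load the bundled iris data set: 150 samples, 4 features, 3 classes.
iris = load_iris()
features, labels = iris.data, iris.target

# Shuffle with a fixed seed and hold out the last 30 samples for testing.
order = np.random.RandomState(0).permutation(len(labels))
train_idx, test_idx = order[:-30], order[-30:]

# Fit a k-nearest-neighbors classifier on the training portion.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(features[train_idx], labels[train_idx])

# Report the fraction of held-out samples classified correctly.
accuracy = model.score(features[test_idx], labels[test_idx])
print("Held-out accuracy: %.2f" % accuracy)
```

Save the code to a file (for example a hypothetical iris_knn.py) and run it with python3 after loading the python/3.5.0 module.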