Machine Learning Software
Research Technology has made tools for Machine Learning (ML) available on the Tufts cluster. They are provided through the module system, which is described below for each product. Where a software package uses GPU technology, as indicated, it must be run on the GPU partition (-p gpu).
R and SAS
These traditional statistics applications are well suited to many aspects of ML. Many textbooks and online documents describe how to use R and SAS to build ML analysis applications.
To load either R or SAS, use
module load R/3.2.2
or
module load SAS
Both R and SAS require a build-your-own approach: you assemble the ML analysis yourself from the tools each application provides.
TensorFlow
About TensorFlow (from tensorflow.org)
"TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well."
Using TensorFlow
TensorFlow uses Python, and two modules are available that load TensorFlow with either Python 2.7 or Python 3.5. Both modules should be used on the GPU partition (-p gpu) for best performance. To load TensorFlow, type
module load tensorflow/11-python2.7
or
module load tensorflow/11-python3.5
The TensorFlow site, http://www.tensorflow.org, has many examples and documentation for using this application.
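The two steps described above (loading a TensorFlow module and running on the gpu partition) can be combined in a batch script. The following is a minimal sketch, not a tested site recipe. It assumes the cluster's scheduler is Slurm, where a "#SBATCH -p gpu" line has the same effect as passing -p gpu on the command line; this page does not name the scheduler. The helper name make_gpu_job_script and the script name train.py are hypothetical placeholders.

```python
def make_gpu_job_script(module, command, partition="gpu"):
    """Return the text of a batch script that loads `module` and runs `command`.

    Assumption: the scheduler is Slurm, where `#SBATCH -p <name>` selects
    the partition. Adjust the directive if your scheduler differs.
    """
    lines = [
        "#!/bin/bash",
        "#SBATCH -p {}".format(partition),  # request the GPU partition
        "module load {}".format(module),    # same command you would type interactively
        command,
    ]
    return "\n".join(lines) + "\n"


# train.py is a placeholder for your own TensorFlow script.
script = make_gpu_job_script("tensorflow/11-python3.5", "python3 train.py")
print(script)
```

Save the printed text to a file and submit it with your scheduler's submit command. Depending on how the module system is initialized on the cluster, the script may need additional setup lines before "module load" works in a batch job.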
Caffe
About Caffe (from http://caffe.berkeleyvision.org/)
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
The Caffe site has documentation and examples for Caffe including tutorials.
Using Caffe
Loading the Caffe module also loads Python, CUDA, and other supporting software. Caffe uses GPUs, so specify the GPU partition (-p gpu) to speed up processing.
module load caffe
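After loading a module, it can be useful to confirm that the Python packages it is expected to provide are actually visible to the interpreter. The following is a small sketch of such a check. The helper name importable is hypothetical, and the list of names is only an example: caffe is the import name of Caffe's Python interface, and numpy is one of the libraries it depends on. The code is written to run under both Python 2.7 and Python 3.

```python
import importlib


def importable(names):
    """Try to import each module name; map it to True on success, False on ImportError."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Example names only; replace with the packages you expect the module to provide.
status = importable(["caffe", "numpy"])
for name in sorted(status):
    print("{:<12}{}".format(name, "found" if status[name] else "not found"))
```

The same check works for the other packages on this page, for example by passing "tensorflow" or "sklearn" after loading the corresponding module.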
Scikit-Learn
About scikit-learn
scikit-learn is a machine learning library for Python built on NumPy, SciPy, and matplotlib. It provides tools for data mining and data analysis, including classification, regression, clustering, and dimensionality reduction, as well as modules for preprocessing data and model selection.
See http://scikit-learn.org for documentation and examples.
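As a brief illustration of three of the task types listed above (classification, dimensionality reduction, and clustering), here is a minimal sketch using the iris dataset bundled with scikit-learn. It is an example only, not a recommended workflow: the choice of models and parameters is arbitrary, and the accuracy is measured on the training data, so it overstates how well the model would generalize.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 4 features, 3 classes

# Classification: fit a k-nearest-neighbours model and score it on the training data
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)
train_accuracy = clf.score(X, y)

# Dimensionality reduction: project the 4 features onto 2 principal components
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the projected points into 3 clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

print("training accuracy:", train_accuracy)
print("projected shape:", X_2d.shape)
print("cluster labels used:", sorted(set(labels)))
```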
Using scikit-learn
scikit-learn is installed as a library in the python/2.7.6 and python/3.5.0 modules. Load the version of Python that you need and run your Python script. The scikit-learn library is imported under the name sklearn.
module load python/3.5.0
python3
Python 3.5.0 (default, Nov 4 2015, 11:43:11)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>>