How Do I Do That?

Q. I want to run a large number of R scripts on the cluster, but there are too many to submit individually.

A. There are several ways to do this. If the scripts all use similar resources (memory, cores, time) and aren't dependent on each other, a bash script that submits the jobs works well. Here's an example. It will also work for other applications (e.g. Matlab) that can process scripts; just make the appropriate changes.

First create a file listing the R scripts that you want to run, one filename per line. Make sure every line, including the last, ends with a line ending. The scripts should be in the directory where you invoke the bash script.

E.g. myfiles.lst

RT_SET_1.R
RT_SET_2.R
RT_SET_3.R
RT_SET_4.R
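
If the scripts share a naming pattern, the list can be generated rather than typed. The following is a minimal sketch, assuming the scripts sit in the current directory; make_list is a hypothetical helper name, not something provided on the cluster.

```shell
# Hypothetical helper: write every file matching the pattern in $1 to
# myfiles.lst, one name per line, each terminated by a newline.
make_list() {
    for f in $1; do
        # If nothing matches, the shell leaves the pattern as-is; skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > myfiles.lst
}
```

It would be called as make_list 'RT_SET_*.R' (with the pattern quoted), from the directory that holds the scripts.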

Here is an example bash script that turns myfiles.lst into a series of sbatch commands. Edit the opts line to request the resources you need; the outs line can be edited as well. Leave the echo in at first to see what would be submitted, then remove the echo, leaving the sbatch command: sbatch $opts $outs --wrap="R --no-save < $filenm" . Note the double quotes around the --wrap value: inside single quotes alone, the shell would not expand $filenm.

E.g. runBatchR.sh

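#!/bin/bash
# Load R here; by default sbatch passes this environment on to the jobs.
module load R/3.2.2

# Resources requested for every job; edit to suit.
opts="-p batch -c 8 --mem=10000 --time=10:00:00 --mail-type=ALL --mail-user=$USER"
# Read one script name per line from standard input.
while IFS= read -r filenm; do
    # Per-job output and error files, named after the script.
    outs="--output=$filenm.out --error=$filenm.err"
    # Dry run: only print the command. To submit for real, replace this line with
    #   sbatch $opts $outs --wrap="R --no-save < $filenm"
    echo "sbatch $opts $outs --wrap='R --no-save < $filenm'"
    sleep 1
done

The 'while IFS= read -r filenm; do' line iterates over each line of standard input, copying the line into the filenm variable. IFS= and -r keep leading whitespace and backslashes in the name intact.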
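
The advice that every line needs a line ending matters because read reports failure on a final line that lacks one, so the loop body never runs for it. A small illustration; count_lines_read is a hypothetical helper used only for this demonstration.

```shell
# Count how many lines a `while read` loop actually processes from stdin.
count_lines_read() {
    n=0
    while IFS= read -r filenm; do
        n=$((n + 1))
    done
    echo "$n"
}

# Both lines end in a newline: prints 2
printf 'RT_SET_1.R\nRT_SET_2.R\n' | count_lines_read
# The last line has no newline, so it is skipped: prints 1
printf 'RT_SET_1.R\nRT_SET_2.R' | count_lines_read
```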

To run this, use the cat command to pipe (pass) the list of R scripts to the bash script. The invocation below is followed by the commands that would be submitted.

cat myfiles.lst | bash runBatchR.sh

sbatch -p batch -c 8 --mem=10000 --time=10:00:00 --mail-type=ALL --mail-user=dlapoi01 --output=RT_SET_1.R.out --error=RT_SET_1.R.err --wrap='R --no-save < RT_SET_1.R'
sbatch -p batch -c 8 --mem=10000 --time=10:00:00 --mail-type=ALL --mail-user=dlapoi01 --output=RT_SET_2.R.out --error=RT_SET_2.R.err --wrap='R --no-save < RT_SET_2.R'
sbatch -p batch -c 8 --mem=10000 --time=10:00:00 --mail-type=ALL --mail-user=dlapoi01 --output=RT_SET_3.R.out --error=RT_SET_3.R.err --wrap='R --no-save < RT_SET_3.R'
sbatch -p batch -c 8 --mem=10000 --time=10:00:00 --mail-type=ALL --mail-user=dlapoi01 --output=RT_SET_4.R.out --error=RT_SET_4.R.err --wrap='R --no-save < RT_SET_4.R'
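
Another of the several possible approaches is a Slurm job array, where a single sbatch call starts one task per line of myfiles.lst. The following is a rough sketch rather than a tested site configuration: the file name runArrayR.sh and the helper pick_script are hypothetical, the resource values are copied from the example above, and --array=1-4 assumes the four-line list.

```shell
#!/bin/bash
#SBATCH -p batch
#SBATCH -c 8
#SBATCH --mem=10000
#SBATCH --time=10:00:00
# One array task per line of myfiles.lst (assumed to have 4 lines).
#SBATCH --array=1-4
# %a in a file name expands to the array task index.
#SBATCH --output=RT_SET_%a.out
#SBATCH --error=RT_SET_%a.err

# Print line number $1 of file $2.
pick_script() {
    sed -n "${1}p" "$2"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task; do nothing otherwise.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    filenm=$(pick_script "$SLURM_ARRAY_TASK_ID" myfiles.lst)
    module load R/3.2.2
    R --no-save < "$filenm"
fi
```

Under these assumptions it would be submitted once, with sbatch runArrayR.sh, from the directory holding myfiles.lst and the R scripts.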

Q. How do I install R packages locally?

A. If you want to install a package locally, R provides a simple way to do that. Make sure that you have enough room in your home directory to hold the package software. This simple method is good if you don't plan on installing a large number of packages or do not need versioning within a subrelease (R/3.2.2 vs R/3.2.5).

If you don't have a directory named R in your home directory, this method will create one and place the libraries in it, down a long path. The long path is not a problem, since R uses it, not you.

From within R you can do:
install.packages('package_name', repos="http://cran.r-project.org")

or

source("http://bioconductor.org/biocLite.R")
biocLite('package_name')

R will inform you that you can't write to the global directory, and ask if you want to use a personal library. Say yes. It will then give you a long path based off /home/username/R/long-arch-string/version and ask if you want to use this. Say yes. It will install the library and you're done!

If you do plan to install a large number of packages, or packages that require larger amounts of storage, you can use research storage on the cluster if you have space there (request storage if you need it). Create a directory in your research storage, e.g. R-local, then link it from your home directory like this:

ln -s /cluster/tufts/<yoursharename>/R-local ~/R

If you have already installed R packages in your home directory, create a directory in your research storage, e.g. R-local, then copy the existing directory tree to R-local, remove the original, and link the new location in its place:

rsync -av ~/R/  /cluster/tufts/<yoursharename>/R-local    # make sure there is a / after the source directory
rm -rf ~/R        # careful: this removes the directory and all of its contents
ln -s /cluster/tufts/<yoursharename>/R-local ~/R
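
After linking, it may be worth confirming that ~/R really is a symlink that resolves to an existing directory before installing anything else. A minimal sketch; check_link is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if $1 is a symlink that resolves
# to a directory, and print where it points.
check_link() {
    if [ -L "$1" ] && [ -d "$1" ]; then
        echo "$1 -> $(readlink "$1")"
    else
        echo "$1 is not a symlink to an existing directory" >&2
        return 1
    fi
}
```

It would be called as check_link ~/R.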

Q. How do I install python modules locally?

A. First, load the module for the python version (e.g. 2.7) that you intend to use. That way the system-managed modules are loaded.

    module load python/2.7.6

Then create a local directory tree to store the modules (~/lib/python2.7/site-packages). Append a modified PYTHONPATH to your .bash_profile and source it. This only needs to be done once.

    echo 'export PYTHONPATH="$PYTHONPATH:$HOME/lib/python2.7/site-packages/"' >> ~/.bash_profile
    source ~/.bash_profile
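
The two steps above (create the tree, then append to .bash_profile once) could be wrapped as follows. This is a sketch, not part of the original instructions: setup_local_python is a hypothetical helper name, and the grep check is an added assumption so that running it again does not append a duplicate line.

```shell
# Hypothetical helper: create a module tree under directory $1 and add it
# to PYTHONPATH in profile file $2, skipping the append if already present.
setup_local_python() {
    site="$1/lib/python2.7/site-packages"
    # Single quotes keep $PYTHONPATH literal so it expands at login, not now.
    line='export PYTHONPATH="$PYTHONPATH:'"$site"'/"'
    mkdir -p "$site"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF "$line" "$2" 2>/dev/null || printf '%s\n' "$line" >> "$2"
}
```

It would be called as setup_local_python "$HOME" "$HOME/.bash_profile", followed by source ~/.bash_profile.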

Now you can use pip or setup.py to install python modules locally.

    pip install --user <PACKAGE>

If there is a requirements file:

    pip install --user -r requirements.txt

Or, after a python package has been downloaded and unpacked, run its setup.py from the unpacked directory. Be sure to read the package's installation instructions first.

    python setup.py install --prefix=~

Tufts UIT Research Computing
