The Tufts High Performance Compute (HPC) cluster delivers 35,845,920 CPU hours and 59,427,840 GPU hours of free compute time per year to the user community.

Teraflops: 60+ (60+ trillion floating point operations per second)
CPU: 4000 cores
GPU: 6784 cores
Interconnect: 40GB low latency ethernet

For additional information, please contact Research Technology Services at tts-research@tufts.edu



2. Bioinformatics services

a. Emboss and wEmboss:

Access to Emboss software is available on emboss.uit.tufts.edu, which provides both shell and web access. In both cases you will need an account, which you may request at http://research.uit.tufts.edu. The server hardware is a single quad-core 64-bit host with 4 GB of RAM.

For shell access to command line tools:
> ssh -Y emboss.uit.tufts.edu

For access to the web interface wEmboss.

For access to emboss web documentation.

Emboss tutorial

If you have any questions about Emboss related usage, applications, or assistance with software, please contact bio-support@tufts.edu.

b. Tufts Center for Neuroscience Research Genomics Core

The Tufts CNR Genomics Core supplies links to bioinformatics resources related to their operation. See Tufts CNR Genomics Core Resources for more information.

c. Genome Indexes on Cluster

Several mammalian genomes, indexes, and annotations are located on the Tufts HPC cluster. The genomes listed in the directory tree below are UCSC genome builds, except for canFam3, which is an NCBI build.

/cluster/tufts/genomes
  /HomoSapiens
    /hg18
    /hg19
  /MusMusculus
    /mm9
    /mm10
  /RattusNorvegicus
    /rn4
  /CanisFamiliaris
    /canFam2
    /canFam3
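
The tree above may change as builds are added, so it can be useful to list what is actually installed. The following is a minimal sketch that assumes the species/build layout shown above; list_builds is a hypothetical helper name, and the root directory can be overridden with an argument.

```shell
# Print one "species<TAB>build" line for every build directory under a root.
# The default root is the path given above; pass another root as $1 to override.
list_builds() {
    root="${1:-/cluster/tufts/genomes}"
    for build_dir in "$root"/*/*/; do
        # Skip the unexpanded glob when the root is missing or empty.
        [ -d "$build_dir" ] || continue
        build=$(basename "$build_dir")
        species=$(basename "$(dirname "$build_dir")")
        printf '%s\t%s\n' "$species" "$build"
    done
}
```

Calling list_builds with no argument on a cluster node should print one line per build, for example MusMusculus and mm10 separated by a tab.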

  Within each build subdirectory, there are two subdirectories.

  /Annotation
  /Sequence

In the Annotation directory there are subdirectories for gene annotations (Gene) and, depending on the degree of annotation, for smallRNA and Variation.

Under the Sequence directory, there are subdirectories containing indexes for popular short read sequence mapping programs.

  /AbundantSequences -- data files with over-represented sequences 
  /BlastDB  -- blast formatted genomic indexes: use genome.fa as reference name
  /Bowtie2Index  -- Bowtie2 formatted indexes: use genome as reference name
  /BowtieIndex  -- Bowtie formatted indexes: use genome as reference name
  /BWAIndex  -- BWA formatted indexes: use genome.fa as reference name
  /Chromosomes  -- individual chromosomes as fasta files
  /Transcriptome  -- Bowtie2 formatted index of transcriptome sites: use transcript as ref name
  /WholeGenomeFasta -- Genome as one file with accessory files

Please read the documentation for your mapping program to understand how it expects reference indexes to be named.
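
The naming rules in the listing above can be captured in a small helper. This is only a sketch: genome_ref and the GENOMES_ROOT variable are hypothetical names introduced here, and the mapping simply mirrors the reference names given in the listing.

```shell
# Print the reference path to hand to a mapper, following the listing above.
# Arguments: species build index_dir   (e.g. MusMusculus mm10 BWAIndex)
# GENOMES_ROOT is a hypothetical override for the genomes root directory.
genome_ref() {
    base="${GENOMES_ROOT:-/cluster/tufts/genomes}/$1/$2/Sequence/$3"
    case "$3" in
        BWAIndex|BlastDB)         echo "$base/genome.fa" ;;
        Bowtie2Index|BowtieIndex) echo "$base/genome" ;;
        Transcriptome)            echo "$base/transcript" ;;
        *) echo "unknown index type: $3" >&2; return 1 ;;
    esac
}
```

For example, genome_ref MusMusculus mm10 BWAIndex prints the full path ending in BWAIndex/genome.fa.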

Example: BWA

It helps to set environment variables to avoid typing long paths. Here a set of short reads (myreads.fq) is mapped to the mouse genome (mm10), with a SAM-formatted file as output. Note that bwa uses genome.fa as the reference index name and that the bwa mem algorithm is used.

  module load bwa/0.7.9a
  
  export MM10=/cluster/tufts/genomes/MusMusculus/mm10/Sequence/BWAIndex
  export MYDATADIR=/cluster/shared/myutln/mmdata
  bwa mem $MM10/genome.fa $MYDATADIR/myreads.fq >$MYDATADIR/myreads.sam
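
Before starting a long mapping job it can be worth confirming that the index is complete. The sketch below assumes the classic bwa 0.7.x index layout, in which five files with the suffixes .amb, .ann, .bwt, .pac, and .sa sit next to genome.fa. bwa_index_ok is a hypothetical helper name, not part of bwa.

```shell
# Return 0 only if all five BWA index files exist for the given prefix.
# Argument: index prefix, e.g. "$MM10/genome.fa"
bwa_index_ok() {
    for ext in amb ann bwt pac sa; do
        if [ ! -e "$1.$ext" ]; then
            echo "missing index file: $1.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

One possible use is to guard the mapping command from the example:

  bwa_index_ok "$MM10/genome.fa" && bwa mem $MM10/genome.fa $MYDATADIR/myreads.fq >$MYDATADIR/myreads.sam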

Example: Bowtie2

Environment variables can be set up similarly; for bowtie2, BOWTIE2_INDEXES must also be set. Here is an example of a paired-end analysis with minimal options. Note that Bowtie2 uses genome as the reference index name (-x genome).

  module load bowtie2
  export BOWTIE2_INDEXES=/cluster/tufts/genomes/MusMusculus/mm10/Sequence/Bowtie2Index
  export MYDATADIR=/cluster/shared/myutln/mmdata
  
  bowtie2 -q -x genome -1 $MYDATADIR/myreads_1.fq -2 $MYDATADIR/myreads_2.fq -S myreads.sam
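
When several paired-end samples need the same treatment, the command above can be generated per sample. The following dry-run sketch only prints the commands so they can be reviewed first. It assumes files are named <sample>_1.fq and <sample>_2.fq as in the example; bowtie2_cmd, sampleA, and sampleB are hypothetical names, and writing the SAM file into the data directory is a choice made here, not something the example above requires.

```shell
# Print (rather than run) the bowtie2 command for one paired-end sample.
# Arguments: data_directory sample_name
# Assumes reads are in <sample>_1.fq and <sample>_2.fq inside the data directory.
bowtie2_cmd() {
    datadir="$1"
    sample="$2"
    printf 'bowtie2 -q -x genome -1 %s/%s_1.fq -2 %s/%s_2.fq -S %s/%s.sam\n' \
        "$datadir" "$sample" "$datadir" "$sample" "$datadir" "$sample"
}

# Hypothetical sample names; replace with your own.
for s in sampleA sampleB; do
    bowtie2_cmd /cluster/shared/myutln/mmdata "$s"
done
```

Once the printed commands look right, they can be pasted into a job script or piped to a shell. Paths containing spaces would need extra quoting.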

d. HPC Modules for Bioinformatics

To list the entire module collection, use this command:

       module avail

To load a module, use this command:

       module load modulename/version

where modulename/version is one of the entries listed below. Default versions are marked with '*'.

To show the currently loaded modules, use this command:

       module list

To unload a module, use this command:

       module unload modulename/version
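
In a job script it can help to stop early when a module did not load. The following is a minimal sketch, assuming the module command is available as a shell function or executable; require_module is a hypothetical helper name. It does not rely on the exit status of module load alone and also checks the output of module list.

```shell
# Load a module and verify that it shows up in "module list".
# Argument: modulename/version, e.g. bwa/0.7.9a
require_module() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found (not on a cluster node?)" >&2
        return 1
    fi
    module load "$1" || return 1
    # module list may print to stderr, so merge the streams before searching.
    # This is a plain substring match on the name/version string.
    if ! module list 2>&1 | grep -q -F -- "$1"; then
        echo "module $1 does not appear to be loaded" >&2
        return 1
    fi
    return 0
}
```

A script could then start with a line such as require_module bwa/0.7.9a || exit 1.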

 

 
Classification             Module
Align/Mapping              blast/2.2.24
                           blat/20140708
                           bowtie/0.12.7*
                           bowtie/1.0.1
                           bowtie/2.1.0
                           bowtie2/2.2.3*
                           bwa/0.7.9a
                           exonerate/2.2.0
Assembly                   pandaseq/2.5
                           velvet/1.0.19
                           velvet/1.2.03
                           velvet/1.2.10
BioVisual                  cytoscape/2.8.3
                           IGV/1.5.30
ChIP-Seq                   MACS/1.4.2-1
                           MAnorm/2014-04-03
General Purpose            R/2.10.1
                           R/2.15.0*
                           R/2.15.2
                           R/2.15.3
                           R/3.01
                           R/3.0.2
                           R/3.0.3
                           R/3.1.0
                           mathematica/8.0
                           mathematica/8.04
                           mathematica/9.0.0
                           mathematica/9.0.1
                           matlab/2011b
                           matlab/2012a
                           matlab/2012b
                           matlab/2013a*
                           matlab/2014a
Microbial ecology          QIIME/1.5.0*
                           QIIME/1.6.0
                           QIIME/1.7.0
                           QIIME/1.8.0
                           mothur/1.25.1
                           mothur/1.29.1
NGS                        GATK/3.1-1
                           HTseq/0.5.4a*
                           HTseq/0.6.1p1
                           IGV/1.5.30
                           bedtools/2.17.0*
                           bedtools/2.19.1
                           bowtie/0.12.7*
                           bowtie/1.0.1
                           bowtie/2.1.0
                           bowtie/2.2.3*
                           bwa/0.7.9a
                           fastx/0.0.13
                           samtools/0.1.18*
                           samtools/0.1.19
Phylogenetics              mrbayes/3.1.2
RNA                        ViennaRNA/2.1.6*
                           mirdeep2/2.0.0.5*
                           ranfold/2/0*
RNA-Seq                    STAR/2.30e*
                           cufflinks/0.8.3
                           cufflinks/2.0.0
                           cufflinks/2.0.2*
                           cufflinks/2.1.1
                           misopy/0.5.2
                           rsem/1.2.4
                           tablemaker/2.1.1*
                           tophat/1.0.14
                           tophat/2.0.9*
                           tophat/2.0.10
Statistical Genetics/GWAS  ancestrymap/6210
                           haploview/4.1
                           impute/2.0.3
                           mach/1.0.16
                           merlin/1.1.2
                           pbat/3.61
                           pedcheck/1.1
                           plink/1.06
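
If the list above is saved to a text file, the default version of a module can be picked out by looking for the '*' marker. This is only a sketch for the format used on this page: default_version is a hypothetical helper name, and the output of module avail on the cluster may be formatted differently.

```shell
# Read module list lines on stdin and print the name/version entries
# that are marked as default with a trailing '*', for one module name.
# Argument: module name, e.g. bowtie
default_version() {
    awk -v name="$1" '{
        for (i = 1; i <= NF; i++) {
            n = split($i, part, "/")
            if (n >= 2 && part[1] == name && $i ~ /\*$/) {
                sub(/\*$/, "", $i)   # strip the default marker
                print $i
            }
        }
    }'
}
```

For example, default_version samtools < modules.txt would print the samtools entry marked with '*', where modules.txt is a hypothetical file holding the list.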