2. Bioinformatics services


...

Several mammalian and model system genomes, indexes, and annotations are located on the Tufts HPC cluster. The genomes currently available are listed below in the indicated directory tree. All are UCSC genome builds, except for canFam3, which is an NCBI build.

/cluster/tufts/genomes
  /HomoSapiens
    /hg18
    /hg19
  /MusMusculus
    /mm9
    /mm10
  /RattusNorvegicus
    /rn4
    /rn5
  /CanisFamiliaris
    /canFam2
    /canFam3
  /DrosophilaMelanogaster
    /dm3
  /CaenorhabditisElegans
    /ce10

Within each build subdirectory, there are two subdirectories.

  /Annotation
  /Sequence
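The layout above can be explored from the shell. The following is a minimal sketch, not an official cluster utility: the function name `list_builds` is hypothetical, and it assumes that every build directory sits two levels below the genomes root and contains a Sequence subdirectory, as described above.

```shell
# Hypothetical helper: print "species<TAB>build" for every build directory
# under a genomes root that contains a Sequence subdirectory.
list_builds() {
    root=$1
    for dir in "$root"/*/*/; do
        # Skip anything that does not follow the documented layout
        # (this also covers the case where the glob matches nothing).
        [ -d "${dir}Sequence" ] || continue
        dir=${dir%/}              # strip the trailing slash
        build=${dir##*/}          # last path component, e.g. mm10
        species=${dir%/*}         # drop the build component
        species=${species##*/}    # e.g. MusMusculus
        printf '%s\t%s\n' "$species" "$build"
    done
}

# Root path taken from the directory tree above.
list_builds /cluster/tufts/genomes
```

On a machine where the genomes root is not mounted, the call simply prints nothing.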

 

In the Annotation directory there are subdirectories for gene annotations (Gene) and, depending on the degree of annotation, for smallRNA and Variation.

Under the Sequence directory, there are subdirectories containing indexes for popular short read sequence mapping programs.

...

  /AbundantSequences -- data files with over-represented sequences
  /BlastDB  -- blast formatted genomic indexes: use genome.fa as reference name
  /Bowtie2Index  -- Bowtie2 formatted indexes: use genome as reference name
  /BowtieIndex  -- Bowtie formatted indexes: use genome as reference name
  /BWAIndex  -- BWA formatted indexes: use genome.fa as reference name
  /Chromosomes  -- individual chromosomes as fasta files
  /Transcriptome  -- Bowtie2 formatted index of transcriptome sites: use transcript as ref name
  /WholeGenomeFasta -- Genome as one file with accessory files

 

Each mapping program refers to its reference index in its own way, so please read the documentation for the program you are using to see how the index should be named on the command line.
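Since the reference name differs between tools (genome.fa for BWA and BLAST, genome for Bowtie and Bowtie2, transcript for the transcriptome index), a small lookup function can keep scripts consistent. This is only a sketch under the layout described above: the function name `ref_index_path` and the short tool keywords are hypothetical, not part of any cluster module.

```shell
# Root of the shared genome tree, as listed earlier on this page.
GENOMES_ROOT=/cluster/tufts/genomes

# Hypothetical helper: print the reference path to hand to a mapper.
# Usage: ref_index_path <Species> <build> <tool>
ref_index_path() {
    species=$1
    build=$2
    tool=$3
    case "$tool" in
        bwa)           subdir=BWAIndex;      name=genome.fa ;;
        blast)         subdir=BlastDB;       name=genome.fa ;;
        bowtie)        subdir=BowtieIndex;   name=genome ;;
        bowtie2)       subdir=Bowtie2Index;  name=genome ;;
        transcriptome) subdir=Transcriptome; name=transcript ;;
        *)
            echo "unknown tool: $tool" >&2
            return 1
            ;;
    esac
    printf '%s/%s/%s/Sequence/%s/%s\n' \
        "$GENOMES_ROOT" "$species" "$build" "$subdir" "$name"
}

ref_index_path MusMusculus mm10 bwa
ref_index_path HomoSapiens hg19 bowtie2
```

The function only assembles the path; it does not check that the build or index actually exists on disk.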

...

It helps to set up environment variables to avoid typing long paths. Here a set of short reads (myreads.fq) is mapped to the mouse genome (mm10), with a SAM-formatted file as output. Note that bwa uses genome.fa as the reference index name and that the bwa mem algorithm is used. See the BWA documentation for other ways to invoke bwa.

  module load bwa/0.7.9a
  
  export MM10=/cluster/tufts/genomes/MusMusculus/mm10/Sequence/BWAIndex
  export MYDATADIR=/cluster/shared/myutln/mmdata
  bwa mem $MM10/genome.fa $MYDATADIR/myreads.fq >$MYDATADIR/myreads.sam
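Before launching a long mapping job, it can be worth confirming that the index prefix is correct. The sketch below checks for the five files that `bwa index` writes next to the reference (.amb, .ann, .bwt, .pac, .sa). The function name `bwa_index_complete` is hypothetical, and the check assumes the shared index was built with the standard `bwa index` command.

```shell
# Hypothetical helper: succeed only if all BWA index files exist for a prefix.
bwa_index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -e "$prefix.$ext" ]; then
            echo "missing BWA index file: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

MM10=/cluster/tufts/genomes/MusMusculus/mm10/Sequence/BWAIndex
if bwa_index_complete "$MM10/genome.fa"; then
    echo "BWA index found, ready to run bwa mem"
else
    echo "BWA index not usable at $MM10/genome.fa"
fi
```

A failed check usually means a mistyped path or that the wrong reference name was used, for example genome instead of genome.fa.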

 

Example: Bowtie2

Similarly, environment variables can be set up for bowtie2. In addition, the BOWTIE2_INDEXES variable should be set so that bowtie2 can find an index given only by its base name. Here we have an example of a paired-end analysis with minimal options. See the bowtie2 documentation for a complete set of command options. Note that Bowtie2 uses genome as the reference index name (-x genome).

  module load bowtie2
  export BOWTIE2_INDEXES=/cluster/tufts/genomes/MusMusculus/mm10/Sequence/Bowtie2Index
  export MYDATADIR=/cluster/shared/myutln/mmdata
  
  bowtie2 -q -x genome -1 $MYDATADIR/myreads_1.fq -2 $MYDATADIR/myreads_2.fq -S $MYDATADIR/myreads.sam
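When several paired-end samples share the same naming pattern, the command above can be generated per sample. The following is a sketch only: the function name `bowtie2_cmd` and the sample names sampleA and sampleB are hypothetical, and it assumes the mates are named `<sample>_1.fq` and `<sample>_2.fq` as in the example above. It prints the commands rather than running them, so they can be reviewed first.

```shell
# Hypothetical helper: print the paired-end bowtie2 command for one sample.
# Usage: bowtie2_cmd <sample> <datadir>
bowtie2_cmd() {
    sample=$1
    datadir=$2
    printf 'bowtie2 -q -x genome -1 %s/%s_1.fq -2 %s/%s_2.fq -S %s/%s.sam\n' \
        "$datadir" "$sample" "$datadir" "$sample" "$datadir" "$sample"
}

export BOWTIE2_INDEXES=/cluster/tufts/genomes/MusMusculus/mm10/Sequence/Bowtie2Index
MYDATADIR=/cluster/shared/myutln/mmdata

# sampleA and sampleB are placeholder sample names.
for sample in sampleA sampleB; do
    bowtie2_cmd "$sample" "$MYDATADIR"
done
```

Once the printed commands look right, they can be pasted into a job script or piped to a shell after `module load bowtie2`.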

 

 

  

d. HPC Modules for Bioinformatics

...