If you have any questions about EMBOSS-related usage or applications, or need assistance with software, please contact biotts-support@tuftsresearch@tufts.edu.
Timings for Genome Mapping
The newest update to SLURM handles backfill better: if you specify an expected run time for your job, the scheduler can start it earlier as nodes open up. With sbatch, set a limit on the total run time with -t or --time (for example, d-h:m:s). Accepted formats are min, min:sec, hr:min:sec, day-hr, day-hr:min, and day-hr:min:sec, so -t 5 means five minutes and -t 5:00:00 means five hours.
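To check how a -t value will be read, here is a minimal Python sketch that converts each of the formats listed above into seconds. `slurm_time_to_seconds` is a hypothetical helper written for illustration; it is not part of Slurm or sbatch, and it covers only the six numeric formats named in this section.

```python
def slurm_time_to_seconds(spec: str) -> int:
    """Convert a Slurm -t/--time string to a number of seconds.

    Handles: min, min:sec, hr:min:sec, day-hr, day-hr:min, day-hr:min:sec.
    """
    days = 0
    if "-" in spec:
        # day-hr[:min[:sec]]: after the dash come hours, then
        # optional minutes and seconds.
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognized time format: {spec!r}")
        hours, minutes, seconds = (fields + [0, 0])[:3]
    else:
        # Without a day part: min, min:sec, or hr:min:sec.
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognized time format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

With this sketch, `slurm_time_to_seconds("5")` returns 300 and `slurm_time_to_seconds("5:00:00")` returns 18000, matching the five-minute and five-hour examples. Note that a single number means minutes, not seconds, which is the most common surprise.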
Below are run times (minutes and seconds) for BWA-MEM, Bowtie2, and SAMtools on 7, 15, 30, and 60 million FASTQ sequences. SAMtools was used to convert the SAM files and then sort the resulting BAM files. All runs used 8 cores and 16 GB of memory. Timings were obtained with the time command (e.g. time bwa mem -t 8 <hg38 ref> <sequences (human)>), although some programs also report run times in their output. To make effective use of backfill, note how long your programs take and request somewhat more than that with the -t parameter so they have some headroom.
Sequences | BWA-MEM (min:sec) | Bowtie2 (min:sec) | SAMtools (min:sec)
7 M | 1:29 | 1:39 | 1:52
15 M | 3:08 | 2:30 | 3:57
30 M | 6:36 | 4:53 | 5:38
60 M | 12:32 | 10:06 | 10:18
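Turning a measured run time into a -t request can be sketched in Python as below. `suggest_time_limit` is a hypothetical helper, and the default 50% margin (margin=1.5) is an illustrative assumption, not a Tufts or Slurm policy; pick whatever headroom suits how variable your jobs are.

```python
import math


def suggest_time_limit(measured_seconds: float, margin: float = 1.5) -> str:
    """Pad a measured run time and format it as a Slurm -t/--time value.

    margin=1.5 (50% headroom) is an assumption for illustration only.
    The padded time is rounded up to whole minutes.
    """
    if measured_seconds <= 0 or margin < 1:
        raise ValueError("need a positive run time and a margin >= 1")
    total_minutes = math.ceil(measured_seconds * margin / 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    if days:
        # day-hr:min:sec format
        return f"{days}-{hours:02d}:{minutes:02d}:00"
    # hr:min:sec format
    return f"{hours:02d}:{minutes:02d}:00"
```

For example, the 60 M BWA-MEM run in the table took 12 min 32 s (752 s); with the assumed 1.5 margin, `suggest_time_limit(752)` returns `"00:19:00"`, which could be passed as `-t 00:19:00`. Rounding up to whole minutes keeps the request readable without cutting into the headroom.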
c. Genome Indexes on Cluster
Several mammalian and model-system genomes, indexes, and annotations are available on the Tufts HPC cluster. The genomes currently listed below in the indicated directory tree are UCSC genome builds, except for canFam3, which is an NCBI build.
Application performance is not always well documented, so it may be worth doing some benchmarking of your own. Doing so puts you in a better position to use cluster resources efficiently. For example, here is a benchmark examination of blastp and other tools.
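A simple benchmark can be as little as running the same command a few times and recording wall-clock time, much like the time command used for the mapping timings. The Python sketch below does that; `benchmark` is a hypothetical helper, and the stand-in workload in the `__main__` block is only there so the sketch runs anywhere. Replace it with the real command and options you plan to submit.

```python
import statistics
import subprocess
import sys
import time


def benchmark(cmd: list[str], repeats: int = 3) -> dict[str, float]:
    """Run cmd `repeats` times and summarize its wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a crashed run is never recorded as a fast one.
        # stdout is discarded; stderr still reaches the terminal.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return {
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }


if __name__ == "__main__":
    # Stand-in workload; substitute your own command (for example a
    # blastp run) and repeat with different core counts to compare.
    result = benchmark([sys.executable, "-c", "sum(range(10**6))"], repeats=3)
    print({name: round(value, 3) for name, value in result.items()})
```

Reporting the minimum and median rather than a single run smooths out noise from shared nodes; running the same input at several core counts shows where adding cores stops paying off.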
e. Tufts Center for Neuroscience Research Genomics Core
The Tufts CNR Genomics Core supplies links to bioinformatics resources related to their operation. See Tufts CNR Genomics Core Resources for more information.