To illustrate how the cluster supports research at Tufts, the following user comments show a wide range of applications. If you wish to contribute a short description of your usage, please contact durwood.marshall@tufts.edu or lionel.zupan@tufts.edu.

Anoop Kumar

Professor Lenore Cowen, Matt Menke, Noah Daniels and I used the cluster to hierarchically organize protein structural domains into clusters based on geometric dissimilarity, using the program Matt (http://bcb.cs.tufts.edu/mattweb/). The first step in the experiment was to align all the known protein domains using Matt. Comparing all 10,418 representative domains against each other meant running Matt approximately 54 million times. A single run takes only about 0.1 CPU seconds, but running it 54 million times would take approximately 74 days on a single processor. Because the cluster can run multiple jobs on separate nodes, we split the job into smaller batches of 0.5 million alignment operations each, creating 109 batches in total. Each batch took approximately 15 hours, a significant reduction from 74 days. By running the batches simultaneously on separate nodes of the research cluster, we reduced the time taken to run our job from 2.5 months to less than a day. The speedup not only let us run an additional experiment comparing our results against a competing program, but also let us publish the outcomes sooner.
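The batch counts above can be checked with a short sketch. This is only an illustration of the arithmetic, not the scripts the group actually used: the function names plan_batches and batched_pairs are hypothetical, and it assumes each unordered pair of domains is aligned exactly once.

```python
import math
from itertools import combinations, islice


def plan_batches(n_domains, batch_size):
    """Return (number of unordered pairs, number of batches needed)."""
    total_pairs = math.comb(n_domains, 2)  # n * (n - 1) / 2
    n_batches = math.ceil(total_pairs / batch_size)
    return total_pairs, n_batches


def batched_pairs(items, batch_size):
    """Yield lists of at most batch_size pairs; each list could be one cluster job."""
    pairs = combinations(items, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


total_pairs, n_batches = plan_batches(10_418, 500_000)
print(total_pairs, n_batches)  # 54262153 109
```

With 10,418 domains and 0.5 million alignments per batch, this gives roughly 54 million pairs in 109 batches, matching the figures quoted above. Each batch is independent of the others, which is why they can run simultaneously on separate nodes.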

Recognizing the value of running large tasks on the research cluster and the group's future CPU-intensive programming requirements, Prof. Cowen has contributed additional nodes to the UIT research cluster. While members of the BCB research group (http://bcb.cs.tufts.edu/) get priority to run programs on those nodes, anyone with an account on the cluster can run programs on them.

Keith Noto

I'm a postdoc in the Computer Science department, working on anomaly detection in human fetal gene expression data. That is, how does one distinguish "normal" development (meaning: like what we've seen before) from "abnormal" (different from what we've seen before, in the right way) over hundreds of samples with tens of thousands of molecular measurements each, when we don't even really know what we're looking for? I use the Tufts UIT cluster to test our approaches to this problem on dozens of separate data sets. These computational experiments take thousands of CPU hours, so our work cannot be done on just a handful of machines.

...