...
Anoop Kumar
Professor Lenore Cowen, Matt Menke, Noah Daniels and I used the cluster to hierarchically organize protein structural domains into clusters based on geometric dissimilarity, using the program Matt (http://bcb.cs.tufts.edu/mattweb/). The first step in the experiment was to align all the known protein domains using Matt. Comparing all 10,418 representative domains against each other meant running Matt approximately 54 million times. While a single run takes only about 0.1 CPU seconds, running it 54 million times would take approximately 74 days on a single processor. Taking advantage of the cluster's ability to run multiple jobs on separate nodes, we split the work into batches of 0.5 million alignment operations each, creating 109 jobs that we submitted to the cluster. Each job took approximately 15 hours, and because the jobs ran simultaneously on separate nodes of the research cluster, we reduced the time taken to perform our analysis from about 2.5 months to less than a day. This speedup proved to be an additional benefit when we realized we needed to run a second experiment using an alternative to Matt, as we were able to run it without significantly delaying our time to publication. This research has resulted in a paper, "Touring Protein Space with Matt", that has been accepted to the International Symposium on Bioinformatics Research and Applications (ISBRA 2010) and will be presented in May.
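The arithmetic behind these figures can be checked with a short Python sketch. It is only an illustration, not the scripts we actually used: it assumes each unordered pair of domains is aligned once, and the helper name `pair_batches` is hypothetical. With the rounded 0.1 s per run, the serial estimate comes out near 63 days; the 74 days quoted above corresponds to roughly 0.12 s per alignment.

```python
from itertools import combinations, islice
from math import ceil, comb

N_DOMAINS = 10_418        # representative domains (from the text)
BATCH_SIZE = 500_000      # alignments per job (from the text)
SECONDS_PER_RUN = 0.1     # rounded per-alignment CPU time (from the text)


def pair_batches(items, batch_size):
    """Yield successive lists of at most batch_size unordered pairs.

    Hypothetical helper: each yielded list would be the work for one job.
    """
    pairs = combinations(items, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


# Assumption: every unordered pair of domains is aligned exactly once.
total_pairs = comb(N_DOMAINS, 2)
n_jobs = ceil(total_pairs / BATCH_SIZE)
serial_days = total_pairs * SECONDS_PER_RUN / 86_400
hours_per_job = BATCH_SIZE * SECONDS_PER_RUN / 3_600

print(f"{total_pairs:,} alignments split into {n_jobs} jobs")
print(f"serial estimate: {serial_days:.1f} days; per job: {hours_per_job:.1f} hours")
```

Because the jobs are independent of one another, the wall-clock time on the cluster is bounded by the longest single job rather than by the sum of all of them, provided enough nodes are free.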
Recognizing the value of running large tasks on the research cluster and the group's future CPU-intensive computing requirements, Prof. Cowen has contributed additional nodes to the TTS research cluster. While members of the BCB research group (http://bcb.cs.tufts.edu/) get priority to run programs on those nodes, anyone with an account on the cluster can run programs on them.
Keith Noto
I'm a postdoc in the Computer Science department, working on anomaly detection in human fetal gene expression data. That is, how does one distinguish "normal" development (meaning: like what we've seen before) from "abnormal" (different from what we've seen before, in the right way) over hundreds of samples with tens of thousands of molecular measurements each, when we don't even really know what we're looking for? I use the Tufts TTS cluster to test our approaches to this problem on dozens of separate data sets. These computational experiments take thousands of CPU hours, so our work cannot be done on just a handful of machines.
...