...
Our Tufts/HNRC research focuses on nutrigenomics to study gene-diet interactions in the area of cardiovascular disease, using both genetic epidemiology approaches and controlled dietary intervention studies. This research investigates nutrient-gene interactions in large and diverse populations around the world through long-standing collaborations with investigators in Europe, Asia, Australia and the United States. For the current project, I used the CLUSTER cluster to process a large amount of genomic data, such as genetic variants in human genomes, which cannot be handled on my laptop. The CLUSTER cluster is more than 50 times faster than my laptop. It would not be possible to complete my research project without it!
Anoop Kumar
Professor Lenore Cowen, Matt Menke, Noah Daniels and I used the cluster to hierarchically organize protein structural domains into clusters based on geometric dissimilarity, using the program Matt (http://bcb.cs.tufts.edu/mattweb/). The first step in the experiment was to align all known protein domains using Matt. Comparing all 10,418 representative domains against one another required running Matt approximately 54 million times. Although a single run takes only about 0.1 CPU seconds, running it 54 million times would take approximately 74 days on a single processor. Taking advantage of the cluster's ability to run multiple jobs on separate nodes, we split the job into batches of 0.5 million alignment operations each, creating 109 batches in total. Each batch took approximately 15 hours, a significant reduction from 74 days. By running the batches simultaneously on separate nodes of the research cluster, we reduced the time taken to run our job from 2.5 months to less than a day. The speedup not only let us run an additional experiment comparing our results against a competing program but also let us publish the outcomes sooner.
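The batching arithmetic above can be checked with a short sketch. This is not the authors' script. It assumes that each of the 10,418 domains is aligned once against every other domain (unordered pairs, no self-comparisons), which gives n(n-1)/2 runs, that each batch covers a consecutive range of pair indices, and that every run costs a flat 0.1 CPU seconds. The helpers `num_pairs`, `batch_ranges`, `pairs_in_range` and `pairs_for_batch` are hypothetical names chosen for illustration; in practice the batch index would come from whatever job-submission mechanism the cluster provides.

```python
from typing import Iterator

def num_pairs(n: int) -> int:
    """Number of unordered pairs (i, j), i < j, among n items: n(n-1)/2."""
    return n * (n - 1) // 2

def batch_ranges(total: int, batch_size: int) -> list[tuple[int, int]]:
    """Split the index range [0, total) into consecutive half-open ranges of at most batch_size."""
    return [(start, min(start + batch_size, total)) for start in range(0, total, batch_size)]

def pairs_in_range(start: int, stop: int, n: int) -> Iterator[tuple[int, int]]:
    """Yield pairs (i, j), i < j, whose row-major linear index lies in [start, stop).

    Assumes 0 <= start <= stop <= num_pairs(n).
    """
    # Row i holds the pairs (i, i+1), ..., (i, n-1), i.e. n-1-i pairs.
    # Skip whole rows until `offset` falls inside row i.
    i, offset = 0, start
    while i < n - 1 and offset >= n - 1 - i:
        offset -= n - 1 - i
        i += 1
    j = i + 1 + offset
    for _ in range(stop - start):
        yield i, j
        j += 1
        if j == n:  # end of row: move to the first pair of the next row
            i += 1
            j = i + 1

# Figures quoted in the text; SECONDS_PER_RUN is the approximate "about 0.1 CPU seconds".
N_DOMAINS = 10_418
BATCH_SIZE = 500_000
SECONDS_PER_RUN = 0.1

total = num_pairs(N_DOMAINS)
batches = batch_ranges(total, BATCH_SIZE)
serial_days = total * SECONDS_PER_RUN / 86_400
batch_hours = BATCH_SIZE * SECONDS_PER_RUN / 3_600

def pairs_for_batch(batch_index: int) -> Iterator[tuple[int, int]]:
    """Hypothetical per-job entry point: the domain pairs one cluster job would align."""
    start, stop = batches[batch_index]
    return pairs_in_range(start, stop, N_DOMAINS)

print(f"{total:,} pairs, {len(batches)} batches, "
      f"~{serial_days:.1f} serial days, ~{batch_hours:.1f} h per batch")
```

Under these assumptions the sketch reproduces the roughly 54 million alignments and the 109 batches. At a flat 0.1 s per run it estimates about 63 serial days and about 14 hours per batch; the 74 days and 15 hours quoted above correspond to a per-run cost somewhat above 0.1 s, which fits "about 0.1 CPU seconds". Because every batch touches a disjoint range of pairs and needs no results from any other batch, the batches can run on separate nodes at the same time, so the wall-clock time is roughly that of a single batch.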
...