...
| Joshua Ainsley |
---|
Our Tufts/HNRC research focuses on nutrigenomics, studying gene-diet interactions in the area of cardiovascular disease through both genetic epidemiology approaches and controlled dietary intervention studies. This research investigates nutrient-gene interactions in large and diverse populations around the world, through long-standing collaborations with investigators in Europe, Asia, Australia and the United States. For the current project, I used the cluster to process a large amount of genome data, such as genetic variants in human genomes, which cannot be handled on my laptop computer. The cluster is more than 50 times faster than my laptop. It would not be possible to complete my research project without it!

Anoop Kumar
Professor Lenore Cowen, Matt Menke, Noah Daniels and I used the cluster to hierarchically organize protein structural domains into clusters based on geometric dissimilarity, using the program Matt (http://bcb.cs.tufts.edu/mattweb/). The first step in the experiment was to align all the known protein domains using Matt. Comparing all 10,418 representative domains against each other meant running Matt approximately 54 million times. While a single run takes only about 0.1 CPU seconds, running it 54 million times would take approximately 74 days on a single processor. Because the cluster can run multiple jobs on separate nodes, we split the work into batches of 0.5 million alignment operations each, creating 109 jobs that we submitted to the cluster. Each job took approximately 15 hours. By running the jobs simultaneously on separate nodes of the research cluster, we reduced the time taken to perform our analysis from about 2.5 months to less than a day. This speedup proved to be an additional benefit when we realized we needed to run a further experiment using an alternative to Matt, as we were able to run that second experiment without significantly delaying our time to publication. This research has resulted in a paper, "Touring Protein Space with Matt", that has been accepted to the International Symposium on Bioinformatics Research and Applications (ISBRA 2010) and will be presented in May.
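
The batching arithmetic described above can be checked with a short sketch. The testimonial does not say how the pairs were actually divided among jobs, so this is only one plausible way to do it, assuming every unordered pair of domains is aligned once. The function names (`count_pairs`, `count_batches`, `pair_batches`) are hypothetical, and the code does not invoke Matt itself.

```python
from itertools import combinations, islice
from math import comb


def count_pairs(n_domains: int) -> int:
    """Number of unordered pairs among n_domains items (n choose 2)."""
    return comb(n_domains, 2)


def count_batches(n_domains: int, batch_size: int) -> int:
    """Number of batches needed to cover every pair (ceiling division)."""
    return -(-count_pairs(n_domains) // batch_size)


def pair_batches(domain_ids, batch_size):
    """Yield lists of at most batch_size pairs; each list is one job's work.

    Assumption: pairs are enumerated in the order produced by
    itertools.combinations. The original study may have ordered them
    differently.
    """
    pairs = combinations(domain_ids, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    print(count_pairs(10_418))             # 54262153, i.e. about 54 million
    print(count_batches(10_418, 500_000))  # 109
```

Under these assumptions the counts agree with the figures quoted: 10,418 domains give 54,262,153 pairs, which fill 109 batches of 0.5 million. At roughly 0.1 CPU seconds per alignment, one full batch is about 50,000 CPU seconds, or close to 14 hours, in line with the reported run time of approximately 15 hours per job.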
...