...

Recognizing the value of running large tasks on the research cluster, and anticipating the group's future CPU-intensive computing needs, Prof. Cowen has contributed additional nodes to the UIT TTS research cluster. While members of the BCB research group (http://bcb.cs.tufts.edu/) get priority to run programs on those nodes, anyone with an account on the cluster can run programs on them.

...

I'm a postdoc in the Computer Science department, working on anomaly detection in human fetal gene expression data. That is, how does one distinguish "normal" development (meaning: like what we've seen before) from "abnormal" (different from what we've seen before, in the right way) over hundreds of samples with tens of thousands of molecular measurements each, when we don't even really know what we're looking for? I use the Tufts UIT TTS cluster to test our approaches to this problem on dozens of separate data sets. These computational experiments take thousands of CPU hours, so our work cannot be done on just a handful of machines.

...

I am a Ph.D. student in computer science, studying machine learning. My research requires me to run experiments in which I test my methods on different data sets. For each data set, I may need to search for or test a particular set of input parameters. For each particular configuration of the experiment, I need to perform multiple runs in order to ensure my results are statistically significant, or to create different samplings of my data. In order to test a wide variety of configurations across multiple data sets, I exploit the cluster's ability to run "embarrassingly parallel" jobs. I have submitted up to 2000 jobs at a time and had them finish within hours. This has allowed me to test new ideas quickly and accelerated my overall pace of research. I have different software demands depending on the project I'm working on. These include Java, shell, Perl, Matlab, R, C, and C++. Fortunately, these are all well supported on the cluster. I also plan to explore MPI one day and take advantage of products like Star-P, which are available on the cluster.

...

We use the cluster now for two main purposes: parallel text alignment (aligning all of the words in a Latin or Greek text like the /Aeneid/ or the /Odyssey/ with all of the words in its English translation) and training probabilistic syntactic parsers on our treebank data. Both of these are computationally expensive processes - even aligning 1M words of Greek and English takes about 8 hours on a single-core desktop, and for my end result, I need to do this 4 separate times. Using a multi-threaded version of the algorithm (to take advantage of each cluster computer's 8 cores) has let me scale up to quantities of data (5M words) that I simply could not have handled on our existing desktop computers. Most importantly, though, the cluster environment lets me run multiple instances of these algorithms in parallel, which has greatly helped in testing optimization parameters for both tasks, and for the alignment task in particular lets me run those 4 alignments simultaneously - essentially letting me work not just faster but more accurately as well.

...