...
We now use the cluster for two main purposes: parallel text alignment
(aligning all of the words in a Latin or Greek text like the /Aeneid/ or
the /Odyssey/ with all of the words in its English translation) and
training probabilistic syntactic parsers on our treebank data. Both of
these are computationally expensive processes - even aligning 1M words
of Greek and English takes about 8 hours on a single-core desktop, and
for my end result, I need to do this 4 separate times. Using a
multithreaded version of the algorithm (to take advantage of each
cluster computer's 8 cores) has let me scale up the data to quantities
(5M words) that I simply could not have processed on our existing desktop
computers. Most importantly, though, the cluster environment lets me
run multiple instances of these algorithms in parallel, which has
greatly helped in testing optimization parameters for both tasks and,
for the alignment task in particular, lets me run those 4 alignments
simultaneously - essentially letting me work not just faster but more
accurately as well.
Luis Dorfmann
...