Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Excerpt

David Bamman and Greg Crane use the cluster now for two main purposes: parallel text alignment (aligning all of the words in a Latin or Greek text like the /Aeneid/ or the /Odyssey/ with all of the words in its English translation) and training probabilistic syntactic parsers on their treebank data. Both of these are computationally expensive processes - even aligning 1M words of Greek and English takes about 8 hours on a single-core desktop, and for their end result, they need to do this 4 separate times. Using a multi-threaded version of the algorithm (to take advantage of each cluster computer's 8 cores) has let them scale up the data to quantities (5M words) that they simply could not have done on their existing desktop computers. Most importantly, though, the cluster environment lets them run multiple instances of these algorithms in parallel, which has greatly helped in testing optimization parameters for both tasks, and for the alignment task in particular lets them run those 4 alignments simultaneously - essentially letting them work not just faster but more accurately as well.