...

Luis Dorfmann

Rachel Lomasky and Carla Brodley
We address problems in the two areas of Machine Learning and Classification. A new class of supervised learning processes called Active Class Selection (ACS) addresses the question: if one can collect n additional training instances, how should they be distributed with respect to class? Working with Chemistry's Walt Laboratory at Tufts University, we train an artificial nose to discriminate vapors, and we use Active Class Selection to choose which training data to generate. In the area of Active Learning, we are interested in developing tools to determine which Active Learning methods will work best for the problem at hand. We introduced an entropy-based measure, Average Pool Uncertainty, for assessing the online progress of active learning. The motivating problem of this research is the labeling of the Earth's surface to create a land cover classifier. We would like to determine when labeling more of the map will no longer contribute to an increase in accuracy. Both Active Class Selection and Active Learning are CPU-intensive and require working with large datasets. Additionally, experiments are conducted with several methods, each with a large range of parameters. Without the cluster, my research would be so time-consuming as to be impractical. For additional details, see the Rachel Lomasky PDF attachment.

Eugene Morgan

The Tufts Linux cluster allows me to work with large amounts of data within a reasonable time frame. I first used the cluster to interpolate sparse data points over a fairly large 3-dimensional space. The cluster has also dramatically sped up the calculation of semivariance for dozens of sections of seafloor containing vast numbers of data points, quickly performed thousands of Monte Carlo simulations, and computed statistics on one of the largest global wind speed datasets, containing ~3.6 billion data points. I have most recently used the cluster to find optimal parameters for rock physics equations using a genetic algorithm. Most of these activities have been or will be incorporated in technical publications.

...