Typical Cluster Usage at Tufts

Faculty, research staff, and students use this resource in support of a variety of research projects.


To illustrate how the cluster supports research at Tufts, the following user comments show a wide range of applications. If you wish to contribute a short description of your cluster usage, please contact durwood.marshall@tufts.edu or lionel.zupan@tufts.edu.

Keith Noto

I'm a postdoc in the Computer Science department, working on anomaly detection in human fetal gene expression data. That is, how does one distinguish "normal" development (meaning: like what we've seen before) from "abnormal" (different from what we've seen before, in the right way) over hundreds of samples with tens of thousands of molecular measurements each, when we don't even really know what we're looking for? I use the Tufts UIT cluster to test our approaches to this problem on dozens of separate data sets. These computational experiments take thousands of CPU hours, so our work cannot be done on just a handful of machines.

Excerpts are also inserted for the following users: Kyle Monahan; Hui Yang; Eliyar Asgarieh; Giovanni Widmer; Daniel Lobo; Eric Kernfeld; Christopher Burke; Albert Tai; Marco Sammon; Hongtao Yu; Rebecca Batorsky; Scott MacLachlan; Lakshmanan Iyer and Ron Lechan; Krzysztof Sliwa, Austin Napier, Anthony Mann and others; Joshua Ainsley; Chao-Qiang Lai; Anoop Kumar.

Ken Olum, Jose Blanco-Pillado and Ben Shlaer

Ken Olum, Jose Blanco-Pillado and I are using the cluster to attempt to solve an important question in cosmology, namely "How big are cosmic string loops?" Cosmic strings are ultra-thin, fast-moving filaments hypothesized to be winding throughout the universe, most of it in the form of long loops. There has been much theoretical interest and work in cosmic strings, but before we can connect the theory to future observations, we need to know the typical sizes of the loops the network produces.

It turns out this is an ideal question to solve numerically, since the evolution of each individual string segment is easy to compute, and the tremendous scales over which the network evolves make analytic work extremely difficult.

What makes this exciting now is that the previous generation of numerical cosmic string simulations disagreed on what the right answer is. We believe that current hardware is sufficient to enable us to answer the question definitively.

Alireza Aghasi

The research that I am doing is very computational and requires a lot of processing and memory. I basically deal with Electrical Resistance Tomography (ERT) for detection of contaminants under the surface of the earth. The problem ends up being a very high-dimensional inverse problem that is severely ill-posed. Dealing with such a problem without appropriate processing power is impossible. Once I became aware of the cluster I started exploring it and realized that some of its features really help my processing speed. The feature that interested me most was the good performance in sparse matrix calculations. Star-P does an excellent job dealing with very large sparse systems compared with other platforms. Personally I experienced some very good results using Star-P.

Umma Rebbapragada

I am a Ph.D. student in computer science, studying machine learning. My research requires me to run experiments in which I test my methods on different data sets. For each data set, I may need to search for or test a particular set of input parameters. For each particular configuration of the experiment, I will need to perform multiple runs in order to ensure my results are statistically significant, or create different samplings of my data. In order to test a wide variety of configurations across multiple data sets, I exploit the cluster's ability to run "embarrassingly parallel" jobs. I have submitted up to 2000 jobs at a time, and have had them finish within hours. This has allowed me to test new ideas quickly, and accelerated my overall pace of research. I have different software demands depending on the project I'm working on. These include Java, shell, Perl, Matlab, R, C and C++. Fortunately, these are all well-supported on the cluster. I also plan to explore MPI one day and take advantage of products like Star-P, which are available on the cluster.
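
As a rough illustration of why this kind of workload is "embarrassingly parallel", the sketch below enumerates one independent job per combination of data set, parameter setting and repeated run. It is only a minimal sketch: the function name `enumerate_jobs`, the data set names and the parameter names are hypothetical and are not taken from the work described above.

```python
from itertools import product


def enumerate_jobs(datasets, param_grid, n_runs):
    """Return one independent job description per (dataset, parameters, run)."""
    names = sorted(param_grid)  # fixed order so the job list is reproducible
    jobs = []
    for dataset in datasets:
        # Cartesian product over every parameter's candidate values.
        for values in product(*(param_grid[name] for name in names)):
            params = dict(zip(names, values))
            for run in range(n_runs):
                # Each job carries its own seed, so no job depends on another.
                jobs.append({"dataset": dataset, "params": params, "seed": run})
    return jobs


# Hypothetical example values, for illustration only.
datasets = ["dataset_a", "dataset_b"]
param_grid = {"learning_rate": [0.01, 0.1], "depth": [2, 4, 8]}
jobs = enumerate_jobs(datasets, param_grid, n_runs=10)
print(len(jobs))  # 2 data sets * 6 parameter settings * 10 runs = 120
```

Because no job reads another job's output, each entry in the list could in principle be handed to the cluster scheduler as a separate submission, which is how a modest grid quickly grows into hundreds or thousands of jobs.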

David Bamman and Greg Crane

We use the cluster now for two main purposes: parallel text alignment (aligning all of the words in a Latin or Greek text like the /Aeneid/ or the /Odyssey/ with all of the words in its English translation) and training probabilistic syntactic parsers on our treebank data. Both of these are computationally expensive processes - even aligning 1M words of Greek and English takes about 8 hours on a single-core desktop, and for my end result, I need to do this 4 separate times. Using a multithreaded version of the algorithm (to take advantage of each cluster computer's 8 cores) has let me scale up the data to quantities (5M words) that I simply could not have done on our existing desktop computers. Most importantly, though, the cluster environment lets me run multiple instances of these algorithms in parallel, which has greatly helped in testing optimization parameters for both tasks, and for the alignment task in particular lets me run those 4 alignments simultaneously - essentially letting me work not just faster but more accurately as well.

Luis Dorfmann

Evaluating patient-specific Abdominal Aortic Aneurysm wall stress based on flow-induced loading

In this research we develop a physiologic wall stress analysis procedure by incorporating experimentally measured, non-uniform pressure loading in a patient-based finite element simulation. First, the distribution of wall pressure is measured in a patient-based lumen cast at a series of physiologically relevant steady flow rates. Then, using published equi-biaxial stress-deformation data from aneurysmal tissue samples, a nonlinear hyperelastic constitutive equation is used to describe the mechanical behavior of the aneurysm wall. The model accounts for the characteristic exponential stiffening due to the rapid engagement of nearly inextensible collagen fibers and assumes, as a first approximation, an isotropic behavior of the arterial wall. The results show a complex wall stress distribution with a localized maximum principal stress value of 660 kPa on the inner surface of the posterior surface of the aneurysm bulge, a considerably larger value than has generally been reported in calculations of wall stress under the assumption of uniform loading. This is potentially significant since the posterior wall has been suggested as a common site of rupture, and the aneurysmal tensile strength reported by other authors is of the same order of magnitude as the maximum stress value found here. The numerical simulations performed in this research required substantial computational resources and data storage facilities, which were very generously made available by Tufts University. This support is gratefully acknowledged.

Rachel Lomasky and Carla Brodley

We address problems in the two areas of Machine Learning and Classification. A new class of supervised learning processes called Active Class Selection (ACS) addresses the question: if one can collect n additional training instances, how should they be distributed with respect to class? Working with Chemistry's Walt Laboratory at Tufts University we train an artificial nose to discriminate vapors. We use Active Class Selection to choose which training data to generate. In the area of Active Learning we are interested in the development of tools to determine which Active Learning methods will work best for the problem at hand. We introduced an entropy-based measure, Average Pool Uncertainty, for assessing the online progress of active learning. The motivating problem of this research is the labeling of the Earth's surface to create a land cover classifier. We would like to determine when labeling more of the map will not contribute to an increase in accuracy. Both Active Class Selection and Active Learning are CPU-intensive. They require working with large datasets. Additionally, experiments are conducted with several methods, each with a large range of parameters. Without the cluster, my research would be so time-consuming as to be impractical. For additional details see the Rachel Lomasky pdf attachment.

Eugene Morgan

The Tufts Linux cluster allows me to work with large amounts of data within a reasonable time frame. I first used the cluster to interpolate sparse data points over a fairly large 3-dimensional space. The cluster has also dramatically sped up the calculation of semivariance for dozens of sections of seafloor containing vast numbers of data points, quickly performed thousands of Monte Carlo simulations, and computed statistics on one of the largest global wind speed datasets containing ~3.6 billion data points. I have most recently used the cluster to find optimal parameters for rock physics equations using a genetic algorithm. Most of these activities have been or will be incorporated in technical publications.
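
To give a sense of why semivariance is costly for large data sets, here is a minimal sketch of an empirical semivariance calculation. It assumes the classical (Matheron) estimator, half the mean squared difference between all pairs of points separated by roughly a given lag, and one-dimensional positions for simplicity. The function name `empirical_semivariance` and the sample values are hypothetical, and the actual method used in the work above may differ.

```python
def empirical_semivariance(positions, values, lag, tol):
    """Classical semivariance estimate for pairs separated by lag +/- tol.

    Assumes 1-D positions; returns None when no pair falls in the lag bin.
    """
    total = 0.0
    count = 0
    n = len(positions)
    # Every pair of points is examined, so the cost grows as n squared.
    for i in range(n):
        for j in range(i + 1, n):
            separation = abs(positions[i] - positions[j])
            if abs(separation - lag) <= tol:
                total += (values[i] - values[j]) ** 2
                count += 1
    if count == 0:
        return None
    return total / (2 * count)


# Hypothetical sample: four evenly spaced measurements.
positions = [0, 1, 2, 3]
values = [1, 2, 4, 7]
print(empirical_semivariance(positions, values, lag=2, tol=0.1))  # 8.5
```

The pairwise loop is what makes the computation expensive for vast numbers of points. Separate sections of data can be evaluated independently of one another, which is one reason this kind of calculation suits a cluster.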

Eric Thompson

We have used the Tufts Linux Cluster to further our understanding of the seismic response of near-surface soils. This behavior, often termed "site response," can often explain why locations heavily damaged by an earthquake are frequently observed adjacent to undamaged locations. Standard modeling procedures often fail to accurately model this behavior. The failure of these models is often attributed to the uncertainty of the soil properties. However, using the Tufts Linux Cluster we have shown that the underlying theoretical assumptions of the standard model (vertically incident plane SH-wave propagation through a laterally constant medium) are responsible for the failure to match the observed site response behavior.

Andrew Margules

The research that I am currently conducting is in the area of Passively Actuated Deformable Airfoils. The largest presence of airfoils today is within the aerospace and transportation industries. As on commercial and military aircraft, the basic teardrop airfoil shape is augmented with a series of external structures which aid in take-off, landing, and cruising flight. While they perform specific and important functions, they add weight to a system in which weight management is critical. My research is trying to find a way to develop an internal structure for an airfoil that would provide similar shape change without the added external mechanisms. To do this, I am using two different computational software packages. COMSOL Multiphysics allows for the examination of the fluid-structure interaction of the airfoil and moving air. By testing different internal rib structures, I hope to find an appropriate structure. In addition, I am using the computational fluid dynamics package Fluent to help visualize velocity and pressure fields over deformed and undeformed airfoil shapes. If this software were not available through the academic research cluster, this research would be an extremely slow process. The governing physics behind these simulations is complex enough that without the computing power of the cluster, I do not believe that we would be able to perform it. In the last twenty or so years, the focus has shifted from passive actuation to active actuation. Hopefully, this research will help to launch a renewed interest in this field.

Ke Betty Li

I am a researcher in the Department of Civil and Environmental Engineering. Our research focuses on the investigation of how various contaminants affect ground water quality and how we could design remediation systems. An important approach we are using for this type of investigation is modeling contaminant fate and transport in the subsurface on computers. The resources provided by Tufts Cluster Center are very important to us. Our simulations usually take days or even weeks on a single CPU. The clusters can either expedite each simulation if we use simulators that enable parallel computing, or allow us to simulate multiple serial processes simultaneously. The significant improvement in computing efficiency is critical for us to deliver quality work to our funding sponsors. We expect that our work will improve the current understanding of contamination in the subsurface, provide cutting-edge assessment tools, and stimulate innovative treatment technologies.

Eric Miller

Our work concerns the development of tomographic processing methods for environmental remediation problems. Specifically, we are interested in using electrical resistance tomography (ERT) to estimate the geometry of regions of the subsurface contaminated by chemicals such as TCE or PCE. Though the concept of ERT is not unlike the more familiar computed axial tomography (CAT) used for medical imaging, the physics of ERT are a bit more complicated, leading to computationally intensive methods for turning data into pictures. Luckily these computational issues are, at a high level, easily parallelizable. Thus, we have turned to Star-P as the tool of choice for the rapid synthesis of our algorithms.

Michael A. Simon

Nonlinear dynamic modeling of Lepidopteran mechanosensors

The Trimmer Lab is interested in the control of locomotion and other movements in soft-bodied animals. I have been analyzing the activity of a specific mechanosensor, trying to understand how it influences abdominal movement, a critical question for animals with no rigid components. One particularly powerful analytical tool for analyzing such sensors is nonlinear analysis using Gaussian white noise as a stimulus. One challenge of this technique, however, is that it is computationally complex. Even storing the matrices involved in these computations is beyond the capabilities of the typical personal computer. The Tufts Linux Research Cluster offers me the resources necessary to run these computations and analyze the results without needing to invest in new, complicated, or expensive analytical hardware or software. It also allows me to use software that would have been difficult for our lab to acquire alone. Without this resource, following this line of inquiry would have proved a costly endeavor, possibly prohibitively so. We hope to apply our results to the development of computer and robotic models, with the eventual goal of designing a soft robot, a groundbreaking engineering application with substantial implications for design in the biomedical engineering arena, as well as in other areas of engineering.

Katherine L. Tucker

Use of the Bioinformatics cluster has been invaluable to our research. We use a genetic analysis software package named SOLAR, which is Linux/Unix based. This software and the methods used in it are cutting edge. We are able to perform various genetic computations with ease. In the past some students have had to do these calculations by hand because of a lack of access to such software. However, hand calculations are only possible for small sample sizes and simple genetic analysis. Our current work with SOLAR includes over 5,000 individuals and we are using some of the most advanced methods available. The cluster allows us to do large computational runs that would not otherwise be possible. Thus, our current work would not have been possible without access to SOLAR on the bioinformatics cluster. In addition, this type of analysis is becoming more common and will be a greater part of our efforts in future years. Use of the bioinformatics cluster helps our research to remain competitive and important in our grant application process. Our lab is the first to use SOLAR on the bioinformatics cluster; however, since we have been using it, many labs have inquired about how to gain access. I sincerely thank you for your work in helping us gain access to the software and the service you have provided through the Bioinformatics cluster.

Jeffery S. Jackson

I am a grad student in Mechanical Engineering and I am conducting research on microfluidic mixers. I use Cluster01 to create and run fluid flow models in COMSOL Multiphysics. The COMSOL program solves the Navier-Stokes equations for transient fluid flow and the convection-diffusion equation. For the models that I create to be accurate, though, they require more elements and time steps than my computer, or the computers in the EPDC, can handle. This is where the cluster comes in very handy. I usually have the Cluster run any model that is more complicated than a 2D model with 30,000 elements. The most complicated model I have had the cluster solve consisted of 90,000 elements. This model took 30 hours for the Cluster to solve, which is something that no other computer resource I have access to could do. Another nice benefit of the Cluster is being able to use it from home. I live in Providence, RI and it takes me two hours to get to Tufts by train. So, I only come in when I have to. Having remote access to the Cluster makes this possible. Without the Cluster, or the very helpful people who provide excellent technical support, I would never have been able to do the research I needed to finish my Master's Thesis.

Erin Munro

I'm studying Computational Neuroscience in the Math department. My research consists of doing MANY simulations. That being said, I would not be able to do this research without the cluster! I simulate networks of thousands of neurons interacting. While there are some simulations that take a few minutes, the majority of them take 45 minutes to 1.5 hours on one node. By my last calculation, I'd like to run over a month's worth of these simulations. On top of this, I've run several very important simulations that take 1.5 days on 16 nodes. I had to run these simulations in order to try to reproduce results from Roger Traub's research. My current project is to try to explain these results. We tried to find a simpler way to explain them without reproducing the full model, but we found that we couldn't do it. With the cluster, I have been able to reproduce the results to the best of my ability. Furthermore, I've been able to dissect the model, and run many more simulations to get a much better understanding of what is going on in his results. I feel like I'm coming close to fully explaining the results, and have just presented a talk at BU explaining my ideas. None of this would have been possible without the cluster.

Casey Foote

My research for my MS in Mechanical Engineering is based on using the software available on the cluster to model a cold forging process. This model, paired with experimental data, will then be used to develop a tool to predict forging work piece cracking. The tool will provide a manufacturer of airfoils for use in the aircraft engine industry a method to rapidly develop new processing while avoiding costly physical trials.

Aurelie Edwards

My graduate student Christopher Mooney performs simulations of unsteady, turbulent fluid flow in a bioreactor with a stir-bar, using Femlab engineering software. Prior to having access to the Tufts cluster, he was experiencing extensive memory usage problems. On a PC with 2GB of RAM using Windows XP, he was only able to access about 40% of the memory, due to fragmentation issues, and his simulations did not converge. We were both relieved to learn that we could have access to the Tufts cluster and its Linux platform that offers 4GB+ of memory space. The latter has thankfully allowed us to solve increasingly complex models. For example, using his PC, Chris could solve finite element Navier-Stokes fluid flow problems with an element mesh density that limited the problem to about 100,000 degrees of freedom, beyond which he ran out of memory. He often received "low mesh quality" error messages that hindered the mathematical convergence of the solution. On the cluster, he now has enough memory to refine the mesh and run models with 300,000 degrees of freedom. Chris still runs into "out of memory" problems on the cluster, but much less frequently. The technical staff at Femlab, when told of the kinds of problems we envision solving in the coming years, suggested using a server with 10 to 16GB of memory space to run these models with adequate mesh resolution. In other words, if you were to increase the capacity of the Tufts cluster, we would be takers!

Gabriel Wachman

I use the cluster to conduct experiments relating to my work in machine learning. I am in the computer science department. The experiments I have been running have generally been to aid in the comparison of different learning algorithms. By running many experiments over a range of parameters, I can collect data that helps me to draw conclusions on the behavior of the algorithms. Without the cluster, much of the work I have done would have been impossible or at best severely limited.

Alexandre B. Sousa

I am a graduate student with the High Energy Physics Group and, as part of the MINOS experiment collaboration, I have been one of the main people responsible for mass event reconstruction using the Fermilab fixed-target farm. Earlier this year, a Mock Data Challenge was issued to the experiment in order to shake down reconstruction and analysis shortcomings before real data collection starts in January. This effort requested the generation of a rather large Monte Carlo sample, which was subsequently reconstructed at Fermilab. However, the generation of the MC sample was quite hard to set up at Fermilab, where space constraints, e-bureaucracy and competition with other experiments meant we would not be able to do it in a timely manner. That was when I decided to test the Tufts Linux Cluster to perform this task. I was set up with an area on the /cluster/shared space within a day of my original request, and after a few tests, I was able to generate 80% of the total necessary MC sample in less than a week. I was of course lucky to be almost the exclusive user of the cluster for that period, but I really had no problems setting things up and using it in what is seen as a nice success of the Tufts High Energy Physics Group. Given this success, we have volunteered to become one of the spearheading institutions taking part in the upcoming MC generation effort, which should start later this month, and the experience gained was turned into a document and relayed to other institutions that are starting to run their own clusters and hope to join this effort. I have used the cluster a second time to do a customized reprocessing of data for the CC nue analysis group, of which I am a member. This required compiling the MINOS Offline Software on the cluster, installing a MySQL database and assembling some shell scripts to handle the job output. That went quite well, and the full data sample was processed in 2 hours, with about 1 day of setup.
Having worked for 2 years with the Fermilab batch farm, I was mainly impressed by the speed of the network connection of the CPU nodes to the I/O node, almost 20 times the Fermilab data transfer speeds, and also by the great flexibility of use given to the users, which implied minimal back and forth contact with the admins and dramatically improved work efficiency.
