Using Supercomputers To Speed Up Genome Analysis
February 20, 2014

Lee Rannals for - Your Universe Online

Researchers writing in the journal Bioinformatics say that genome analysis can be radically accelerated.

Over the years, the cost of sequencing an entire human genome has dropped, but analyzing three billion base pairs of genetic information from a single genome can take months. A team from the University of Chicago is reporting that one of the world’s fastest supercomputers is able to analyze 240 full genomes in about two days.

"This is a resource that can change patient management and, over time, add depth to our understanding of the genetic causes of risk and disease," study author Elizabeth McNally, the A. J. Carlson Professor of Medicine and Human Genetics and director of the Cardiovascular Genetics clinic at the University of Chicago Medicine, said in a statement.

Megan Puckelwartz, a graduate student in McNally's laboratory and the study’s first author, said the Beagle supercomputer, based at Argonne National Laboratory, can process many genomes simultaneously rather than one at a time.

"It converts whole genome sequencing, which has primarily been used as a research tool, into something that is immediately valuable for patient care," Puckelwartz said in a statement.

Scientists have been working on exome sequencing, which focuses on just the two percent or less of the genome that codes for proteins. About 85 percent of disease-causing mutations are located in these coding regions, but about 15 percent of significant mutations come from non-coding regions.

Researchers used raw sequencing data from 61 human genomes and analyzed the data on Beagle. They used publicly available software packages and a quarter of the computer’s total capacity, finding that a supercomputer environment helped with accuracy and speed.

"Improving analysis through both speed and accuracy reduces the price per genome," McNally said. "With this approach, the price for analyzing an entire genome is less than the cost of looking at just a fraction of the genome. New technology promises to bring the costs of sequencing down to around $1,000 per genome. Our goal is to get the cost of analysis down into that range."

Ian Foster, director of the Computation Institute and Arthur Holly Compton Distinguished Service Professor of Computer Science, said the study demonstrates the benefits of dedicating a supercomputer resource to biomedical research.

"The methods developed here will be instrumental in relieving the data analysis bottleneck that researchers face as genetic sequencing grows cheaper and faster," Foster said in a statement.

The team’s findings could have immediate medical applications. For example, McNally’s Cardiovascular Genetics clinic could use the new findings to look at multiple genes at a time in order to treat and prevent diseases.

"We start genetic testing with the patient, but when we find a significant mutation we have to think about testing the whole family to identify individuals at risk," McNally said. "In the early days we would test one to three genes. In 2007, we did our first five-gene panel. Now we order 50 to 70 genes at a time, which usually gets us an answer. At that point, it can be more useful and less expensive to sequence the whole genome."

The information collected through this method could add to researchers’ knowledge about inherited disorders and help refine the classification of these disorders.

"By paying close attention to family members with genes that place then [sic] at increased risk, but who do not yet show signs of disease, we can investigate early phases of a disorder. In this setting, each patient is a big-data problem," McNally said.