February 2, 2011
Researchers Develop New Framework For Analyzing Genetic Variants
A study from the 1000 Genomes Project yields data for analyzing structural variants in DNA
Advances in DNA sequencing technology have revolutionized biomedical research and taken us another step toward personalized medicine. Now, scientists led by Brigham and Women's Hospital (BWH), Harvard Medical School (HMS), the Broad Institute, the Wellcome Trust Sanger Institute (WTSI), the University of Washington, and the European Molecular Biology Laboratory have developed a new framework for analyzing key genetic variations that had previously been overlooked. The research will be published in the February 3 issue of the prestigious journal Nature.
Efforts to identify genetic differences between individuals previously concentrated on single-nucleotide polymorphisms (SNPs), single-letter differences in a person's DNA that can be informative about a person's disease or even his or her predisposition to a disease. More recently, however, researchers have come to appreciate that each person's genome also carries an enormous number of structural variants (SVs): deletions, duplications, insertions, and inversions in the genetic sequence.
"There are many structural variants in everyone's genomes and they are increasingly being associated with various aspects of human health," said Charles Lee, PhD, a clinical cytogeneticist at BWH, associate professor at HMS, and co-chair of this project. "It is important to be able to identify and comprehensively characterize these genetic variants using state-of-the-art DNA sequencing technologies."
The genetic sequences of 185 individuals were generated by the 1000 Genomes Project and comprehensively analyzed for structural variants by 57 scientists from 26 institutions. The scientists quickly realized that conventional methods for detecting SNPs could not be applied to the identification of SVs, so 19 new computer programs and strategies had to be developed and tested to identify SVs more accurately. "The study found that no one program could comprehensively identify SVs and that each program had advantages and disadvantages that in some cases complemented other analytical programs," said Matthew Hurles, DPhil, of the Wellcome Trust Sanger Institute and co-chair of the project.
The study found a total of 22,025 deletions and 6,000 other structural variants. "We have been given our first glimpses of the complete spectrum of human genetic variation, from 1 bp indels to larger copy number changes," said Evan Eichler, PhD, a Howard Hughes Investigator at the University of Washington and co-chair of the project.
The study also provided important insights into how SVs are formed in the genome, thus linking SVs to mutational processes acting in the germline. "We found 51 hotspots where SVs, such as large deletions, appear to occur particularly often," said Jan Korbel, PhD, a senior author of this study from the European Molecular Biology Laboratory in Heidelberg, Germany. "Six of those hotspots are in regions known to be related to genetic conditions, such as Miller-Dieker syndrome, a congenital brain disease that may lead to infant death."
Data from this project are being made publicly available to the scientific community through the 1000 Genomes Project, which aims to sequence the genomes of 2,500 people by December 2012. The resource will be the largest collection of whole-genome DNA sequences freely available to researchers. The data may be accessed from the 1000 Genomes Project Data Coordination Center, a collaboration between the NIH National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI), at www.1000Genomes.org.
"Identifying SVs from DNA sequencing datasets is very challenging and it is gratifying to see the incredible progress that the SV group has made over the past 2 years," said Richard Durbin, PhD, of the Wellcome Trust Sanger Institute and co-chair of the 1000 Genomes Project. David Altshuler, MD, PhD, of the Broad Institute, also a co-chair of the 1000 Genomes Project, added, "I am confident that this map will serve as an important resource for future sequencing-based disease association studies."