International Consortium Uses ENCODE To Work Out DNA Role In Disease
Brett Smith for redOrbit.com – Your Universe Online
Partway through the 2003 sci-fi thriller ‘The Matrix Reloaded,’ we learn that the virtual reality world inhabited by society is criss-crossed with secret passages. Ordinary doorways suddenly become passages and pathways that make for limitless possibilities.
And like this infinite number of secret routes that fan out across the Matrix, a massive international team of scientists has discovered that the enormous amount of genetic material found in the back passages of the human genome is actually an intricate control panel with millions of switches that turn our genes on and off.
The Encyclopedia of DNA Elements (ENCODE) is a five-year genetics project involving more than 440 researchers working in 32 labs around the world. The result of this team’s work, which highlights the functionality of what was previously thought to be ‘junk DNA’, was published this week in 30 scientific papers appearing in several different journals.
According to the researchers, the activity of these genetic switches is highest during early human development, when body tissues are most vulnerable.
“Our genome is simply alive with switches: millions of places that determine whether a gene is switched on or off,” says Ewan Birney of the European Bioinformatics Institute at the European Molecular Biology Laboratory and lead analysis coordinator for ENCODE. “The Human Genome Project showed that only 2 percent of the genome contains genes, the instructions to make proteins. With ENCODE, we can see that around 80 percent of the genome is actively doing something. We found that a much bigger part of the genome – a surprising amount, in fact – is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks.”
The ENCODE project is such a massive and revolutionary undertaking that it had to be published in a new way, one that relies on electronic documents and datasets. Instead of being published on static pages, the ENCODE content is connected through the different journals by ‘threads’ that allow readers and researchers to follow their area of interest all the way down to the original data.
“Until now, everyone’s been generating and publishing this data piecemeal and unintentionally trapping it in niche communities and static publications. How could anyone outside that community exploit that knowledge if they don’t know it’s there?” commented Roderic Guigo of the Centre de Regulació Genómica (CRG) in Barcelona, Spain. “We have now an interactive encyclopedia that everyone can refer to, and that will make a huge difference.”
Several major American universities played key roles in expanding the knowledge base around the portion of the genome that does not directly code for proteins.
A team from the University of Massachusetts led by professors Job Dekker and Zhiping Weng spent the past decade refining technology that allowed the ENCODE project to develop three-dimensional models of folded chromosomes.
Ross Hardison, a professor of biochemistry and molecular biology at Pennsylvania State University, said his school’s role in the project was finding the link between the genetic switches and certain diseases.
“Genome-wide association studies can map with high resolution the places on our genomes where variation in the DNA sequence among individual persons affects their likelihood of having diabetes, cardiac disease, any of a large number of autoimmune diseases such as Crohn’s disease, and other common diseases,” Hardison said.
“Because most of these genetic variations are not in regions of the DNA that contain the codes for producing proteins, scientists suspected that some of these non-coding regions might have an important role in controlling the expression of genes.”
Other institutions made major contributions as well. Yale researchers can now tell which activating genes came from ‘mom’ and which came from ‘dad’. Geneticists from the Broad Institute and MIT discovered that genetic variants associated with autoimmune diseases such as lupus and rheumatoid arthritis occupy regions that are active only in immune cells, while genetic variants associated with metabolic diseases sit in parts of the genetic code that are active in liver cells.
The University of California at Santa Cruz played one of the more comprehensive roles throughout the ENCODE project. When the ENCODE Project was first launched, a collective known as The GENCODE Consortium was established to score and mark these complex features across the human genome, by both manual genetic techniques and computational models. The UCSC role was to aggregate all of this data in one place.
“Our job was to gather data from 32 labs running different types of experiments on a staggering array of cells and tissues, and we had to establish a common data language so we could get it all into a single database that scientists across the world could use. We also developed a lot of new ways of looking at the data, creating search and visualization tools so that people could find the data most relevant to them,” said Jim Kent, director of the UCSC Genome Browser project and head of the ENCODE Data Coordination Center.
The researchers there have operated the Data Coordination Center for ENCODE since an initial pilot project began in 2003 and have made the results of the project available through the school’s online genome browser, a graphical interface for displaying genomic data.
In cataloging the activity of 80 percent of the human genome, ENCODE identified more than 4 million regulatory regions where proteins specifically interact with the DNA. The result of this momentous undertaking is a significant advance in understanding how genetic information within the cell is expressed. Scientists said they expect that work performed by the ENCODE Project will enable new strategies for disease treatment and prevention.
Despite the massive yield of information that the ENCODE project has already produced, work continues at the many institutions. At UMass, professor Weng received a four-year, $8 million grant from the NIH to lead the project’s Data Analysis Center, which will perform a comprehensive and integrative analysis of the data collected within ENCODE. Meanwhile, Weng’s colleague Dekker and his laboratory will expand their mapping of the 3D wiring of the entire genome. This includes studying the remaining 99 percent of the genome for which long-range relationships between genes and switches have yet to be understood.
Scientists at the Broad Institute and MIT said they will begin examining the genomic wiring of different cell types and how each one contains different epigenomic blueprints.
“We now have a map of the genomic locations of these switches, but we don’t have a map showing which switch controls which gene,” said Brad Bernstein, a senior associate member at the Broad Institute. “What turns on the switch? And when it turns on, what gene or genes get upregulated? Having a map of the way these elements are wired and connected is a critical goal.”