December 16, 2011
New Tool Could Help Make Research More Efficient
A new tool may help researchers see patterns and uncover relationships in data that they might otherwise have missed.
Researchers from Harvard University and the Broad Institute have developed a tool that can uncover patterns in large data sets in a way no other software can.
Data sets such as statistics from a Major League Baseball season or Facebook profiles could take a person hundreds of years to analyze by eye, but sophisticated computer programs can search such data with great speed.
However, existing software cannot search even-handedly for many different kinds of patterns in large collections of data.
"There are massive data sets that we want to explore, and within them, there may be many relationships that we want to understand," Broad Institute associate member Pardis Sabeti, senior author of the paper and an assistant professor at the Center for Systems Biology at Harvard University, said in a press release.
"The human eye is the best way to find these relationships, but these data sets are so vast that we can't do that," Sabeti said. "This toolkit gives us a way of mining the data to look for relationships."
The team developed a tool called MINE, or Maximal Information-based Nonparametric Exploration, that can sift out multiple kinds of patterns hidden in large data sets.
The researchers tested their toolkit on several large data sets, asking MINE to make more than 22 million comparisons. The program homed in on a few hundred patterns of interest that had not been observed before.
"The goal of this statistic is to take data with a lot of different dimensions and many possible correlations and pick out the top ones," Michael Mitzenmacher, a senior author of the paper and professor of computer science at Harvard University, said in a press release. "We view this as an exploration tool — it can find patterns and rank them in an equitable way."
The tool is able to detect a wide range of patterns and characterize them according to a number of different parameters a researcher might be interested in.
Researchers currently use advanced technology to gather big, complex data sets. A system that can identify and organize the patterns within these data sets could prove very useful in research.
Other statistical tools can search a large data set for a specific pattern, but they cannot sort and compare different kinds of possible relationships.
MINE can sort through multiple recurring events or data sets, such as health information from around the world or measurements of the changing bacterial landscape of the gut.
"Standard methods will see one pattern as signal and others as noise," David Reshef, a co-first author of the paper, said in a press release. "There can potentially be a variety of different types of relationships in a given data set. What's exciting about our method is that it looks for any type of clear structure within the data, attempting to find all of them."
The researchers applied MINE to social, economic, health, and political data from the World Health Organization (WHO) and its partners.
The team examined the relationship between household income and female obesity and found two contrasting trends in the data.
In the data set, many countries follow a parabolic curve, with obesity rates rising with income, then peaking and tapering off once income reaches a certain level.
However, in the Pacific Islands, where female obesity is a sign of status, countries follow a steep trend, with the rate of obesity climbing as income increases.
MINE is capable of identifying these complicated relationships in data sets that are shaped by multiple underlying factors, Sabeti said.
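The article does not spell out how MINE scores a relationship. In the team's Science paper, the central statistic is the maximal information coefficient (MIC): roughly, it lays many grids over a scatter plot of two variables and keeps the highest normalized mutual information. The Python below is a loose sketch of that idea, not the authors' algorithm. Real MIC searches for the best placement of grid lines. As a simplifying assumption, this version just cuts each axis into equal-count bins. The function names and the cell limit of `n ** 0.6` are illustrative choices. It compares the sketch with Pearson correlation on a parabola like the income and obesity curve, and on a steep straight line.

```python
import math
import random


def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for r, i in enumerate(order):
        rank[i] = r
    return rank


def grid_mutual_information(x_rank, y_rank, x_bins, y_bins):
    """Mutual information (nats) of an equal-count x_bins-by-y_bins grid."""
    n = len(x_rank)
    counts = {}
    for rx, ry in zip(x_rank, y_rank):
        cell = (rx * x_bins // n, ry * y_bins // n)  # equal-count binning
        counts[cell] = counts.get(cell, 0) + 1
    col_totals = [0] * x_bins
    row_totals = [0] * y_bins
    for (c, r), k in counts.items():
        col_totals[c] += k
        row_totals[r] += k
    mi = 0.0
    for (c, r), k in counts.items():
        # p(c, r) * log(p(c, r) / (p(c) * p(r))), written with raw counts
        mi += (k / n) * math.log(k * n / (col_totals[c] * row_totals[r]))
    return mi


def approx_mic(xs, ys, alpha=0.6):
    """Hypothetical MIC-like score: best normalized MI over small grids.

    Simplification: equal-count bins instead of MIC's optimized grid lines.
    """
    n = len(xs)
    x_rank, y_rank = ranks(xs), ranks(ys)
    max_cells = max(4, int(n ** alpha))  # assumed limit on x_bins * y_bins
    best = 0.0
    for x_bins in range(2, max_cells // 2 + 1):
        for y_bins in range(2, max_cells // x_bins + 1):
            mi = grid_mutual_information(x_rank, y_rank, x_bins, y_bins)
            # log(min(x_bins, y_bins)) caps the MI, so scores fall in [0, 1]
            best = max(best, mi / math.log(min(x_bins, y_bins)))
    return best


def pearson(xs, ys):
    """Pearson correlation, a standard linear measure, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    xs = [-1 + 2 * i / 199 for i in range(200)]
    parabola = [x * x for x in xs]          # rises, peaks, falls (mirrored)
    line = [3 * x for x in xs]              # steep, steadily rising trend
    rng = random.Random(0)
    noise = [rng.random() for _ in xs]      # no relationship at all
    print("pearson, parabola:", round(pearson(xs, parabola), 3))  # near 0
    print("pearson, line:    ", round(pearson(xs, line), 3))      # near 1
    print("approx MIC, parabola:", round(approx_mic(xs, parabola), 3))
    print("approx MIC, line:    ", round(approx_mic(xs, line), 3))
    print("approx MIC, noise:   ", round(approx_mic(xs, noise), 3))
```

On the parabola, Pearson's r is close to zero because the falling and rising halves cancel out. The grid-based score rates the parabola about as highly as the straight line, and much higher than random noise. That even-handed treatment of different shapes is the property the researchers describe. The sketch's equal-count bins can miss structure that an optimized grid would catch, so treat it only as an illustration.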
Researchers will be able to use MINE to help generate new ideas and connections that no one has thought to look for before.
The research was published in the December 16 issue of the journal Science.
Image Caption: Brothers David Reshef and Yakir Reshef developed MIC under the guidance of professors from Harvard University and the Broad Institute. Credit: ChieYu Lin