November 8, 2005
Software Fills in Missing Data on Satellite Images
COLUMBUS, Ohio -- New software is helping scientists get a more complete view of the environment from satellites that orbit the Earth.
Maps that depict the thickness of the ozone layer, for instance, frequently contain blank spots where a satellite wasn't able to record data on a particular day, explained Noel Cressie, professor of statistics and director of the Program in Spatial Statistics and Environmental Sciences (SSES) at Ohio State University.
He and his colleagues found a way to use data from the rest of a map as well as from previous days to fill in the blank spots. The same technique could be used in studies of agricultural data or even medical imaging.
When it comes to mapping the environment, satellites gather so much data every day that filling in the missing parts quickly is a challenge.
"Right now, from a statistical point of view, people can either fill in these maps well, but not very fast -- or fill them in fast but not very well," Cressie said. "We do it well and we do it fast."
By his estimate, if someone were to try to complete an ozone map in a way that was as statistically precise as possible, processing one day's worth of data could take 500 years.
The Ohio State software does the job in about three minutes.
It also calculates a measure of map precision. The varying precision in different parts of the completed map gives scientists valuable information about the quality of the data they use to construct computer models of Earth's climate system.
Cressie began this project with former student Hsin-Cheng Huang. Another student, Gardar Johannesson, completed the work for his doctoral dissertation at Ohio State.
In an upcoming issue of the journal Environmental and Ecological Statistics, Johannesson, Cressie and Huang cite NASA's need for new ways to process satellite data as the motivation for their work.
In fact, NASA already uses well-established techniques to fill in gaps in satellite imagery, but the task is becoming more difficult.
At any moment, NASA satellites are recording global ocean and air temperatures, wind speeds, and amounts of atmospheric molecules such as ozone.
The data come in at a rate of 1.5 terabytes (1.5 trillion bytes) a day. That's as much information as is contained in 1,500 copies of the Encyclopedia Britannica, or nearly 200 DVD movies. Future satellites will be able to gather even more data, much faster.
But as long as conditions like cloud cover or on-board electrical problems interfere with satellite instruments, missing data will always be an issue.
One way to fill in blank pixels on a satellite image is to use the average value of all the data points in nearby pixels. But averaging the data means losing potentially valuable details. Plus, to compute faster, typical methods ignore important spatial relationships among the data. As a result, scientists who use those methods can't properly measure the precision of the values they are filling in.
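To make the simple averaging approach concrete, here is a minimal Python sketch of it. This illustrates the conventional method the article contrasts against, not the Ohio State software; the function name, the use of `None` for a blank pixel, and the sample ozone values are all assumptions made for the example.

```python
def fill_with_neighbor_mean(grid):
    """Return a copy of grid in which each missing pixel (None) is replaced
    by the mean of its observed neighbors in the surrounding 3x3 window.
    A pixel with no observed neighbors stays None."""
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is not None:
                continue  # observed pixels are left untouched
            neighbors = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c) and grid[rr][cc] is not None
            ]
            if neighbors:
                filled[r][c] = sum(neighbors) / len(neighbors)
    return filled


# Hypothetical 3x3 patch of ozone readings with one blank pixel.
ozone = [
    [300.0, 302.0, 304.0],
    [301.0, None, 305.0],
    [302.0, 304.0, 306.0],
]
filled = fill_with_neighbor_mean(ozone)
```

Note that the sketch returns only a single filled-in value per pixel. It gives no indication of how trustworthy that value is, which is the limitation described above.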
"People have been developing methods to do this Ã± to fill in missing data and provide measures of how accurately they are doing so. But the methods often do not do well with massive amounts of data," said Johannesson, who now works at Lawrence Livermore National Laboratory. Huang is now with the Institute of Statistical Science at Academia Sinica in Taiwan .
The researchers developed statistical techniques that fill in missing data by performing calculations at different image resolutions. First, the software "zooms out" of the image to calculate potential values for the missing pieces at low resolution, then it zooms back in to refine the calculations at higher resolutions. Data from the surrounding pixels -- on that day and previous days -- all help determine the outcome.
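The "zoom out, then zoom back in" idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the researchers' algorithm: it assumes a single day's image, halves the resolution by averaging 2x2 blocks, and copies the coarse value straight into each missing pixel rather than refining it statistically. It also ignores data from previous days. All names are hypothetical.

```python
def coarsen(grid):
    """Halve the resolution: each coarse cell is the mean of the observed
    pixels in its 2x2 block, or None if the whole block is missing."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block = [
                grid[rr][cc]
                for rr in range(r, min(r + 2, rows))
                for cc in range(c, min(c + 2, cols))
                if grid[rr][cc] is not None
            ]
            out_row.append(sum(block) / len(block) if block else None)
        coarse.append(out_row)
    return coarse


def multires_fill(grid):
    """Fill missing pixels (None) from progressively coarser versions of
    the image, then carry the coarse values back to the fine grid."""
    rows, cols = len(grid), len(grid[0])
    if all(v is not None for row in grid for v in row):
        return [row[:] for row in grid]  # nothing to fill
    if rows == 1 and cols == 1:
        return [row[:] for row in grid]  # no data at any resolution
    coarse = multires_fill(coarsen(grid))  # zoom out (recursively)
    filled = [row[:] for row in grid]
    for r in range(rows):  # zoom back in
        for c in range(cols):
            if filled[r][c] is None:
                filled[r][c] = coarse[r // 2][c // 2]
    return filled


# Hypothetical 4x4 image whose top-left 2x2 block is entirely missing.
image = [
    [None, None, 10.0, 12.0],
    [None, None, 14.0, 16.0],
    [20.0, 22.0, 30.0, 30.0],
    [24.0, 26.0, 30.0, 30.0],
]
result = multires_fill(image)
```

In this example the missing block cannot be filled at half resolution, because its coarse cell is also empty. The sketch has to zoom out one more level, where the three observed coarse cells supply a value.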
The study details the application of the software to a month's worth of ozone data. To further test their methods, the statisticians also artificially removed the data from a slice of the ozone map above the Pacific Ocean for one day, and then used the software to calculate the missing piece. The results very closely matched the actual data.
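The hold-out test described above follows a general pattern that is easy to sketch in Python: hide some known values, fill them in, and measure how far the filled values fall from the truth. The filler below is a trivial placeholder (the mean of all observed pixels), not the study's method, and the grid values and function names are made up for illustration.

```python
import math


def hold_out(grid, cells):
    """Return a copy of grid with the given (row, col) cells blanked out."""
    masked = [row[:] for row in grid]
    for r, c in cells:
        masked[r][c] = None
    return masked


def global_mean_fill(grid):
    """Placeholder filler: replace every missing pixel with the mean of
    all observed pixels. Any other fill method could be swapped in here."""
    observed = [v for row in grid for v in row if v is not None]
    mean = sum(observed) / len(observed)
    return [[mean if v is None else v for v in row] for row in grid]


def rmse(truth, filled, cells):
    """Root-mean-square error over the held-out cells only."""
    squared = [(truth[r][c] - filled[r][c]) ** 2 for r, c in cells]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical complete map; one cell is artificially removed.
truth = [[300.0, 302.0], [304.0, 306.0]]
cells = [(0, 1)]
masked = hold_out(truth, cells)
filled = global_mean_fill(masked)
error = rmse(truth, filled, cells)
```

A smaller error on the held-out cells means the filled-in values track the real data more closely.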
Key to the technique is that it draws from a statistical method called Bayesian analysis to weight the available data in the calculations. Reliable data count the most; less reliable data count less, but they still count.
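One common way to let reliable data count more is to weight each value by its precision, meaning the inverse of its error variance. The Python sketch below shows that textbook calculation for independent Gaussian errors with a flat prior. It is offered only as an illustration of the weighting idea; the article does not say which model the software actually uses, and the function name and numbers are assumptions.

```python
def combine(estimates):
    """Combine (value, variance) pairs into a single estimate.

    Each value is weighted by its precision (1 / variance), so a reading
    with a small variance counts more than one with a large variance,
    but every reading still contributes. Returns (mean, variance) of the
    combined estimate."""
    precisions = [1.0 / variance for _, variance in estimates]
    total_precision = sum(precisions)
    mean = sum(
        value * precision
        for (value, _), precision in zip(estimates, precisions)
    ) / total_precision
    return mean, 1.0 / total_precision


# Hypothetical readings for one pixel: a reliable one (variance 4)
# and a less reliable one (variance 16).
mean, variance = combine([(300.0, 4.0), (310.0, 16.0)])
```

Here the combined value lands at 302, much closer to the reliable reading of 300 than to the noisier 310. The combined variance of 3.2 is smaller than either input variance, which is the kind of precision measure that can be mapped alongside the filled-in values.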
Cressie would like to see NASA and other organizations try out the software for ozone modeling or other applications. Any study that charts how some characteristic changes in space or time -- such as the health of crops in a field or the features in a medical image -- could benefit from using the methodology.
Until then, he's published some details of the technology as well as animations of ozone data on the SSES Program's Web site.