MIT Geniuses Extract Audio From Potato Chip Bag And More
John Hopton for redOrbit.com – Your Universe Online
Scientists from the Massachusetts Institute of Technology (MIT) never fail to amaze, and now they have discovered how to identify sounds that have been made in a separate room purely from visual recordings of objects in that room, even if the video was taken through soundproof glass.
In partnership with Microsoft and Adobe researchers, the team has developed an algorithm that can reconstruct audio from videos of objects such as a bag of potato chips, aluminum foil, the surface of a glass of water and the leaves of a potted plant. The tiny vibrations that sound waves cause in the surfaces of these objects are invisible to the naked eye. But with the right methodology and technology, they can be measured well enough to work out what sounds were made.
In a recent MIT statement, Abe Davis, a graduate student in electrical engineering and computer science at MIT and first author on the new paper, explained: “When sound hits an object, it causes the object to vibrate. The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realize that this information was there.”
Davis will attend Siggraph, the highly regarded computer graphics conference, to present the paper, which also involved Frédo Durand and Bill Freeman, both MIT professors of computer science and engineering; Neal Wadhwa, a graduate student in Freeman’s group; Gautham Mysore of Adobe Research; and Michael Rubinstein of Microsoft Research, who did his PhD with Freeman.
An absorbing video explains that the sampling rate of the video, meaning the number of frames captured per second, needs to be higher than the frequency of the audio signal, and that the best results come from a high-speed camera that captures up to 6,000 frames per second. However, even video from a smartphone, which records at something like 60 frames per second, can provide enough visual information from an object to reproduce recognizable and useful sound that occurred close to it.
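The frame-rate requirement is an instance of the standard sampling theorem: a signal sampled at a given rate can be represented without ambiguity only up to half that rate (the Nyquist limit), and tones above that limit fold back to lower apparent frequencies. The Python sketch below illustrates that arithmetic only. It is not the MIT team's algorithm, it covers plain frame-by-frame sampling, and it says nothing about how the researchers obtain useful sound from 60 frames-per-second video. The function names and the 440 Hz test tone are assumptions chosen for the example; the frame rates are the figures quoted above.

```python
def nyquist_limit_hz(frames_per_second: float) -> float:
    """Highest frequency that plain uniform sampling can represent unambiguously."""
    return frames_per_second / 2.0


def aliased_frequency_hz(signal_hz: float, frames_per_second: float) -> float:
    """Apparent frequency of a pure tone after sampling at the given frame rate.

    Tones above the Nyquist limit fold back into the range [0, fps / 2].
    """
    remainder = signal_hz % frames_per_second
    return min(remainder, frames_per_second - remainder)


if __name__ == "__main__":
    test_tone_hz = 440.0  # hypothetical test tone, not taken from the article
    for label, fps in [("high-speed camera", 6000.0), ("smartphone", 60.0)]:
        limit = nyquist_limit_hz(fps)
        apparent = aliased_frequency_hz(test_tone_hz, fps)
        print(f"{label}: {fps:.0f} fps, Nyquist limit {limit:.0f} Hz, "
              f"a {test_tone_hz:.0f} Hz tone appears at {apparent:.0f} Hz")
```

Under these assumptions, the 6,000 frames-per-second camera captures the test tone at its true frequency, while naive sampling at 60 frames per second would fold it down to a much lower one, which is why the high-speed camera gives the best results.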
A simple melody is used to demonstrate the technique (“Mary Had a Little Lamb” may get stuck in the head of anyone who watches the video), but recordings of voices are also presented. In terms of practical applications, criminal and other kinds of forensic investigation spring immediately to mind: the high-speed camera can recover identifiable spoken words, while the smartphone alone can reveal “the gender of a speaker in a room; the number of speakers; and even, given accurate enough information about the acoustic properties of speakers’ voices, their identities.” The process can also reveal a lot about the objects themselves, including their structural properties and material behavior.
Davis explains that the team is “recovering sounds from objects. That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.”
He also says that one pleasing aspect of science is that a technique may be applied in ways that those who discovered it never dreamed of. “I’m sure there will be applications that nobody will expect. I think the hallmark of good science is when you do something just because it’s cool and then somebody turns around and uses it for something you never imagined. It’s really nice to have this type of creative stuff.”