Visual Microphone Allows MIT Researchers To Extract Sound From Objects

By Joelle Renstrom

MIT researchers have devised an algorithm that analyzes an object’s vibrations captured on video and recovers the sounds that caused those tremors. Let’s back up a minute. Sounds create vibrations. Those vibrations are usually so minute that we can’t see them, and in fact we probably don’t realize they exist. But a high-speed camera can see them. The folks at MIT played music, which made an object vibrate. They then used an algorithm to essentially work backward from those vibrations and retrieve usable recordings of the sounds that caused them. In other words, by filming something like a plant or a bag of potato chips, they could harvest usable sound. This converts objects into what they call “visual microphones.”
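To make the "work backward from vibrations" idea concrete, here is a toy sketch in Python. It is not the MIT team's algorithm, which the article does not describe in detail. It assumes a made-up one-dimensional striped texture that a sound nudges by a tiny fraction of a pixel in each frame, and it estimates that nudge with a simple least-squares fit. All names and numbers here (`render_frame`, `estimate_shift`, the 440 Hz tone, the 0.01-pixel motion) are hypothetical choices for illustration.

```python
import math

def render_frame(shift, width=64):
    """Render a 1-D sinusoidal texture displaced by `shift` pixels."""
    k = 2 * math.pi / 16  # texture repeats every 16 pixels (arbitrary)
    return [math.sin(k * (x - shift)) for x in range(width)]

def estimate_shift(frame, reference):
    """Least-squares estimate of a small shift between two frames.

    Relies on the first-order approximation
    frame(x) ~ reference(x) - shift * d(reference)/dx,
    which only holds for shifts much smaller than the texture period.
    """
    num = 0.0
    den = 0.0
    for x in range(1, len(reference) - 1):
        grad = (reference[x + 1] - reference[x - 1]) / 2.0  # central difference
        num += (reference[x] - frame[x]) * grad
        den += grad * grad
    return num / den

def recover_signal(frames):
    """Turn a list of frames into one displacement value per frame."""
    reference = frames[0]
    return [estimate_shift(f, reference) for f in frames]

# Simulated experiment: a 440 Hz tone moves the texture by at most
# 0.01 pixel while a camera films it at 2,000 frames per second.
fps = 2000
tone_hz = 440
true_motion = [0.01 * math.sin(2 * math.pi * tone_hz * n / fps)
               for n in range(200)]
frames = [render_frame(d) for d in true_motion]
recovered = recover_signal(frames)
```

In this toy setup, `recovered` follows `true_motion` closely, apart from a small scale error introduced by the finite-difference gradient. Played back at 2,000 samples per second, that list of displacements would be the "sound." Real footage adds noise, two dimensions, and far messier textures, so a practical system needs considerably more machinery.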

In the video demonstration, the scientists play “Mary Had a Little Lamb” over a speaker near a houseplant. When you look at the plant, you can’t see any vibrations caused by the music, but their cameras can. The experiments used equipment ranging from a high-speed camera that can record at 2,000-6,000 frames per second to ordinary 60-frame-per-second digital cameras. The important thing is that the frame rate is higher than the frequency of the audio signal. The effects when using a camera with a rolling shutter were even more dramatic. The better the camera, the better the audio reconstruction, but even the more rudimentary cameras captured enough information for listeners to discern the gender and number of speakers talking near the object.
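Why the frame rate matters can be shown with a little arithmetic. Each video frame is one sample of the vibration, and standard sampling theory says that a plainly sampled signal is captured faithfully only when the sampling rate is more than twice its highest frequency. Below that, the tone "folds" down and shows up at a lower, false frequency. The Python sketch below works this out for a hypothetical 440 Hz tone. The function name `apparent_frequency` is made up for this example, and the sketch assumes every frame is captured in a single instant, so it does not model the rolling-shutter effect mentioned above.

```python
import math

def apparent_frequency(tone_hz, fps):
    """Frequency a sampled tone appears to have (standard folding formula).

    The result always lies between 0 and fps / 2.
    """
    nearest_multiple = fps * round(tone_hz / fps)
    return abs(tone_hz - nearest_multiple)

def sample_tone(tone_hz, fps, count):
    """Sample a unit sine tone once per frame."""
    return [math.sin(2 * math.pi * tone_hz * n / fps) for n in range(count)]

# At 2,000 frames per second a 440 Hz tone is captured as itself.
fast = apparent_frequency(440, 2000)

# At 60 frames per second the same tone masquerades as a much lower one.
slow = apparent_frequency(440, 60)

# Check the claim directly: the 60 fps samples of the 440 Hz tone are
# indistinguishable from 60 fps samples of a tone at the folded frequency.
samples_real = sample_tone(440, 60, 30)
samples_alias = sample_tone(slow, 60, 30)
largest_gap = max(abs(a - b) for a, b in zip(samples_real, samples_alias))
```

Here `fast` comes out as 440 and `slow` as 20, and `largest_gap` is zero apart from floating-point rounding: a 60-frame-per-second camera sampling this way cannot tell a 440 Hz tone from a 20 Hz one.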

They do something similar with a bag of chips on the ground, recovering actual human speech from it (the camera was outside, behind sound-proof glass). They also recover music from headphones and, with the help of software, can identify the song (“Under Pressure,” in case you were wondering). The team will present their work at the SIGGRAPH conference, which focuses on computer graphics and interactive techniques.

It may not seem like there are many practical applications for such a discovery, but think of the spying potential. Forget about conventional bugs or even being a fly on the wall; a plastic bag or a bunch of flowers could hold audio information that the people speaking thought was private. Thus, law enforcement and forensics might be particularly interested in this technology, although the researchers are excited about the project as a “new kind of imaging” that allows for in-depth analysis not just of the sounds recovered, but of the objects as well, and their responses to sound and other properties.