Today we’re one step closer to a world where anyone can communicate with anyone else. Microsoft has unveiled new technology capable of real-time (okay, almost real-time) translation from one language to another. Not only that, the program also maintains the speaker’s original voice (okay, almost their original voice). This is a stepping stone towards the development of a Star Trek-style universal translator.
There is a delay of a few seconds in the conversions, and though the translated voice does sound like the person talking, it has a distinctly mechanical quality. Check out this video of Chief Research Officer Rick Rashid demonstrating the breakthrough in Tianjin, China, near the end of last month. Jump to the seven-and-a-half-minute mark to see it in action.
A number of recent advances in speech-recognition software have appeared in commercial products such as Apple’s Siri, Microsoft’s Kinect, and Dragon’s software that types what you say. Current programs like these typically make a mistake every four or five words. By using a neural network and building upon what came before, this new technology stretches that out to a mistake every seven or eight words.
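To put those figures in perspective, "one mistake every N words" can be turned into a rough error rate. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted above. The function names and the use of range midpoints (4.5 and 7.5 words) are illustrative assumptions, not anything published by Microsoft.

```python
def error_rate(words_per_mistake: float) -> float:
    """Fraction of words that are wrong, given one mistake every N words."""
    return 1.0 / words_per_mistake


def relative_reduction(old_words_per_mistake: float,
                       new_words_per_mistake: float) -> float:
    """Relative drop in error rate when moving from the old to the new figure."""
    old = error_rate(old_words_per_mistake)
    new = error_rate(new_words_per_mistake)
    return 1.0 - new / old


# Figures quoted in the article: one mistake every 4-5 words (current programs)
# versus one mistake every 7-8 words (the new system).
for label, span in (("current", (4, 5)), ("new", (7, 8))):
    rates = [error_rate(n) for n in span]
    print(f"{label}: {min(rates):.1%} to {max(rates):.1%} of words wrong")

# Assumption: compare the midpoints of the two quoted ranges.
print(f"relative reduction at midpoints: {relative_reduction(4.5, 7.5):.0%}")
```

By this rough reckoning, the error rate falls from about a fifth or a quarter of all words to roughly an eighth or a seventh.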
Not only does this improvement mean that you’re more likely to come up with an accurate, comprehensible translation, but being able to recreate the speaker’s voice adds another layer. The more human, less mechanical voice is more relatable, and maintaining the tone, tempo, and rhythm of the original statement preserves the non-verbal cues that help people understand speech across languages.
This feat is accomplished by having the speaker work with a “machine-learning algorithm” for a time. This provides a more nuanced baseline than simply reading off a page of text, as many existing programs do. Moving forward, this could make communication easier and more effective, and help bridge language gaps.