Talking with computers has been a staple of science fiction novels, TV, and movies for decades. Now speech recognition software seems to be everywhere in our lives, from our mobile phones to our internet search engines. Although the idea isn’t new, scientists and engineers have only recently developed voice technology sophisticated enough to build it into a growing number of the devices we use every day.
You may have noticed that Google has added a microphone icon at the far right of its search box, which lets users say aloud what they would like to search for. This technology transcribes recognized speech into typed text, offering a simpler, hands-free way to browse the internet. It was just Google’s first step toward integrating voice recognition on a much wider and more diverse scale.
Google has been developing Google Glass, a pair of glasses that can take photos and give digital directions in response to voice commands. In a live demonstration of Google Glass at the 2013 MLB playoffs, the operator could highlight architecture and players and pull up information and stats about whatever was being viewed. Microsoft is following the trend by developing its own version of digitally informative eyewear.
Nuance has recently developed a speech recognition platform that lets programmers build voice recognition into new applications. Users can then operate, navigate, and confirm selections within the application using nothing but their voice. For example, if a company develops a garage door opener application with the Nuance software, users will be able to launch the application and open the garage door simply by speaking a command, with no buttons pushed.
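To make the garage door example concrete, here is a minimal sketch in Python of the step that comes after speech has been transcribed: matching the recognized text to an action. This is not Nuance's API. The phrases, the action names, and the `handle_transcript` function are all hypothetical, and the sketch assumes some speech recognizer has already turned the audio into a text string.

```python
import string

# Hypothetical mapping from spoken phrases to application actions.
COMMANDS = {
    "open garage door": "OPEN",
    "close garage door": "CLOSE",
}


def normalize(transcript: str) -> str:
    """Lowercase the text, drop punctuation, and collapse extra spaces."""
    no_punct = transcript.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(no_punct.split())


def handle_transcript(transcript: str):
    """Return the action for a recognized phrase, or None if unknown."""
    return COMMANDS.get(normalize(transcript))


if __name__ == "__main__":
    # The transcript would normally come from a speech recognizer.
    print(handle_transcript("Open garage door."))
```

An exact phrase lookup like this is the simplest possible design. A real application would likely accept several wordings for each command and ask the user to confirm before moving anything physical.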
To fully understand the progression of this technology, let’s look back at the evolution of voice recognition software. It all started with the idea of creating typed documents from spoken language rather than pushing keys to form words. Companies such as Dragon pioneered talk-to-type programs that let users create text documents without having to type. The program simply recorded recognized speech patterns and transcribed them into words that appeared in a word-processing document. In the 90s the technology was widely adopted in the form of automated telephone answering services, which guided callers through a couple of directed questions before routing the call to the proper department or sales representative.
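The automated phone service described above is essentially a small decision tree: each recognized answer moves the caller one step closer to a destination. The Python sketch below illustrates that idea only. The questions, the destinations, and the `route_call` function are invented for illustration, and the caller's answers are assumed to have already been recognized as text.

```python
# Hypothetical phone menu: each node either asks a question or names a route.
MENU = {
    "question": "Are you calling about sales or support?",
    "answers": {
        "sales": {"route": "sales representative"},
        "support": {
            "question": "Is this about billing or a technical problem?",
            "answers": {
                "billing": {"route": "billing desk"},
                "technical": {"route": "technical support"},
            },
        },
    },
}


def route_call(recognized_answers, menu=MENU):
    """Walk the menu using the caller's answers and return a destination.

    Falls back to a human operator when an answer is not understood
    or the caller stops answering before a destination is reached.
    """
    node = menu
    for answer in recognized_answers:
        if "route" in node:
            break  # destination already reached; ignore extra answers
        key = answer.strip().lower()
        if key not in node["answers"]:
            return "operator"
        node = node["answers"][key]
    return node.get("route", "operator")


if __name__ == "__main__":
    print(route_call(["support", "billing"]))
```

Sending unrecognized answers to an operator mirrors how such systems typically handle speech they cannot understand, though the fallback here is an assumption rather than a description of any particular product.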
Now the rapid rise of powerful mobile devices is making voice interfaces even more useful and pervasive. Apple created Siri, which can respond to questions and, like Microsoft’s voice recognition software, give directions and provide visual aids drawn from keyword searches or from an accompanying program such as Google Maps.
Steve Jobs had been working on a computer-based television platform that would let users operate and watch TV using voice commands. Apple has since taken up the project to carry out his idea. Media speculation is that when the product is released to the general public, Siri will be the controller, responding to spoken questions or commands with useful, relevant information.
Ford has also been experimenting with voice recognition technology, starting with a voice-controlled audio system, Sync. Currently, Sync can pull up directions, songs, and weather information in response to particular voice commands, eliminating the need to fiddle with some common driving distractions. And like Siri, the program answers with an automated spoken response. The goal is a fully functioning, hands-free center console that recognizes nearly all speech and responds with appropriate actions.
After years of tweaking, adjusting, and adding complex mathematical variations to read Arabic and English voice commands, voice recognition software can now interpret almost any voice range accurately. As computing power advances in ever smaller devices, built-in speech recognition software is becoming more commonplace. Cell phones now have as much computing power as, or more than, most desktops had during the initial boom of the 80s and 90s.
In 2006, Canon released the first version of its Voice Operation Kit, which users can attach to their existing Canon imageRUNNER copiers or printing stations. The Voice Operation Kit lets users print, scan, copy, and fax using a couple of tactile buttons combined with speech output and the AI’s verbal guidance. With the kit, users can operate the machine entirely by voice or combine the built-in touch control panel with speech. Either way, the voice recognition software should benefit companies and offices by reducing the time spent figuring out how to perform a particular task or fix a puzzling problem.
Pretty soon, voice-guided interfaces will be controlling elements of our homes. Imagine being anywhere in your home, asking a question out loud, and having an AI respond with factual information. Until recently, the idea of holding a conversation with a computer seemed far-fetched, but it is now becoming a more and more tangible reality.