AI Online

Ai INNOVATION, SINCE 1895

Automotive Industries spoke to Craig Peddie, vice president and general manager, embedded speech, Nuance

This May, Nuance Communications, a leading provider of speech and imaging solutions, announced that the 2007 Mercedes-Benz C-Class would incorporate Nuance’s speech solutions to give drivers new levels of safety and convenience when they use the car’s audio, navigation and communication systems.

Nuance, headquartered in Burlington, Massachusetts, serves businesses and consumers around the world. “Our speech recognition and text-to-speech software delivers state-of-the-art performance and a rich set of features and tools tailored for the highly demanding automotive environment,” says the company. Nuance’s automotive speech solutions have been successfully implemented in more than 5 million cars worldwide, representing more than 100 automobile models from more than 25 brands from all major car manufacturers, including DaimlerChrysler, Fiat, Ford, Nissan and Renault, as well as leading Tier 1 suppliers such as Aisin AW, Alpine, Bosch Blaupunkt, Bury, Denso, Magneti Marelli and Microsoft.

In the 2007 C-Class, the new in-vehicle control system has been christened Command APS. It enables drivers to use their voice to select contacts from their phone address book, select music, or input addresses to the navigation system. DaimlerChrysler presented the in-vehicle navigation application and the voice-activated user interface as an option in the new C-Class at the 2007 Geneva Motor Show. The system uses Nuance’s VoCon®, the company’s embedded speech recognition software, together with Nuance’s RealSpeak®. While VoCon® allows the driver to control in-car functions in a natural and intuitive way, RealSpeak® text-to-speech software converts text into remarkably high-quality speech for readout of incoming text messages or turn-by-turn directions.

“As traffic volumes continue to increase and car infotainment and communication systems become more sophisticated, speech technology provides the safest – and most convenient – interface to a car’s audio, navigation and communication systems. The Mercedes-Benz C-Class is one of the most comprehensive installations of voice features we have seen to date. The new model features a navigation device that allows drivers to enter their destination verbally — one of the most important features to increase safety and improve overall experience with a navigation system,” said Craig Peddie, vice president and general manager, embedded speech, Nuance.

With Command APS, drivers can formulate speech commands in many different ways, without having to use a small set of precise commands or follow a multi-layer menu structure. For example, drivers can say, “Call Peter Miller” to initiate a phone call, eliminating the multiple voice commands required by previous speech systems. Drivers may also say, “Radio station WDR2” to tune to a specific radio station without ever setting voice tags for specific channels. Command APS is fluent in US and UK English, German, French, Italian, Spanish and Dutch, so drivers can select a language on the fly.
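To illustrate the idea of flexible phrasing, here is a minimal Python sketch. It is purely hypothetical and is not Nuance’s VoCon API: all pattern strings, intent names and functions are invented for illustration. It shows how several spoken phrasings, once converted to text, can map to a single command intent without a rigid menu tree.

```python
import re

# Hypothetical command patterns: several phrasings map to the same intent,
# so the driver is not forced through a fixed menu tree.
COMMAND_PATTERNS = [
    (r"^(?:call|dial|phone)\s+(?P<contact>.+)$", "call_contact"),
    (r"^(?:radio station|tune to)\s+(?P<station>.+)$", "tune_station"),
]

def interpret(utterance: str):
    """Map a recognized utterance (already converted to text) to an intent."""
    text = utterance.strip().lower()
    for pattern, intent in COMMAND_PATTERNS:
        match = re.match(pattern, text)
        if match:
            return intent, match.groupdict()
    return "unknown", {}

print(interpret("Call Peter Miller"))   # ('call_contact', {'contact': 'peter miller'})
print(interpret("Radio station WDR2"))  # ('tune_station', {'station': 'wdr2'})
```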

Among the features of Command APS powered by Nuance speech solutions is voice-activated destination entry: drivers simply say their destination address instead of having to type it on a touch-screen keypad. The US system even allows users to enter the street name first, which means that in a state like California it has more than 200,000 street names in its active vocabulary. The system also offers voice-activated, hands-free calling: drivers simply press the ‘Push-to-Talk’ button on the steering wheel and say the name of the person to be called. The system then automatically connects to that name in the mobile phone’s contact list, with no pre-enrollment of names required.
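As a rough illustration of why street-first entry implies such a large active vocabulary, the sketch below is a hypothetical, heavily simplified stand-in (the index, names and function are invented, not Nuance’s implementation). It shows how saying the street name first can narrow the follow-up question to a short list of candidate cities.

```python
# Hypothetical, simplified index mapping street names to the cities where they occur.
# A production system keeps hundreds of thousands of street names per region in its
# active vocabulary; this sketch only shows why street-first entry narrows the search.
STREET_INDEX = {
    "market street": ["San Francisco", "Oakland", "San Diego"],
    "mission street": ["San Francisco", "Daly City"],
}

def resolve_street_first(street: str, house_number: str) -> str:
    """Saying the street first reduces the follow-up question to a short city list."""
    cities = STREET_INDEX.get(street.lower(), [])
    if not cities:
        return f"Street '{street}' not recognized."
    if len(cities) == 1:
        return f"Destination set: {house_number} {street}, {cities[0]}."
    return f"Which city? Options: {', '.join(cities)}"

print(resolve_street_first("Market Street", "2400"))
# Which city? Options: San Francisco, Oakland, San Diego
```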

Automotive Industries spoke to Craig Peddie, vice president and general manager, embedded speech, Nuance.

AI: Where is speech technology in automotive today?

What we are seeing today is increasing adoption of speech technology in automobiles. One impetus is the growing safety concern of governments worldwide trying to prevent driver distraction. Automobiles today offer a plethora of features and functions that can distract the driver: climate control, terrestrial radio, satellite radio, MP3 players (both built-in and accessories) and navigation devices. Trying to manipulate these devices while driving without taking your eyes off the road is virtually impossible. Speech makes it all possible. With speech, a driver can change the temperature of the car, change the radio station, select a song or even enter a destination for a journey, all without ever taking their eyes off the road. We are on the brink of speech becoming a truly mainstream feature in all automobiles, from the low end to the high end.

AI: What is the value of speech technology for car drivers and car makers?

For car drivers, the obvious value is safety and convenience. No longer do you have to fumble with knobs, stare at a radio display, scroll through an MP3 player or hunt around a virtual keyboard to control the environment around you in the automobile. Speech makes it simple to perform all these functions. For the automobile manufacturer, speech is a high-margin option they can now offer across their model lines, and one that is seeing increasing user acceptance as consumers come to understand its value.

AI: Do you see speech technology going mainstream?

Definitely. Speech started out in high-end automobiles several years ago, and as hardware and software costs come down, we are seeing increasing availability of speech in lower-end autos. For OEMs, the biggest barrier to greater consumer adoption of speech is selling at the dealership. Very little effort has been focused on training the dealer population on the benefits of speech, or on giving dealers incentives to focus on selling speech to their customers. As dealers begin to understand that speech is an easy-to-sell, high-profit option, we will see increasing penetration of the end-consumer market.

AI: What are the global aspects of the automotive industry that you have to address?

The biggest global aspect we have to address is the availability of language libraries. Developing language libraries to support all the languages needed is probably the biggest impediment to worldwide adoption. Developing language models is a very time-consuming, labor-intensive process for each language. We have a dedicated team of language-collection specialists who cover the globe interviewing local native speakers and capturing that spoken data. The data is then transported back to our engineering labs, where it is processed to form the acoustic models that the speech engine uses to recognize a speaker’s voice. Developing the acoustic models required to release a new language simply takes time.

AI: What are the requirements from car manufacturers towards technology suppliers?

We are seeing that more and more car manufacturers are requiring speech as a component of their future infotainment/navigation platforms. Manufacturers themselves are also getting much more technology-savvy when it comes to speech. In fact, we are starting to see a trend where the OEM specifies to its Tier 1 suppliers which speech engine is to be used. Whether this trend will continue is something we are watching closely.

AI: What comes next?

Next will be a more natural speech interface to the automobile. OEMs are asking for, and we are showing them, the ability for the user to request changes to the automobile in a natural, spoken manner. For example, instead of having to say “Tune Radio FM 104.5”, you will be able to say “Find a classic rock station”. If 104.5 happens to be your favorite classic rock station, then it would be teed up. Being able to enter a destination in one complete utterance, such as “Set destination 2400 Market Street, San Francisco, CA, 94013”, can save a person several minutes.
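As a simple illustration of the one-shot destination idea, the hypothetical Python sketch below splits a single “set destination …” utterance into address fields. A plain regex is only a stand-in for illustration; an embedded recognizer would handle this inside its grammar, and the pattern and function names here are assumptions, not part of any Nuance product.

```python
import re

# Hypothetical parse of a one-shot "set destination ..." utterance into address fields.
# A real system resolves this inside the recognizer's grammar, not with a regex.
DEST_PATTERN = re.compile(
    r"^set destination\s+(?P<number>\d+)\s+(?P<street>[a-z ]+?),\s*"
    r"(?P<city>[a-z .]+),\s*(?P<state>[a-z]{2}),?\s*(?P<zip>\d{5})$",
    re.IGNORECASE,
)

def parse_destination(utterance: str):
    """Return the address fields of a single destination utterance, or None."""
    match = DEST_PATTERN.match(utterance.strip())
    return match.groupdict() if match else None

print(parse_destination("set destination 2400 market street, San Francisco, CA, 94013"))
# {'number': '2400', 'street': 'market street', 'city': 'San Francisco',
#  'state': 'CA', 'zip': '94013'}
```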

AI: What are the main challenges?

Speech is one of those technologies that works much better for some people than for others. We must continue to improve our algorithms to increase recognition accuracy, reduce the cost of integrating speech into the car (e.g., lower memory and processor requirements), and continue to offer new features such as open-ended dictation. Our goal at Nuance is to make speech the most natural and ubiquitous user-interface technology in the world, for all kinds of mobile devices.