My last two posts, on LightSpace and More Like Us, talked about the future direction for natural user interfaces (NUI), but there are plenty of examples of NUI products in the marketplace today, like Windows Phone 7, which launched this week. NUI is way more than just touch and gestures: context, location, speech and other technologies are all important elements that let you interact with your technology naturally, depending on where you are, what device you’re using and what you’re trying to do. With Windows Phone 7, touch is cool, but we knew speech recognition would be key to a more natural interface, for safety as well as for convenience. And at the end of the day, when you’re using a phone, speaking into the device is just natural.
With Windows Phone 7, people can use their voice for dialing, search and launching applications. You literally hold the start button and say what you want, whether it’s to find a business, to call a friend or to open an application like your calendar. You can say “Boxwood Café in London” and get a map, directions and the phone listing without all the clicks or typing.
I am currently using a pre-release version of Windows Phone 7 and had cause to use this feature recently when I thought I was going to be late to collect my parents from the airport here in Seattle. As I rushed to the car, I held the button and said “BA oh four nine” and Bing opened up with a real-time flight status of British Airways flight 049 from London to Seattle. I made it to the airport in good time. Less typing, more finding.
The use of speech recognition in Windows Phone 7 is natural and makes sense from a safety and usage standpoint. But how did it come to be? Although speech recognition technology has been around for a while in various forms, its application in Windows Phone 7 is the result of decisions we made about a year and a half ago when we formed a new group called Speech at Microsoft. This group combined our assets from Tellme and the Speech Components Group to focus on advancing NUI through speech recognition across Microsoft products.
One of the first things this group focused on was changing our approach to advancing speech innovation. In the past, speech recognition had been constrained in part by the way it was architected. Speech is very similar to search: the more it is used, the better it gets. Microsoft’s Tellme speech platform handles over 11 billion voice requests a year for customers like Avis and Orbitz. It is the largest speech platform in the cloud (harnessing datacenter storage instead of relying on device storage alone) and the most advanced speech learning platform. With large volumes of search query data available, recognition improves, so the system adapts to the way you really talk rather than requiring you to adapt to it. The results of this new approach are the capabilities we see today in Windows Phone 7, Ford SYNC, the Kia Uvo, Bing for Mobile and Xbox Kinect. Already there are 2 million SYNC-equipped vehicles on the road, and with Windows Phone 7 now in market and Kinect set to follow, we’re looking forward to speech taking its natural place and ushering in the NUI era.