All of a sudden, voice assistants are everywhere: in our phones, cars, TVs, microwaves and refrigerators.
If you don’t have at least one Amazon Echo, Google Home or Apple HomePod in your house at this point you might be in the minority: voice assistants have moved into our everyday lives in a big way, and they’re the new norm.
Google sold more than 6 million voice-assistant speakers just over the holidays, and Amazon was able to get tens of millions of speakers into homes within a single quarter. The race to be the dominant voice platform is underway, and at this point, anyone could win.
Given the rate of adoption, and the expansion of voice APIs for the masses, we thought it was time to look at the market, how it’s growing and where voice is headed next. 2018 is officially the year of the voice assistant.
The story of how voice suddenly accelerated in 2017 must first skip back more than fifty years. In 1962, IBM introduced the first voice-activated computer, the IBM Shoebox, which could do incredible things.
Since that demo, humans have dreamed of interacting with their devices in a more natural way for decades, but it always felt a little far off. Science fiction, like Star Trek, 2001: A Space Odyssey and Back to the Future 2, gave us visions of the future where we’d interact with the digital world by just speaking aloud — but it always seemed like nothing more than a fantasy.
There have been many attempts at building rich voice experiences, and you likely recall those from the 1990s best. Those tools required you to sit in front of a computer and dictate for hours to train them before use, and even then they remained unreliable.
The real innovations that pushed voice forward to where we are now aren’t entirely obvious: cloud computing and machine learning. Neither idea was particularly new, but the way they were embraced changed everything.
If you had wanted to build a voice assistant in 1996, you’d have needed vast server rooms of your own to perform basic interpretation, which required massive amounts of investment. In 2018, it’s as easy as clicking a few buttons on Amazon Web Services and, poof, you’ve got a massive, high-performance data center ready to go.
Cloud computing has revolutionized the way applications and ideas are built: before, you’d need at least some metal to run your voice service on, but now you can build a vast service without ever actually seeing a server.
Machine learning alongside cloud computing created a potent combination: suddenly developers had access to vast amounts of processing power to experiment with teaching a computer how to think — and we had larger data sets to feed them.
The theory behind machine learning has been around since at least the 1980s. Dr Hermann Hauser, scientist and director of Amadeus Capital, said in a presentation that many of the ideas used by modern machine learning were invented decades ago, but the raw power wasn’t available to do anything with them.
Working off of these initial ideas, a Google engineer, Jeff Dean, used the company’s vast infrastructure to experiment with building the first at-scale neural network. Ultimately becoming Google Brain, it transformed industries as we know them. Suddenly, computers were able to grasp basic ideas, if fed enough information.
Once computers could grasp basic concepts, voice was inevitable. Siri, which was released in 2011, was likely the first ‘modern’ voice experience consumers had, and while it was impressive, it was obvious that the technology was nowhere near usable on an everyday basis yet.
While Siri was a great early demonstration of what voice assistants could do, it was easy to stump. Basic commands worked, but as soon as you asked it something unexpected, which happens as soon as humans feel comfortable, it would fail. Ultimately, the problem was that Siri wasn’t able to learn from its own mistakes until much later, in 2014.
It wasn’t until Amazon unveiled the Echo in 2014 that anyone started paying serious attention to voice again. By this point, neural networks were beginning to find their way into consumer applications and into the public eye, and it showed in the first reviews of Echo.
Echo wasn’t just impressive because it was the first device on the market that made voice feel really natural, but also because of its hardware: the company combined far-field microphones with a decent speaker and made it look good.
Far-field microphones were a concept not many people were familiar with in 2015. The technology allows a device to combine several microphones to increase the range in which it’s able to hear a voice, and to block out noises around it. Combined with audio processing improvements, it’s a potent technological leap: suddenly computers could hear and understand almost anywhere in a room with a satisfying level of precision.
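To make “combining microphones” a little more concrete, here is a minimal sketch of delay-and-sum beamforming, one common textbook way for a microphone array to favor sound arriving from a particular direction. It is an illustration under assumptions, not a description of how Echo actually works: the names `delay_and_sum` and `delayed`, the whole-sample delays and the toy signal are all hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its delay (in samples) and average.

    channels: equal-length lists of samples, one list per microphone.
    delays:   how many samples later the target sound reaches each
              microphone, relative to the first (assumed to be known).
    """
    usable = len(channels[0]) - max(delays)  # samples left after shifting
    output = []
    for n in range(usable):
        # Read every channel at the moment the target sound reached it.
        aligned = [ch[n + d] for ch, d in zip(channels, delays)]
        output.append(sum(aligned) / len(channels))
    return output


def delayed(signal, delay, total):
    """Hypothetical helper: pad a signal so it starts `delay` samples late."""
    return [0.0] * delay + signal + [0.0] * (total - delay - len(signal))


# Toy scene: the same "voice" reaches three microphones 0, 1 and 2 samples late.
voice = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
delays = [0, 1, 2]
channels = [delayed(voice, d, 8) for d in delays]

aligned = delay_and_sum(channels, delays)    # steered at the speaker
naive = delay_and_sum(channels, [0, 0, 0])   # no steering, plain average
```

When the delays match the speaker’s position, the three copies of the voice line up and reinforce one another; averaged without alignment, they partly cancel. Sound from other directions falls into the second case, which is roughly why an array can hear a voice across a room while damping noise around it.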
The Echo came out of nowhere, at least to the consumer, and a whole new model of interaction was born overnight because Amazon was able to stand at the crux of three massive innovations intersecting with one another — it also, conveniently, runs the world’s largest cloud computing platform.
Modern voice assistants became possible because their makers were able to offload the heavy data-crunching required for interpreting voice to their cloud brains. All your smart speaker does is listen for the hot word, “OK Google” or “Alexa”, which opens the pipe to those online brains for real-time recognition.
Almost nothing is done locally by these devices, bringing prices down, and making them possible to build in attractive, fabric-coated form factors for your kitchen.
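The split between local listening and cloud processing can be sketched in a few lines. This is a toy model under stated assumptions, not any vendor’s implementation: short strings stand in for audio frames, and `stream_after_hotword`, `send_to_cloud` and `max_frames` are hypothetical names. A real device would run a small acoustic model on the device rather than compare text.

```python
def stream_after_hotword(frames, hotwords, send_to_cloud, max_frames=5):
    """Forward frames to the cloud only after a hot word is heard locally."""
    listening = False
    sent = 0
    for frame in frames:
        if not listening:
            # Cheap local check; nothing leaves the device at this stage.
            if frame.lower() in hotwords:
                listening = True
                sent = 0
            continue
        # The pipe is open: hand the frame to the cloud for recognition.
        send_to_cloud(frame)
        sent += 1
        if sent >= max_frames:
            listening = False  # close the pipe and go back to waiting


uploaded = []
frames = ["private", "chatter", "alexa", "what's", "the", "weather",
          "more", "chatter"]
stream_after_hotword(frames, {"alexa", "ok google"}, uploaded.append,
                     max_frames=3)
```

In this sketch only the three frames after the hot word reach `uploaded`; everything said before and after stays on the device. The design choice it illustrates is the one described above: the speaker itself stays cheap and simple, and the expensive interpretation happens elsewhere.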
With these developments in mind, let’s look at where we are in 2018 from the consumer’s perspective: voice went from a cute tool to a primary mode of interaction for the home. For the first time, people are comfortable using voice to interact with digital devices, and even prefer it.
This has been driven by aggressive competition between Google and Amazon. Echo was first to market, leaving Google reeling, and ultimately leading to the company investing billions in Home to build out what it sees as the next platform for search. If anything, Amazon Echo was Google’s first real existential threat, making Home all the more important.
As a result, we see a huge race to the bottom for voice, because it’s winner takes all.
What started out as Amazon Echo is now a multitude of products, including the smaller Echo Dot and the larger Echo premium speaker. Google has done the same, going down-market with Home Mini, and up-market with Home Max, which competes with Sonos and beyond. Apple is about to enter the game for the first time with the HomePod, which is set to ship in February.
The Consumer Electronics Show was the first visceral evidence of how much this space is worth to those fighting for a spot on your bench.
All of the players in the voice space are pouring millions into it because, ultimately, they must. Google discounted Home Mini by more than half over the holidays, and Amazon essentially gave Echo Dot away for free. Lower-end devices are a gateway drug into the entire ecosystem: you’re almost guaranteed to expand later, so it’s not a big deal to sell at a loss.
If any one of these assistants ‘wins’, it means millions of people will turn to that device, every day, before any other interaction model. These devices become the gateway to your home as Internet of Things devices become prevalent, because they’re a natural way to interact with gadgets sans the need to pull out your phone.
They also vacuum up data at an unprecedented scale.
Google and Amazon are fighting over this space because it’s a fantastic, friendly vehicle for capturing data — the new gold. By becoming intimate with you to the point you turn to your voice assistant first, before your phone, these companies start getting closer to understanding your thoughts, and ultimately, your intent.
Almost everything you say to Alexa and Home is crunched, and stored, for later. That voice data is a goldmine for both companies because they’re able to use it both to train future algorithms and to figure out how to get you to buy stuff.
Once you’re comfortable with voice, it gets even more interesting from there. The biggest advantage these devices have is they can make decisions on your behalf, while profiting from it, without your knowledge.
Here’s a theoretical example: imagine you’re planning to take an Uber to the office. When you ask Echo for a ‘ride to work’ it could, eventually, sell that term to the highest bidder and send whoever it feels like. Why would it default to Uber, if Uber isn’t paying?
Just as Amazon did for the marketplace, thousands of brands will see their value diminished in a voice world, because assistants become the ultimate gatekeepers. Amazon, Google and Apple will decide who gets in front of you, and who doesn’t — and you probably won’t ever know.
Voice assistants are about to be everywhere. You probably have one sitting in the room you’re in now. But are we ready for this?
The biggest challenge in voice is one that the biggest players aren’t really talking about: privacy.
Both Amazon and Google store recordings of your voice as you use their devices, and both companies are able to decrypt those recordings to perform analysis, ultimately creating the world’s biggest voice database.
In our rush to voice assistants, we’ve forgotten the importance of privacy, and what having this data at scale means in the future. While all of these improvements have been happening, it’s become near trivial to recreate someone’s entire voice using a computer and a handful of snippets. If that’s not terrifying, I don’t know what is.
There are additional privacy implications as well, due to the nature of how your voice is processed: we’re wiring hundreds of pieces of data, like our bank accounts, up to the cloud to use them with Alexa and Home, without really considering it.
As developers have rushed to enable the next big consumer experience, they’ve fallen over themselves to get experiences in your hands.
“Alexa, what’s my bank balance?” is a real command, available from multiple banks. It’s a legitimately useful use case for the user, but it’s also a great way for Amazon to figure out how much money you have on hand, and an even better way for an attacker to find out more information about your bank account.
This is great for Amazon, but presents a new problem in terms of privacy and security for end users. If a simple attack on iCloud accounts can wreak so much havoc on people’s lives, what happens if that voice database, and the accounts connected, are compromised? Perhaps our most intimate moments, on tape, would be exposed, and could reveal more than you might think.
The only major voice player to advertise itself as encrypting your voice, identity and any associated data is Apple. As with Siri on the iPhone, Apple advertises HomePod as a privacy-focused device.
In other words, Apple won’t know who you are, and won’t be able to do much more with that data once it’s left your home. That claim, however, doesn’t paint the complete picture: because Apple doesn’t process locally, your voiceprint is still in the cloud, and they could almost certainly link it back to you if they were forced to.
The practices Apple uses add a layer of security, but don’t solve the problem — your data, and voice, now live in a cloud somewhere. Eventually, if Apple wants to move beyond relying on a local iPhone to process integrations, it’ll need to associate that data somehow and likely backpedal those claims in order to provide a connected experience.
So, what about the competition? Amazon doesn’t detail what it does with Alexa. Google, for its part, says it encrypts data, but it’s also the one holding the keys. As a result, we don’t really know how far that promise of ‘encryption’ truly extends.
Siri, which has improved in recent years, is clearly behind in the voice assistant race as a result of its more limited data access: it’s still unable to infer basic human ways of interacting with information, such as saying “Where is that?” after asking “What’s a great taco spot nearby?”
If you had told people just a few years ago that you were going to place an always-on microphone in their home, they’d have balked, and refused. Now, it’s increasingly common, and people don’t seem to be concerned about the impact of that on their privacy — but Apple’s bet is that they will.
What remains to be seen is if Apple’s bet on that privacy will matter. While Apple is just taking its first steps with HomePod, Amazon and Google are busy putting their assistants in everything from cars to microwaves.
Soon, every device around you might be listening. Are you ready for that?
Voice is the new interface, and isn’t going away anytime soon. For years, we’ve chased interacting with our computers in a more natural way, and the floodgates are open. So what next?
Privacy is the final frontier, and it’ll be a huge trend throughout 2018 relating to voice assistants. GDPR, the European Union’s biggest piece of new data protection legislation in decades, may drive that conversation forward, as it raises many questions about whether or not smart voice applications can be compatible with strong privacy law at all.
Over the coming year, it’s likely that voice assistants, consent and voice security will become a large part of the discussion. With GDPR, citizens of the EU will have the right to know where and when their data is being used, and companies will need their consent for expanded use of that stored data. It doesn’t matter if you’re building an experience from the US for EU customers: you’re still bound by the same rules.
Right now, most APIs for voice recognition are cloud-based, provided by Amazon and Google. This presents challenges for businesses looking to build experiences for their own apps with privacy in mind, especially with GDPR in the picture.
Local-only APIs and on-premises solutions do exist, and may be worth considering as these concerns become even more important throughout 2018. Your customers may demand the peace of mind, and guaranteeing a level of predictable privacy is good business.
With Google, in particular, focusing almost all of its energy on voice as the next frontier for search, these questions are going to become more pressing. If we’re to imagine a future in which we’re talking to computers all day, like in the movie Her, we need to understand what happens with our voice once it leaves the room and goes online.
It’s clear that voice is here to stay, and we’ll need to get comfortable with that reality for the foreseeable future. Privacy, especially when it comes to voice, is paramount, and the question really is wide open with consumer voice: where is the line?
With more than 50 million voice-enabled speakers expected to be shipped in 2018, and even more ambient smart devices, it’s an important question to ask before it’s too late.