Whether you’re focused on digital assistants, chatbots, or IVRs (or even managing voice talent), if you’re a player in the world of Voice, then SpeechTEK was the place to be this week. Held this year at the Renaissance Washington DC, SpeechTEK brings together hundreds of people to hear from experts in Voice technologies, and this year much of the focus was on Voice First and the evolution of skills for devices like Amazon Alexa and Google Home.
As these devices continue to improve (not only in natural language understanding but in capabilities and quality), one topic that came up several times was the impact of the Uncanny Valley on both adoption of and user comfort with Voice First devices and other technologies.
If you’re unfamiliar with the Uncanny Valley, or not used to hearing about it in relation to Voice, you’re not alone. The Uncanny Valley describes how humans react to artificial lifeforms as they come ever closer to replicating true human behavior without quite getting there. Coined way back in 1970 by robotics professor Masahiro Mori, the term refers to the dip that appears when acceptance levels are charted: acceptance rises as a technology becomes more humanlike, then declines sharply once it creates a feeling of creepiness.

The effect is most often referenced in relation to robots, movies, and even video games that come eerily close to reproducing human behavior and movement but don’t quite hit the mark. Movies like Disney’s 2009 “A Christmas Carol,” starring Jim Carrey, and Castle Rock’s “The Polar Express” (2004) found audiences reacting negatively to the animation because of the effect, but even more noted for the issue was “Mars Needs Moms,” which opened in 2011 and became the fourth-largest box office bomb in history, losing over $110 million.
In short, humans don’t respond well when technology feels creepy. Watching this robot move unnaturally and then tell her creator that she will destroy humans should give you a good example of the Uncanny Valley in action.
So how does the Uncanny Valley tie in with Voice?
As the video above shows, Voice can be just as disconcerting as movement when dealing with an artificial intelligence. Both Amazon and Google are continually working to improve the quality of the native voices provided for their skills and actions (Amazon’s 2018 Super Bowl commercial used that to hilarious effect), but as these voices come closer and closer to mimicking a human, the risk of making consumers uncomfortable will continue to rise.
Of course, part of the challenge is that digitally rendered Voice struggles to be emotionally relevant to the situation. Whether Alexa is telling you the time, advising you that your account is overdrawn, or confirming that your lottery ticket is a winner, her tone of voice remains the same.
One solution to that challenge, though it often goes unused, is to use real human audio in Alexa Skills and Google Actions. Both platforms allow audio other than the native voice to be played in your skills and actions, and doing so lets your skill match its tone to the situation: for example, sounding compassionate when telling you about a negative financial situation, or excited about a winning Powerball ticket. A quick sketch of how that works follows below.
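For the technically curious: on both platforms, recorded audio is typically played through SSML’s <audio> element, which points to a hosted audio file. Below is a minimal sketch of an Alexa-style skill response in Python; the handler name and the MP3 URL are hypothetical placeholders, and note that Alexa requires the clip to be an MP3 served over HTTPS.

```python
# Minimal sketch of an Alexa custom-skill handler (e.g., an AWS Lambda
# function) that plays a pre-recorded human voice clip via SSML instead
# of the synthesized native voice.

# Hypothetical URL: Alexa expects an MP3 hosted over HTTPS.
WINNER_AUDIO_URL = "https://example.com/audio/lottery-winner.mp3"

def handle_lottery_win(event, context):
    """Return a skill response that plays an excited, human-recorded clip."""
    ssml = f"<speak><audio src='{WINNER_AUDIO_URL}'/></speak>"
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",   # tells Alexa to interpret the SSML markup
                "ssml": ssml,
            },
            "shouldEndSession": True,
        },
    }
```

Google Actions accept the same SSML <audio> element in their responses, so a single library of recorded prompts can, in principle, serve both platforms.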
Whether you’re looking to enhance the audio of your current skills/actions or need help developing them from scratch, SPLICE can help you develop better audio capabilities for Voice First devices while avoiding the Uncanny Valley. Just contact me at darin@splicesoftware.com or call us at 1-855-777-5423, and I’ll be happy to tell you more!