The voice is what breathes life into artificial intelligence. So says James Vlahos, programmer, journalist and writer, author of Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think. He is the person behind a unique experiment: a digital copy of his father, who was dying of cancer. Using the chatbot tools that had appeared in Facebook Messenger and many hours of voice recordings of his father's stories, tales and reflections, Vlahos, with his father's support, achieved a kind of immortality for him. But today we will not talk about this project.
This interview with James focuses on the voice as the 'imaginative' aspect of technology that has featured in science fiction for years. And now, according to the writer, it is the voice that is set to change everything. Voice assistants can already speak and show personality. And as the technology advances, it will raise many questions that we have never faced before.
What really happens when we talk to someone like Alexa and she answers us?
If you have talked to Siri or Alexa, said something and heard something in response, it seems to you that a single process is happening. But in reality it is better to think of it as a collection of separate steps, each of them difficult in its own right.
First of all, the sound waves of your voice must be converted into words; this is automatic speech recognition (ASR). Then the computer must interpret those words to work out their meaning; this is natural language understanding (NLU). Once the meaning has been understood, the computer must come up with something to say in response; this is natural language generation (NLG). Once the answer has been formulated, speech synthesis takes place: the computer takes the words and converts them back into sound.
Each of these components is very complex. It's not as if the computer just 'looks the word up in a dictionary'. A computer needs to understand how people and the world function in order to be able to respond.
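To make the four stages concrete, here is a minimal sketch of how they chain together. It is only an illustration: every function name below is hypothetical, each stage is a toy stand-in (the "audio" is just text encoded as bytes, and understanding is a keyword match), and it does not describe how Alexa, Siri or any real assistant is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Toy representation of what the user wants (hypothetical structure)."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    """ASR stand-in: a real system decodes sound waves; here the bytes are already text."""
    return audio.decode("utf-8").lower().strip()


def understand(text: str) -> Intent:
    """NLU stand-in: naive keyword matching instead of real language understanding."""
    if "weather" in text:
        if " in " in text:
            city = text.split(" in ", 1)[1].rstrip("?").title()
        else:
            city = "your area"
        return Intent("get_weather", {"city": city})
    return Intent("unknown")


def generate_reply(intent: Intent) -> str:
    """NLG stand-in: fills a fixed template; real systems may generate text with neural networks."""
    if intent.name == "get_weather":
        return f"Here is the weather for {intent.slots['city']}."
    return "Sorry, I did not understand that."


def synthesize(text: str) -> bytes:
    """Speech synthesis stand-in: a real system would produce audio, not encoded text."""
    return text.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    """Run the four stages in order: ASR, then NLU, then NLG, then speech synthesis."""
    text = recognize_speech(audio)
    intent = understand(text)
    reply = generate_reply(intent)
    return synthesize(reply)
```

Calling `handle_utterance(b"What is the weather in Paris?")` returns the bytes for "Here is the weather for Paris.", while a request the toy matcher does not cover falls through to the apology. The point of the sketch is the shape of the chain: each stage hands its output to the next, so a mistake made early on carries through to the final answer.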
Are there any impressive advances in this area that interest you?
A lot of interesting work has been done in the field of natural language generation. Neural networks allow the computer to speak on its own. It is not just using a set of prescribed words; this comes after training on huge collections of human speech: film subtitles, Reddit threads and the like. Computers learn the styles of interaction between people, the kinds of things person A might say to person B. To a certain extent, the computer has started to get creative with the problem, and that caught my attention.
What is the ultimate goal? What will it look like when voice computing becomes ubiquitous?
The big opportunity is that the computers and phones we use now will become less central to our technological lives, and computers will, in a sense, disappear. You need information or you want something done, you just say so, and computers fulfill your request.
This is a big shift, as we have always been tool makers and tool users. There has always been something we pick up, hold, touch or swipe. The shift is felt when we imagine all of that simply disappearing, with the capabilities of the computer working precisely because they are invisible: we just talk, and small built-in microphones around us, connected to the cloud, pick it up.
Another change is that we have begun to form relationships with computers. People love their phones, but they don't treat them as individuals. Now we have entered an era in which we treat computers like creatures. To a certain extent, they express emotions and have personality. They have their dislikes. We seek out their company. This is something new, something you would not expect from a toaster, a microwave or a smartphone.
Who can benefit the most from the rise of voice assistants? One of the groups of people we often hear about is the elderly, as their vision deteriorates and they can communicate more easily by voice. Who else?
Seniors and children are the real test groups for the possibilities of voice computing and artificial intelligence. Seniors often suffer from long-term loneliness, so they may want to, for example, chat with Alexa. There are applications where voice AI acts as a nurse, reminding people to take their medication and letting relatives keep track from a distance.
Without over-generalizing, remember that some older people develop dementia and find it harder to recognize that a computer is not really a living thing. It's the same with children: their grip on reality is not yet so firm, and they may be more willing to talk to these personified AIs as if they were living beings. You can also see voice AIs being used as virtual nannies: we are not at home, and the computer looks after the child. This is not yet fully a reality, but it seems it soon will be, at least to some extent.
What happens when we have virtual nannies and all that, and all technology fades into the background?
In a dark scenario, we will seek out human company less and less, because our virtual friends will be enough for us. There will also be information leaks at Amazon when people turn to Alexa for company and conversation.
But you can also look at this from a positive side. The fact that we are making machines more human is good. Whether we like it or not, we spend a lot of time in front of computers. And if this interaction becomes more natural and less about clicks and swipes, it will mean that we become more real and human, compared with how we are now turning into pseudo-machines as we interact with devices.
And I think power will become more centralized in Big Tech, especially when it comes to something like searching the Internet. There will be less need to sit in a browser, search for the necessary information, synthesize it, open magazines, books, whatever. Instead, we can simply put questions to our artificially intelligent voice oracles. This is really convenient, but it also means placing much more trust in companies like Google to tell us what is true and what is not.
How is this scenario different from the current, alarming situation with fake news and misinformation?
In the case of voice assistants, it is undesirable and impractical to offer you a spoken equivalent of a column of blue links in response to your question. So Google has to choose which answer to give you. That gives it tremendous power, since it decides what information should be shown, and history has shown that when control over information is concentrated in one set of hands, it rarely ends well for a democracy.
There is a lot of talk about fake news now. In the case of voice assistants, we get a bias in the other direction. Google will have to be very fixated on not showing 'fake news'. If there is only one answer to show, it had better not be complete junk. I think the issue will be more about censorship. Why should they be the ones to choose what is considered true?
How much should we worry about privacy and the types of information analysis that can be done using voice?
I'm just as worried about privacy with voice assistants as I am with smartphones in general. If tech companies can abuse their access to my home, they can do so through my computer as well as through Alexa in the room.
I'm not trying to play down privacy concerns. I think these fears are very, very realistic. But I'm sure it's not fair to single out voice assistants as the worst in this regard. That said, we do use them in different settings, in the kitchen and in the living room.
Let's change the subject a bit. Part of your book is devoted to the personalities of the various voice assistants. How important is it for companies that their products have personality?
Personality is important. This is the key point; otherwise, why do you need a voice at all? If pure efficiency is your thing, your phone or PC is the way to go. What is still missing is real differentiation between Cortana, Alexa and Siri. We don't see tech companies making any effort to create vastly different personalities designed from the ground up to target different parts of the market. They don't do what cable TV or Netflix do, which is to divide the consumer landscape into diverse segments.
I foresee that this will happen in the future. Right now Google, Amazon and Apple just want as many people as possible to like them, so they don't target. But I think they will develop the technology to the point where my assistant is not like yours or anyone else's. I think they will do it because it could be attractive. The same thing happens with every product in our lives: there is no one-size-fits-all solution, and I see no reason why this would not apply to voice assistants.