Does Google Duplex Scare You?

15 May

If you have not yet listened to the telephone calls placed by Google’s latest conversational application, code-named “Duplex”, you should.

Google Duplex can place calls to make restaurant reservations or book appointments at hair salons. Nothing terribly exciting here. Speech recognition and spoken dialogue technology have been capable of completing such tasks for over a decade now. What is exciting, and frankly also a bit scary, is that in the few (probably cherry-picked) recordings made available on the Google AI blog, the robotic caller is practically indistinguishable from a human.

Google Duplex is not the first conversational system to pass the Turing test with flying colors for a few minutes: chatbots have been competing in this arena for years now. Mitsuku, the chatbot from Pandorabots, has been acing the Turing test for the past three years, passing as a human for longer and producing chats that are more complex, more intelligent and more emotionally aware than those of Google Duplex. For details on the Loebner prize competition and past winners follow the link.

So where is the excitement in Duplex? Clearly not in the chatbot component of Duplex but rather in the dialogue component: not in what the Google bot says (the generated chat) but in how it is being said (the generated speech). The intonation, pauses, hesitations etc. make the robotic caller sound very much human.

Let’s do a deep dive into the technology. Is Google Duplex a quantum leap technologically? Not really. As far as the spoken dialogue technology is concerned (what we call discourse management and natural language generation), Google Duplex is state-of-the-art but does not represent a revolution. Spoken dialogue systems that let humans make reservations (and perform other transactional tasks) have been built in research labs since as early as the 1990s and in industry since the turn of the century, and they are becoming (sometimes annoyingly) ubiquitous. As for the speech generation and speech-to-text components, the Google researchers use standard tools that are tweaked for their purposes. What is really new here is the model that generates the intonational patterns fed into the generation component, i.e., how things are being said to make Duplex sound as human as possible. When everything works (both what is being said and how it is being said are ‘right’), the results are stunning. Here is a more polemical (but less technical) opinion from a Wired columnist.

So is Google Duplex just hype? Not true either. Google Duplex represents a quantum leap in engineering. The researchers at Google had the audacity to believe that the current state-of-the-art in AI can pass the Turing test for spoken dialogue systems. They got the data, trained the models and connected all the dots. Kudos!

So why does Google Duplex scare us? We are already experienced in talking and chatting with machines to get customer service, and in receiving robotic calls for telemarketing and polling. We have also learned to live with conversational assistants in our homes, and soon in our cars and offices. The answer is simple: receiving a call from Google Duplex makes us feel ‘cheated’ and ‘manipulated’. Duplex has fooled us! It made us think that it is a fellow human, while in reality we are speaking with a machine. According to our commonsense moral code, the machine is acting unethically, and since machines don’t have morality, the creators are labeled as unethical. VentureBeat’s editor thinks so, and he is not alone.

Google marketing sensed the danger and, a couple of days after the news splash about Duplex, declared that from now on the robot will reveal its robotic identity. In Europe in particular, with GDPR coming into effect, ‘dishonest’ robotic callers could be especially damaging to Google’s reputation. Hence the immediate clarification from Google.

So now the ‘honest’ robot will no longer pretend to be human. Why should it sound human then? Why not have it sound something like this:

“Hello, my name is Google Duplex, I am an honest robot calling you to book an adult haircut appointment between ten and noon, this Thursday… with a preference towards earlier time… with Kelsey.”

And all the fun and magic is gone!

Or maybe we could put some of the fun back with the ‘honest Trump’ robot, which might sound something like:

“Hello, I am calling on behalf of John but I am really a robot that has been trained to sound like your president, Donald Trump. I want to have a haircut, it is gonna be a beautiful haircut, …the best haircut, … with Stormy … scrap that… with Kelsey, noonish tomorrow … ok?”

All in all, Google Duplex is a quantum leap in engineering, not technology. Google Duplex is mischievous but not deceitful. It might not be politically correct but it is not unethical. Let us not lose the forest for the trees here. To conclude, here are some important questions that we should be asking as technologists and potential users of Duplex:

Is there any advantage for Google Duplex to sound human? Do you get a lower reservation success rate from the ‘honest’ robot or the ‘mischievous’ robot?
Could a criminal use this technology to harm people… beyond hurting their pride or wasting their time?
Does Google Duplex save time or waste time in the workplace? How much time is wasted in unsuccessful calls? Would Google Duplex get you banned from your favorite hair salon? 🙂
What is next? Can we create robots that can have human-like conversations for more interesting (non-trivial) tasks?
When can we expect to have emotional and behavioral intelligence in voice robots? See also our efforts at Behavioral Signals.

This article appeared first on Medium. The editor is Alexandros Potamianos, CEO and co-founder of Behavioral Signals. Alex is a well-regarded innovator in the field of speech and natural language processing, interactive voice response systems and behavioral informatics. He has over 20 years of leadership experience on both the corporate and entrepreneurial sides of business. His background includes working at AT&T Labs-Research, Bell Labs and Lucent Technologies. His academic achievements have gone hand in hand with his extensive research work: he received his M.Sc. and Ph.D. degrees in Engineering Sciences from Harvard University and later his MBA from the Stern School of Business, NYU. He is a Professor at NTUA and currently lives in Los Angeles, where he enjoys kayaking and sailing.

Follow Alex on Twitter.
