The eyes may be the windows to the soul, but they can be deceptively clouded. The face is the proverbial red carpet that we roll out to greet others. It is purposeful, pleasant, and perfunctory, but the human face is also a jumble of contradictions.
On the one hand, people often keep a “stiff upper lip” to disguise how they’re truly feeling. On the other, more lucrative, hand, humans often emote freely and evocatively. It is this side that the tech world is banking on, having spent over $20 billion developing software that can read our every smirk and grimace. Blue-chip companies believe that studying facial expression patterns will give them the key to customer satisfaction. Is our product being received well? How happy does it make consumers? Do smiles translate into dollars?
These questions may drive the private sector’s quest to be the best, but is the public responding with a collective poker face?
Put On a Brave Face
A recent study explored the correlation between facial expressions and the feelings they are supposed to convey. While there is some link between the two, our deeper emotions linger far below the surface, tucked away from any prying eyes that may be analyzing our faces. A simple smile may mask sadness, while a forced frown may stifle a giggle simmering just behind clenched teeth.
Adding to the complexity, humanity is a wildly varied subject, evolving with every nuanced interaction. Cultural norms, for instance, differ across the globe: a stoic expression in Southeast Asia might be read as a pensive glare in Scandinavia. We are products of our environment and upbringing, and our facial expressions are conditioned over time to serve not only as a first impression but also as a shield. We train ourselves to smile when it’s appropriate, stay stone-faced when the topic of conversation is solemn, and generally adhere to societal norms.
So how can Emotion AI crack this facial façade?
In a word: voice. Given the “lost in translation” problem laid out above, one might expect the world’s many languages to make it harder to discern emotion from speech patterns. However, research indicates that vocal expression is more universal than the words being uttered. Even when listeners don’t understand the language being spoken, they can often discern feelings like sadness, exhaustion, and anger from tone of voice alone.
A central goal of artificial intelligence is to emulate the nuanced diagnostic power of the human brain, and Emotion AI applies that goal to deciphering emotions through machine learning. It is at the forefront of this discipline, and it feeds on findings like these.
Consider this fascinating experiment in vocal manipulation. Subjects were asked to read a short story aloud. The recording of their performance was then altered digitally: sped up, slowed down, pitch-shifted, and so on. When people listened to their own voice in one of these altered states, they perceived their own emotions differently each time. For example, when the recording was slowed down, they reacted as though the story were sad.
This suggests that altering speech can change how we perceive emotion. The two are closely linked, which is why voice recognition is proving to be an even more powerful tool than facial cues in the quest to engineer the best Emotion AI.
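The experiment does not say how the recordings were sped up or slowed down. One simple digital approach is to resample the waveform. Below is a minimal Python sketch that assumes mono audio stored as a list of floats. `change_speed` is a hypothetical helper name, not a tool from the study. Like playing a tape faster or slower, this naive method shifts pitch along with tempo.

```python
# Illustrative sketch only: `change_speed` is a hypothetical helper,
# not the method used in the experiment described above.

def change_speed(samples, factor):
    """Resample `samples` so playback runs `factor` times faster.

    Uses linear interpolation between neighboring samples. Like changing a
    tape's playback speed, this shifts pitch along with tempo: factor > 1
    speeds up and raises pitch; factor < 1 slows down and lowers it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for j in range(n_out):
        pos = j * factor                  # position in the original signal
        i = int(pos)
        frac = pos - i
        # At the last sample there is no right-hand neighbor; reuse it.
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    ramp = [float(k) for k in range(10)]
    print(change_speed(ramp, 2.0))   # twice as fast: half as many samples
    print(change_speed(ramp, 0.5))   # half speed: twice as many samples
```

Changing pitch without changing tempo, or the reverse, requires a dedicated time-stretching or pitch-shifting technique such as a phase vocoder. This sketch does not attempt that.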
The Face/Feel Disconnect
When building an Emotion AI system, developers must identify the expressions that correspond to a given feeling. The scowl, for example, is traditionally associated with disdain or anger. According to the aforementioned study, however, subjects scowled only 30 percent of the time they were actually angry. In other words, in roughly 70 percent of angry moments, the very cue an AI system relies on to detect anger is missing.
This is unacceptable. There has to be a better system. Listen up…
Raise Your Voice (recognition software)
Since facial analysis has proven ill-equipped to handle the many ways in which people express their emotions, voice detection must take up the slack. In fact, it’s more than just an addendum when it comes to harvesting emotional cues; the voice is actually a better indicator of true feelings. If for no other reason, it is very hard to fake intonation in your speech for an extended time.
One groundbreaking experiment tested how accurately subjects read emotions across visual, combined audiovisual, and voice-only communication. Visual expression alone conveyed some emotional information, and combined sight and sound conveyed more. Besting both, however, voice-only communication proved to be the most accurate channel for conveying genuine human emotion.
Tell Us How You Feel
Building on these revealing findings, the tech community can forge new pathways toward a stronger Emotion AI paradigm. That requires going beyond taking things at “face value” and asking how and why.
Humans have had millennia to contort their faces into whatever visage they choose to present to the world. Maintaining a blank expression has become synonymous with strength and stability, but have we trained ourselves as thoroughly (or as deceptively) to alter and suppress our voices?
Even when we lie, our speech patterns betray the truth quivering just beneath the words. The pitch of our voice can reveal urgency; its volume speaks to our willingness or unwillingness to be heard. Halting vocal cues, repetition, unexpected modulations: these are the signals that voice-based Emotion AI listens for.
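To make those cues concrete, here is a minimal Python sketch of how three of them might be measured from raw audio:

- loudness, as root-mean-square (RMS) amplitude;
- pitch, as the autocorrelation peak of a short frame;
- halting, as the fraction of near-silent frames.

The sketch assumes mono audio stored as a list of floats in [-1, 1]. The function names, the 8 kHz sample rate, the 75 to 400 Hz pitch search range, and the 0.02 silence threshold are illustrative assumptions, not any vendor’s Emotion AI method. Production systems typically use more robust feature extraction and feed such features into a trained classifier.

```python
import math

# Illustrative sketch only: names, sample rate, pitch range and silence
# threshold are assumptions, not taken from any Emotion AI product.
SAMPLE_RATE = 8000  # Hz


def make_tone(freq_hz, n_samples, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Synthetic stand-in for a voiced sound: a pure sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def rms(frame):
    """Root-mean-square amplitude, a simple proxy for loudness (volume)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate=SAMPLE_RATE, fmin=75.0, fmax=400.0):
    """Estimate fundamental frequency (pitch) in Hz via autocorrelation.

    Searches lags corresponding to pitches between fmin and fmax and returns
    the pitch of the lag with the largest (unnormalized) autocorrelation.
    Returns 0.0 if no positive peak is found (e.g. for silence).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def pause_ratio(samples, sample_rate=SAMPLE_RATE, frame_ms=20, threshold=0.02):
    """Fraction of fixed-length frames whose RMS falls below `threshold`.

    A crude measure of halting, pause-heavy speech.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    frames = [samples[s:s + frame_len]
              for s in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    silent = sum(1 for f in frames if rms(f) < threshold)
    return silent / len(frames)


if __name__ == "__main__":
    tone = make_tone(200.0, 400)                    # 50 ms of a 200 Hz "voice"
    print(round(estimate_pitch(tone), 1))           # pitch in Hz
    print(round(rms(tone), 3))                      # loudness proxy
    speech = make_tone(200.0, 1600) + [0.0] * 1600  # 0.2 s sound, 0.2 s pause
    print(pause_ratio(speech))                      # share of silent frames
```

A pure tone stands in for speech here so the numbers are easy to check. On real recordings, these features would be computed frame by frame and tracked over time, since the article’s cues (modulation, hesitation) show up as changes rather than single values.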
When science truly listens to the way humans speak, it can crack the proverbial code of how to assess our behavior. Your face may indicate one emotion, but tell us how you really feel…