We spoke with experts from Amazon Alexa, Heriot-Watt University, Google, Square and more, on their current work in the Conversational AI environment. 2021 is set to be a huge year in XAI, especially in the areas of Toxicity Identification, the evolution of Chatbots into Conversational AI Agents & Multimodal Conversational AI.  


  • Google is rolling out comment toxicity reminders: 35% of those who see one then edit their post to lower its toxicity
  • Compare chatbots or conversational AI from 3-4 years ago with today and user expectations are hugely different; only a handful of years ago we had ‘Clippy’ as a reference point
  • Square has recently launched a virtual assistant which can aid appointment booking, with contact reminders and retention of specific information
  • With the rise of deep learning, we have good NLP and CV models, both now represented using embeddings, so for the first time we have a common framework for representing these two modalities

AI Principles and Identifying Toxicity in Online Conversation

Lucy Vasserman, Staff Software Engineer, Google

As a staff software engineer at Google, Lucy has been working on identifying toxicity in online conversations using artificial intelligence. The tool, Perspective API, built by Jigsaw (a unit within Google), identifies user text that could be toxic. It then suggests an edit to the text before posting, a subtle nudge that Lucy said led 35% of the Google users who saw the notification to edit their message. Lucy suggested that building principled and ethical AI is a continuous, ongoing effort rather than a one-time task, and one that, combined with Machine Learning models, can help mitigate bias. Whilst this is a huge stride forward, Lucy suggested that, even with huge developments, these tasks still depend very much on humans.

"We built Perspective API, which is an ML tool that identifies toxic comments in conversations and ranks them by toxicity"
"AI Accountability is embedded throughout our work, avoiding the creation or reinforcing of unfair bias. I would love to say we balanced our data and solved our problems, that’s not true. Feedback is evolving all of the time. Our toxicity scores are some way off, however, there is still a lot of work to be done"
"Toxicity isn't just a problem from trolls, in fact, many having conversations can have bad days or moments which can lead to harmful or accidental toxic comments, which shows that it is very important to have humans there spot checking to help facilitate the awareness of mistakes"
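The score-then-nudge flow described above can be sketched as follows. This is a minimal illustration rather than Google's implementation: the request and response shapes follow the publicly documented Perspective `comments:analyze` method, while `toy_scorer`, its `FLAGGED` word list and the `0.8` threshold are hypothetical stand-ins so the example runs without an API key or network access.

```python
from typing import Callable, List, Tuple

# Assumption: any function mapping text to a toxicity score in [0, 1].
Scorer = Callable[[str], float]

# Hypothetical word list used only by the offline stand-in scorer below.
FLAGGED = {"idiot", "stupid"}


def build_analyze_request(text: str) -> dict:
    """Build a request body in the shape documented for comments:analyze."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response: dict) -> float:
    """Pull the summary TOXICITY score out of an analyze response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def toy_scorer(text: str) -> float:
    """Offline stand-in for a real model: fraction of words that are flagged."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in FLAGGED for w in words) / len(words)


def rank_by_toxicity(comments: List[str], scorer: Scorer) -> List[Tuple[float, str]]:
    """Return (score, comment) pairs, most toxic first."""
    scored = [(scorer(c), c) for c in comments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def should_prompt_edit(score: float, threshold: float = 0.8) -> bool:
    """Decide whether to show the 'consider editing' reminder (threshold is illustrative)."""
    return score >= threshold


comments = ["Thanks for sharing", "You are an idiot", "stupid idiot"]
for score, comment in rank_by_toxicity(comments, toy_scorer):
    print(f"{score:.2f}  prompt={should_prompt_edit(score)}  {comment}")
```

In a real deployment the scorer would send `build_analyze_request(text)` to the service and pass the JSON reply through `extract_toxicity`; the human spot checks Lucy mentions would sit alongside whatever threshold is chosen.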

Science Behind Alexa TTS

Shubhi Tyagi, Applied Scientist, Amazon Alexa

As an NLP Applied Scientist at Amazon Alexa, Shubhi works on AI-based voice assistants, applying the latest Deep Learning advancements to make the interactions more human-like. Shubhi spoke about the Neural Network make-up of Amazon's models, alongside the processing of data through encoders and generation models.

"To give a better understanding of what goes on inside a TTS (text-to-speech). It can be divided into three sections. Text analysis, prosody generation and speech synthesis. During these stages, inputs go through extracting linguistic knowledge, then mathematical models of style and finally generation of acoustic signal"
"We have replaced or improved all components of TTS models using new deep neural network technologies. Text-to-speech can now basically be cut down into two NN models: one is the Prosody Generation Model and the other is Speech Production"
"We are also attaching style labels to our recordings to help our models work out which is which. Our Phonetic Encoder and Emphasis Encoder then feed into the attention and then context decoder for the spectrogram to have a universal Neural Encoder"
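The three stages Shubhi describes (text analysis, prosody generation, speech synthesis) can be pictured as a simple pipeline. The sketch below is a toy, not Alexa's architecture: every function name, the length-based duration rule, the falling pitch contour and the sine-wave "vocoder" are assumptions chosen only to show how data flows from text to an acoustic signal. In the neural systems described in the talk, learned models take the place of these hand-written rules.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ProsodyUnit:
    token: str
    duration_s: float  # how long the token is spoken for
    pitch_hz: float    # fundamental frequency for the token


def analyze_text(text: str) -> List[str]:
    """Stage 1, text analysis: normalise and tokenise.
    A real front end would also produce phonemes and linguistic features."""
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    return [t for t in tokens if t]


def generate_prosody(tokens: List[str], base_pitch_hz: float = 120.0) -> List[ProsodyUnit]:
    """Stage 2, prosody generation: toy rules standing in for a learned model.
    Duration grows with token length; pitch falls 20% across the utterance."""
    last = max(len(tokens) - 1, 1)
    return [
        ProsodyUnit(tok, 0.05 * len(tok), base_pitch_hz * (1.0 - 0.2 * i / last))
        for i, tok in enumerate(tokens)
    ]


def synthesize(units: List[ProsodyUnit], sample_rate: int = 16000) -> List[float]:
    """Stage 3, speech synthesis: render each unit as a sine tone.
    A real system would generate a spectrogram and pass it to a vocoder."""
    samples: List[float] = []
    for unit in units:
        count = round(unit.duration_s * sample_rate)
        samples.extend(
            math.sin(2 * math.pi * unit.pitch_hz * t / sample_rate) for t in range(count)
        )
    return samples


units = generate_prosody(analyze_text("Hello, world!"))
audio = synthesize(units)
print(len(units), "units,", len(audio), "samples")
```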

Multimodal Conversational AI Developments

Verena Rieser, Professor of AI, Heriot-Watt University

With recent progress in Deep Learning, there has been increased interest in multimodal conversational AI, which requires an AI agent to hold a meaningful conversation with humans in Natural Language about content in other modalities, e.g. pictures or videos. In this talk, Verena presented two case studies: one on generating responses for closed-domain, task-based multimodal dialogue systems, with applications in conversational multimodal search; and one on selecting or retrieving responses for open-domain multimodal systems, with applications in visual dialogue and visual question answering.

Throughout the talk, Verena highlighted the open challenges, including context modelling, knowledge grounding, encoding history, multimodal fusion, evaluation techniques, and shortcomings of current datasets.

"What is Multimodal Conversational AI? It’s having a conversation with an agent or partner about things that are around you, visually grounded dialogue for example, where we get the AI system to talk about a picture, with an addition of a written image caption for their aid. The system then acts as an oracle"
"In order for a computer to understand what ‘dog’ means for example, we show them a picture of a dog - humans work multimodally and therefore can use this to build interfaces which are easier to use"
"With the rise of Deep Learning, we have good NLP and CV models, which are now represented using embeddings. For the first time we have a common framework for representing these two modalities. For experimental setups, we use open-domain response selection, posed as a ranking task. You have to distract the task by selecting the ground truth"
"We did however find that our sparse annotation phase didn't help us much. Why? Humans only in 11% of the data need to know what happened before in the dialogue"
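Response selection posed as a ranking task over a shared embedding space can be illustrated with a small sketch. It is not the setup from Verena's experiments: the three-dimensional vectors are made up, averaging is used as a deliberately naive fusion step, and the function names are hypothetical. The point is only that once image and text live in the same vector space, candidates can be scored against the fused context and the ground-truth response evaluated by its rank.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse(image_vec: Vector, text_vec: Vector) -> Vector:
    """Naive multimodal fusion (an assumption): element-wise average."""
    return [(i + t) / 2 for i, t in zip(image_vec, text_vec)]


def rank_responses(context: Vector, candidates: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score every candidate response against the context, best first."""
    scored = [(resp, cosine(context, vec)) for resp, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def reciprocal_rank(ranked: List[Tuple[str, float]], ground_truth: str) -> float:
    """1 / rank of the ground-truth response, or 0.0 if it is absent."""
    for position, (resp, _) in enumerate(ranked, start=1):
        if resp == ground_truth:
            return 1.0 / position
    return 0.0


# Made-up embeddings: in practice these come from trained CV and NLP encoders.
image_embedding = [1.0, 0.0, 0.0]
question_embedding = [0.0, 1.0, 0.0]
candidates = {
    "A dog on a beach": [0.6, 0.6, 0.1],  # ground truth
    "It is a red car": [0.0, 0.1, 1.0],   # distractor
    "Yes": [1.0, 0.0, 0.0],               # distractor
}

ranked = rank_responses(fuse(image_embedding, question_embedding), candidates)
for response, score in ranked:
    print(f"{score:.3f}  {response}")
print("reciprocal rank:", reciprocal_rank(ranked, "A dog on a beach"))
```

Averaging reciprocal rank over many dialogue turns gives mean reciprocal rank, one common way to evaluate this kind of ranking setup.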
