Today, we see conversational AI all around us, from bots that we can chat with on messaging apps to voice assistants in our homes and cars and on our phones. As new hardware like in-home speakers becomes more prevalent, people are beginning to use these conversational agents for more and more tasks. We’ve moved beyond simple questions, like “What’s the weather going to be tomorrow?”, to expecting our agents to understand, remember and even make decisions to better serve us.

While the wider availability of these interfaces has made conversational agents more accessible, a great deal of work remains to make them more capable.

Learning Conversations

Consider a toddler learning to speak. We don’t teach a young child in a structured, methodical way. They learn through exposure to the conversation around them. A child’s first words might be Mamma or Dadda, because these individuals are likely to be around them most and play a key role in the child’s wellbeing. But what about the words beyond that? Children may learn the names of siblings or pets, or objects like a ball or a blanket.

As a child’s cognitive skills develop, their vocabulary explodes. They start to grasp the ways in which words connect, building an understanding of the relationships between terms and phrases and gradually developing the ability to speak and understand full sentences.

At the individual level, a child learns through exposure to data (the conversations around them), experimentation (babbling and trying out new words), and rewards and incentives (parents praising or correcting their language).

Teaching conversations to machines

Just as children need exposure to conversation to learn how to converse themselves, dialogue systems also need exposure to dialogue. To be effective, dialogue systems must not only understand language but actually master it. This requires them to understand the many ways a point can be communicated, and also to reason about it and then generate a suitable point in response.

In my area of research, we have been exploring a two-stage process. In the first stage, given a dialogue history, we teach the system to generate sentences that are suitable in context. For example, if our system is designed to give movie recommendations at my local theater, I may ask questions about movie genres, show times, ticket pricing and availability. The system must be able to understand each point of information and how it relates to the core task. If I say that I don’t really like horror movies, for example, the system should understand this and use it to recommend other options to me.

Dialogue systems also need to choose among different possible sentences, depending on the task or objective. Some dialogue systems, particularly ‘chatbots’ like Microsoft’s Zo, are designed mainly to engage and entertain the user. They seek to maximize engagement, learning to ask questions to keep the dialogue moving forward.

The dialogue system acquires language by learning to complete tasks successfully. Just as a child is guided and corrected, a dialogue system trained in a supervised setting learns to modify its behaviour. As the system is exposed to more and more sentences in conversation, it learns to map and understand different terms and phrases. Over repeated attempts, it becomes better at grasping the nuances of conversation and at identifying the tasks or actions required, regardless of how they are phrased. After observing many sentence-response pairs, a dialogue system can learn to produce reasonable responses for a given context.
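To make the idea of learning from sentence-response pairs concrete, here is a minimal sketch. It is not the generative neural approach described above: it swaps in a much simpler retrieval method that returns the stored response whose context shares the most words with the new context (Jaccard similarity). The class name `RetrievalResponder` and the movie-theater training pairs are hypothetical, chosen only to echo the running example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


class RetrievalResponder:
    """Toy responder 'trained' on (context, response) pairs.

    Hypothetical illustration: it does not generate new text, it retrieves the
    training response whose context overlaps most with the new context.
    """

    def __init__(self) -> None:
        self.pairs: list[tuple[set[str], str]] = []

    def fit(self, pairs: list[tuple[str, str]]) -> "RetrievalResponder":
        # Store each context as a bag of words alongside its response.
        self.pairs = [(tokenize(ctx), resp) for ctx, resp in pairs]
        return self

    def respond(self, context: str) -> str:
        words = tokenize(context)

        def jaccard(item: tuple[set[str], str]) -> float:
            ctx_words, _ = item
            union = words | ctx_words
            return len(words & ctx_words) / len(union) if union else 0.0

        # Pick the pair with the highest word overlap (first one wins ties).
        _, best_response = max(self.pairs, key=jaccard)
        return best_response


# Hypothetical training pairs for a local movie theater.
TRAINING_PAIRS = [
    ("what time is the comedy showing tonight", "The comedy starts at 7:30 pm."),
    ("how much are tickets", "Adult tickets are $12."),
    ("I don't really like horror movies", "Then you might enjoy the new comedy or the drama."),
]

bot = RetrievalResponder().fit(TRAINING_PAIRS)
print(bot.respond("how much do tickets cost"))
```

A real system would replace the word-overlap score with a learned model, so that paraphrases with no shared words (for example "what's the price of admission") still map to the right response.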

From listening to speaking

The second stage of this research is to train dialogue systems to choose the words they use to respond to users. A response may consist of single words, terms or phrases; it must be reasonable, and it must also be relevant to the goal of the dialogue.

This goal might be maximizing user engagement, as Zo does, helping a user accomplish a task such as booking a vacation, or playing a game of 20 Questions. The dialogue system is trained for this stage with reinforcement learning: through repeated attempts, it learns to plan its responses so as to best achieve the goal.

In the example of Zo seeking to maximize user engagement, the system learns that asking questions results in longer dialogues than generating declarative responses.
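The Zo example can be sketched as a stripped-down form of reinforcement learning: an epsilon-greedy bandit that chooses between two response types and is rewarded whenever the simulated user keeps talking. This is an assumption-laden toy, not Zo's actual training method. There is no dialogue state, and the continuation probabilities in `CONTINUE_PROB` are invented for illustration rather than measured from any real system.

```python
import random

# Hypothetical engagement model: probability that the simulated user replies
# again after each kind of system turn. Invented numbers, not real data.
CONTINUE_PROB = {"question": 0.8, "declarative": 0.5}
ACTIONS = list(CONTINUE_PROB)


def simulated_user_continues(action: str, rng: random.Random) -> bool:
    """Return True if the simulated user responds after this system turn."""
    return rng.random() < CONTINUE_PROB[action]


def train_epsilon_greedy(steps: int = 5000, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Estimate each response type's value (chance the dialogue continues)."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=value.get)
        # Reward of 1 for each turn the user chooses to continue.
        reward = 1.0 if simulated_user_continues(action, rng) else 0.0
        count[action] += 1
        # Incremental sample-average update of the action value.
        value[action] += (reward - value[action]) / count[action]
    return value


def expected_dialogue_length(p_continue: float) -> float:
    """Mean number of system turns if the user continues with probability p.

    The number of turns is geometric, so its mean is 1 / (1 - p).
    """
    return 1.0 / (1.0 - p_continue)


values = train_epsilon_greedy()
print(values)
print({a: expected_dialogue_length(p) for a, p in CONTINUE_PROB.items()})
```

Under these made-up numbers, always asking questions yields an expected dialogue of about 5 system turns versus 2 for declarative responses, so the learned values steer the policy toward questions. A full reinforcement learning setup would condition this choice on the dialogue history and plan over several turns rather than one.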

Challenges for researchers

Children are able to learn through conversation with the people around them, but machines don’t have this luxury. We must provide the data and environments to help train and improve them. For dialogue systems, a major challenge is the need for a user simulator. Because the system generates sentences that differ from those observed in the training data, it is necessary to simulate the many different ways users could respond to these newly generated sentences.
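A user simulator can be as simple as a rule-based stand-in for a person with a hidden goal. The minimal sketch below assumes a hypothetical moviegoer who likes one genre and dislikes others, and who answers two hypothetical system acts (`request_genre` and `recommend`) with a randomly chosen phrasing, so the dialogue system sees varied wording for the same intent. The act names, templates and `UserGoal` fields are all illustrative assumptions; research simulators are typically far richer and are often learned from data.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserGoal:
    # Hypothetical hidden goal for a simulated moviegoer.
    liked_genre: str
    disliked_genres: set[str] = field(default_factory=set)


# Several surface forms per intent, to mimic the many ways users phrase things.
TEMPLATES = {
    "inform_genre": ["I'd like a {g}.", "Something in {g}, please.", "Got any {g} movies?"],
    "reject": ["I don't really like {g} movies.", "No {g}, please.", "Not a fan of {g}."],
    "accept": ["Sounds great, I'll take it.", "Perfect, book that one."],
}


class UserSimulator:
    """Rule-based simulated user that reacts to hypothetical system acts."""

    def __init__(self, goal: UserGoal, seed: int = 0) -> None:
        self.goal = goal
        self.rng = random.Random(seed)

    def _say(self, intent: str, **slots: str) -> tuple[str, str]:
        # Pick one phrasing at random and fill in any slots it uses.
        text = self.rng.choice(TEMPLATES[intent]).format(**slots)
        return intent, text

    def respond(self, system_act: str, genre: Optional[str] = None) -> tuple[str, str]:
        """Return (intent, utterance) in reply to a system act."""
        if system_act == "request_genre":
            return self._say("inform_genre", g=self.goal.liked_genre)
        if system_act == "recommend" and genre is not None:
            if genre in self.goal.disliked_genres:
                return self._say("reject", g=genre)
            if genre == self.goal.liked_genre:
                return self._say("accept")
            # Neither liked nor disliked: steer back toward the user's goal.
            return self._say("inform_genre", g=self.goal.liked_genre)
        raise ValueError(f"unsupported system act: {system_act!r}")


user = UserSimulator(UserGoal("comedy", {"horror"}), seed=1)
print(user.respond("recommend", genre="horror"))
print(user.respond("recommend", genre="comedy"))
```

Because the simulator can be run thousands of times with different goals and seeds, it can supply the conversation a reinforcement learning agent needs without a human on the other end, at the cost of only being as realistic as its rules.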

In short, to train dialogue systems… we need conversation! To overcome this challenge, researchers are developing techniques that help dialogue systems learn faster, so that they can learn by communicating directly with humans.

At Maluuba, we are conducting research in comprehension and communication, with the vision of building a literate machine. We’re working towards a future where users can help to train and improve their AI assistants simply by talking with them.

Layla El Asri is the Research Manager at Maluuba, a Canadian AI company that’s teaching machines to think, reason and communicate with humans (acquired by Microsoft in January 2017). Based in the AI epicenter of Montréal, Maluuba applies deep learning techniques to solve complex problems in language understanding. Layla’s work explores artificial intelligence in the context of language understanding, dialogue and human-machine interaction.

To hear more from Layla, join RE•WORK at the Women in Machine Intelligence Dinner in Montreal this October 11, running in conjunction with the Deep Learning Summit on October 10 & 11. Register now for Early Bird passes - available until August 18th.