Does NLP have what it takes to fix voice & text recognition?
It’s no secret that machines are getting more and more intelligent. You can converse with an AI assistant online and not even realise you’re not speaking with an actual human customer service employee. These virtual assistants save time, improve business efficiency, and boost personal productivity.
Until they’re not.
What happens when your assistant doesn’t understand you, or answers the wrong question? More importantly, why do these miscommunications happen, and how are researchers working to overcome the obstacles to creating flawlessly automated personal assistants?
Speech tagging, semantics, sentiment analysis, question answering, and a general understanding of language and the way humans structure sentences and express sentiment are all things that machines struggle to learn. These human traits can’t simply be programmed; they need training, and the AI has to learn almost like a human to convince us that we’re conversing with one.
At the AI Assistant Summit in London last week, we heard from experts working to overcome these obstacles, as speakers from Google Assistant, the University of Cambridge, Facebook and MindMeld shared their most recent breakthroughs.
Are AI Assistants a complete paradigm shift? Yes!
Yariv Adan, Google Assistant
Let's rewind a second - how did people know things before googling? …’let me library that for you?’ doesn’t have quite the same ring - or efficiency. Google Assistant, and all other AI Assistants for that matter, are still in their infancy, and Yariv explained how the goal is for people to actually talk to the assistant, not just fire off queries. Currently, too many technical hurdles prevent this from becoming a reality. Compare a spoken query to a text-based search and the challenges are immediately obvious: when you search ‘hello’ on Google, the first result is Adele’s song, because it’s the most searched-for result; when you say ‘hello’ to an AI Assistant, it can mean so much more, and you're probably not after the song, so the complexity increases. The assistant needs to understand not only language but also intent! Yariv and his team combine near-perfect voice recognition, NLP, context-driven responses, object recognition, personalisation and a better user interface to create a great assistant. The goal of the deep learning model they’re currently working with is not only to understand what has been said, ‘but to work out what you meant to say if you mixed up your sentence, and correct your query, answering the question you meant to ask.’
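To make the ‘hello’ example concrete, here is a minimal, hypothetical sketch of intent classification in Python - not Google Assistant’s system - showing how the same word can map to different intents depending on phrasing. The queries, labels and model choice are purely illustrative.

```python
# Minimal sketch of intent classification (not Google Assistant's system):
# the same word, 'hello', maps to different intents depending on phrasing.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-labelled training set, purely illustrative.
queries = [
    "hello", "hi there", "hey assistant",           # greeting
    "play hello by adele", "play some music",       # play_music
    "what's the weather today", "will it rain",     # weather
]
intents = ["greeting", "greeting", "greeting",
           "play_music", "play_music",
           "weather", "weather"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(queries, intents)

print(model.predict(["hello"]))                 # likely 'greeting'
print(model.predict(["play hello by adele"]))   # likely 'play_music'
```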
But still, in the area of NLP, deep learning needs to be more efficient. At the University of Cambridge, researchers are using data to drive language understanding and help overcome the shortcomings behind these inefficiencies. Now that personal assistants are a part of everyday life, a huge amount of research is moving into industry application. Nikola Mrkšić explained that ‘speech recognition has got really big’, but, as Yariv stressed, it’s important for the AI to understand what users are saying. At the moment, it can ‘write down what you’ve said, but not action that correctly.’ Nikola discussed with the attendees how companies are trying to solve this problem by employing countless engineers to add hand-crafted rules to deal with different user actions. Unfortunately, however, this makes the models over-complex and non-transferable between different languages and domains.
Recent advances in deep learning, however, ‘have allowed us to build data-driven models which overcome most limitations of current rule-based understanding models’. Most recently, Nikola has been working on the attract-repel algorithm, which uses synonym and antonym constraints to refine the relationships between words via fine-grained, context-sensitive vector updates.
For example, ‘irritating’ and ‘annoyed’ carry similar context, so they are pushed together (the attract step) and end up with the same sentiment.
Additionally, the model improves on traditional dialogue state tracking and on distributional models such as word2vec by combining multiple languages, allowing it to operate multilingually across different domains simultaneously. The algorithm requires no hand-crafted rules, and the languages combined in the shared word vectors can have different semantics and belong to different language families - ‘you can create these vector spaces for 95% of global languages’ - all operating under the same algorithm.
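As a rough illustration of the attract-repel idea - and only an illustration, not Nikola’s actual implementation - the sketch below pulls synonym vectors together and pushes antonym vectors apart while keeping them normalised. The words, dimensions and learning rate are made up.

```python
# Minimal sketch of attract-repel-style updates on word vectors.
# Illustrates the idea (pull synonyms together, push antonyms apart);
# it is NOT the published Attract-Repel implementation.
import numpy as np

rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ["irritating", "annoyed", "pleasant"]}

synonyms = [("irritating", "annoyed")]   # attract constraints
antonyms = [("irritating", "pleasant")]  # repel constraints
lr = 0.1

def unit(v):
    return v / np.linalg.norm(v)

for _ in range(100):
    for a, b in synonyms:   # pull synonym vectors towards each other
        diff = vectors[a] - vectors[b]
        vectors[a] -= lr * diff
        vectors[b] += lr * diff
    for a, b in antonyms:   # push antonym vectors apart
        diff = vectors[a] - vectors[b]
        vectors[a] += lr * diff
        vectors[b] -= lr * diff
    for w in vectors:       # keep every vector on the unit sphere
        vectors[w] = unit(vectors[w])

cos = lambda a, b: float(unit(vectors[a]) @ unit(vectors[b]))
print(cos("irritating", "annoyed"))   # high similarity after attraction
print(cos("irritating", "pleasant"))  # low similarity after repulsion
```

In the published work the updates come from hinge losses regularised towards the original distributional vectors, and constraints drawn from multiple languages are what make the shared multilingual spaces possible.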
It’s not just voice assistants that have problems understanding intent. Facebook’s platform for text understanding is optimised to suggest new experiences, such as social recommendations and Marketplace suggestions, based on the language users use on Facebook. Davide Testuggine, who works on Facebook’s DeepText project, explained how they’re using a single platform for all natural language understanding tasks specific to Facebook products. Their deep learning models handle text and word classification, content similarity and entity resolution, running the DeepText platform in the browser and sharing similar representations across multiple tasks, languages and architectures.
Labelling the text requires additional deep learning capabilities, and Davide explained how they’re using CLUE, a client for DeepText, to help with the cycle of collecting and labelling data, training the classifier, and reviewing the information before it’s fed into DeepText.
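To picture that collect-label-train-review cycle, here is a hedged sketch in Python using scikit-learn. The posts, labels and categories are invented for illustration; this is not DeepText or CLUE code.

```python
# Hypothetical sketch of a collect -> label -> train -> review loop,
# similar in spirit to what a labelling tool like CLUE supports.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# 1. Collect raw posts (made-up examples).
unlabelled = [
    "selling my old bike, barely used",
    "anyone know a good plumber nearby?",
    "just got back from an amazing holiday",
]

# 2. Label a small batch by hand.
labelled = [
    ("selling my sofa, pick up only", "marketplace"),
    ("does anyone want to buy my camera?", "marketplace"),
    ("can someone recommend a dentist?", "recommendation"),
    ("looking for a good mechanic in town", "recommendation"),
    ("had a lovely weekend with the family", "other"),
]
texts, labels = zip(*labelled)

# 3. Train a classifier on the labelled batch.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

# 4. Review predictions on new posts before adding them to the training set.
for post, pred in zip(unlabelled, clf.predict(unlabelled)):
    print(f"{pred:>14}: {post}")
```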
When language is being processed, named-entity recognition (NER) locates entities and classifies them into categories such as people, organisations, locations, percentages and quantities. There are countless ways of expressing the same idea, and the machine needs to be able to draw similarities between ‘John bought twelve apples in London yesterday’ and ‘My mate John grabbed us a dozen braeburns in the big smoke yesterday’. Vijay Ramakrishnan followed on from Davide's presentation by explaining the similar system they're using at MindMeld. The team have built and benchmarked deep learning models that achieve state-of-the-art results on a public dataset, using GloVe and word2vec embeddings trained on Twitter data to categorise text automatically. Through GPU optimisation, they were able to train the model on only 3,000 manually tagged examples spread across 10 main categories.
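For readers who want to see what NER output looks like in practice, the snippet below runs spaCy’s small English model (not MindMeld’s models) over the two example sentences above. The slang phrasing of the second sentence is exactly where off-the-shelf entity recognisers tend to struggle.

```python
# Quick illustration of named-entity recognition with spaCy's small English
# model (not MindMeld's system).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

for sentence in [
    "John bought twelve apples in London yesterday",
    "My mate John grabbed us a dozen braeburns in the big smoke yesterday",
]:
    doc = nlp(sentence)
    print(sentence)
    for ent in doc.ents:
        # e.g. 'John' -> PERSON, 'London' -> GPE, 'yesterday' -> DATE
        print(f"  {ent.text:>10} -> {ent.label_}")
```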
Consumers demand more personalisation and customisation in their lives. Increasingly, they will expect personal assistants that cater for a particular brand, with products tailored to their likings. Businesses have to serve these expectations, and doing so manually stops being scalable after a certain point. This is where AI can relieve the human workload by serving some or all of these experiences, and it’s why virtual assistants will become increasingly relevant to delivering brand experiences.
AI Assistants are clearly here to stay, and with some of the huge tech giants leading the way and NLP and DL algorithms improving constantly, it won't be long before Yariv's ideal scenario of users chatting with their AI Assistants is a reality.
We'll once again be joined by leading minds at the AI Assistant Summit in San Francisco next January 25 & 26. Cross-industry experts from Facebook, Autodesk, Woebot Labs, Senstone, AdmitHub and many more will be sharing their most cutting-edge research.