Author: Rob Dalgety, Peltarion

Natural language processing (NLP) is the ability to extract insight from, and in a real sense understand, natural language in text, audio and images. Language and text carry enormous insight, and that data is abundant and widespread in most organizations.

The ability to process language systematically, effectively and at scale lends itself to numerous uses across almost any organization, from customer-facing products, services and customer support through to major process changes in the back office. NLP applications include speech-to-text, text-to-speech, language translation, language classification and categorization, named entity recognition, language generation, automatic summarization, similarity assessment, language logic and consistency checking, and more.

Why deep learning for NLP? One word: BERT.

A range of techniques has been applied to NLP, but deep learning in particular is showing exceptional results. Deep learning has been applied to NLP for several years, and research and development moves so quickly that new methods and increasingly capable models appear at a rapid pace. For example, FastText, an extension of Word2Vec, significantly reduces the training load, which makes it much easier to apply to a specific data set and language context.
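
To make that concrete, here is a minimal sketch (not from the original article) of training FastText word vectors with the gensim library; the toy corpus and parameter values are illustrative assumptions only:

```python
# A minimal sketch of training FastText word vectors with gensim.
# Assumes gensim is installed; the corpus below is a toy stand-in.
from gensim.models import FastText

corpus = [
    ["deep", "learning", "reshapes", "natural", "language", "processing"],
    ["fasttext", "builds", "on", "word2vec", "with", "subword", "information"],
]

# Small dimensions and few epochs keep this toy example quick to train.
model = FastText(sentences=corpus, vector_size=32, window=3, min_count=1, epochs=10)

# Inspect a learned vector and nearest neighbours for a word in the corpus.
print(model.wv["language"][:5])
print(model.wv.most_similar("learning", topn=3))
```

In practice the corpus would be your own domain text, which is precisely what makes FastText's lighter training load attractive.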

But a model open-sourced by Google in October 2018, BERT (Bidirectional Encoder Representations from Transformers), is now reshaping the NLP landscape. BERT is significantly more evolved in its understanding of word semantics in context and can process large amounts of text and language. BERT also makes it easier to reuse a pretrained model (transfer learning) and then fine-tune it on your own data for the specific language situation and problem you face.
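
For a rough sense of what transfer learning with BERT looks like in practice, here is a minimal sketch using the Hugging Face transformers library; the model name, labels and example texts are illustrative assumptions, not part of the original article:

```python
# A minimal sketch of fine-tuning a pretrained BERT model for text
# classification with the Hugging Face transformers library.
# Model name, labels and training texts here are illustrative only.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great support experience", "the product never arrived"]
labels = torch.tensor([1, 0])  # toy labels: 1 = positive, 0 = negative

# Tokenize into the fixed-length input format BERT expects.
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One optimisation step: only the small classification head is new;
# the pretrained encoder weights are merely fine-tuned.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```

The point of the sketch is the division of labour: the heavy pretraining has already been done once, and only a comparatively small amount of labeled, domain-specific data is needed for the fine-tuning step.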

Changing the NLP benchmarks

By many measures, BERT has improved language-processing performance by a significant margin. In some circumstances, BERT can be applied directly to the data and problem with no further training (zero-shot) and still deliver a high-performing model.
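
As a simple illustration of using a pretrained BERT model directly, with no further training, here is a masked-word prediction sketch with the Hugging Face pipeline API; the model name and example sentence are illustrative assumptions only:

```python
# A minimal sketch of using pretrained BERT with no further training,
# via the Hugging Face fill-mask pipeline (model name is illustrative).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token from its bidirectional context.
for prediction in fill_mask("Deep learning has reshaped natural language [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```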

Since BERT was published last October, a wave of transformer-based methods (GPT-2, XLNet, RoBERTa) has kept raising the bar, demonstrating better performance, easier training or some other specific benefit, such as text and language generation. BERT's leap in performance has been likened to an "ImageNet 2012 moment" for natural language processing, when a deep learning model demonstrated a big uplift in performance and set off a wave of image-based deep learning applications and research.

We know we are onto a good thing when the standard performance benchmarks for language have to be radically revised upwards because the new methods simply outperform the old ones! But while BERT is powerful and has lowered certain barriers to use, as with other deep learning approaches it still needs to fit into a framework where models can be easily applied to real-world circumstances and deployed by a wider group of organizations. This is a critical part of the deep learning adoption cycle. The true benefits of NLP will only be realized when these powerful models are adopted more broadly, operating in and improving live scenarios and supporting a wide range of applications across organizations and users.

Are you interested in experimenting with language processing via deep learning in a hands-on way?

Join Peltarion at the Deep Learning Summit in London this week.