How Researchers at Twitter and Snapchat are Using NLP to Identify Hateful Content
It can be hard for language models to understand the meaning of ‘low context’ social media posts, but state-of-the-art NLP and image recognition models are starting to change that
Social media platforms are very good at knowing what is trending on their platforms, but they are not always so good at knowing why.
This is because individual social media posts, even if only containing text, can be hard to understand out of context, even for humans.
At the RE•WORK Deep Learning Summit in San Francisco, 2022, Twitter's Machine Learning Researcher Omar Florez, and Snap Inc.'s Principal Research Scientist Leonardo Neves took the stage to discuss their work pre-training self-supervised NLP models designed to better understand the context behind the content on their platforms.
“If you're doing NLP on books, you have pages of context. If you're doing it in Wikipedia articles, or news articles, you have a couple of paragraphs,” says Neves. “But if you're working on social platforms [you get] very short informal text.”
However, the sheer scale of a platform like Twitter, which regularly sees 500 million tweets a day, provides a powerful incentive to develop self-supervised learning models that could help to provide a better user experience in the future.
Content and Context on Social Media
Training natural language processing (NLP) models to understand the context of any individual social media post is extremely challenging, especially when factoring in multimodal communication like images, text, and emojis.
For example, in March 2020 the volleyball emoji saw an unexpected spike in usage on Twitter. But at the outset of the pandemic, volleyball was not top of the public mind. So, what caused the spike?
Turns out that it was a reference to Tom Hanks’ film Cast Away, and his lonesome companion in the film, Wilson the volleyball. The actor had recently been diagnosed with COVID-19 and was quarantined with his wife in Australia.
This example shows how many different layers of context lie beneath even a trivial ‘low context’ social media post, and the breadth of data that is required to understand it.
“It’s a great example of how events change how people communicate, and how things that are happening in society can affect how our language models work,” says Neves.
One of the great challenges of training language models for social media is the multimodal nature of the content, where the addition of a single word to an image can change its meaning from benign to hateful.
“[If] you have some hateful messages, those usually have something in the text that are referring to some part of the image,” Florez says, “Then you need to combine some part of the image and part of the text in order to actually detect what could be hateful.”
Recognizing Hateful Content Using NLP
To include image recognition in his models, Florez uses image encoders to break pictures down into a language recognizable to the transformers that are used in the model.
“We have an image encoder which provides categories, so now we have different parts of the image that are categorized as objects. We also have the text, so we have a vocabulary of tokens, and we also have a vocabulary of objects,” Florez says.
Elements of sample tweets, either images or text, are then masked. The model uses self-attention to compute the contextual importance of each element of the sample tweet and then attempts to predict the content of the masked element.
“The idea here is that we can use attentions and we can even visualize those attentions,” Florez explains. “So meaningful alignment between parts of images and parts of text [can be] combined together in order to detect what could be hateful or not.”
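The mechanism Florez describes can be illustrated with a minimal sketch, assuming a toy joint vocabulary, random untrained weights, and plain NumPy in place of a real transformer. The hypothetical `[IMG:…]` tokens stand in for object categories from an image encoder; text and image tokens share one sequence, one position is masked, and single-head self-attention produces a contextual vector used to score the masked position against the vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint vocabulary: text tokens plus object categories from an image
# encoder (the [IMG:...] names here are illustrative, not Twitter's actual tokens).
vocab = ["this", "is", "great", "[IMG:dog]", "[IMG:frisbee]", "[MASK]"]
tok2id = {t: i for i, t in enumerate(vocab)}

d = 8  # embedding dimension
emb = rng.normal(size=(len(vocab), d))

# A sample tweet: text tokens and image-object tokens in one sequence,
# with one element masked, as in the masking step described above.
sequence = ["this", "is", "great", "[MASK]", "[IMG:frisbee]"]
x = emb[[tok2id[t] for t in sequence]]          # (seq_len, d)

# Single-head self-attention: every position attends to every other,
# so text tokens can align with image-object tokens and vice versa.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)                   # (seq_len, seq_len)
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
context = attn @ v                              # contextualized representations

# Predict the masked position by scoring its context vector against the
# vocabulary; with trained weights this is the self-supervised objective.
logits = context[3] @ (emb @ Wv).T
probs = np.exp(logits) / np.exp(logits).sum()
```

Each row of `attn` is one token's attention distribution over the whole sequence; visualizing those rows is what makes the image–text alignments inspectable, as Florez notes. With untrained random weights the prediction is meaningless, but the same structure, trained at scale on masked tweets, is what learns the alignments without labels.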
Perhaps most interestingly, Florez and his team were able to produce results without the need for large amounts of labeled data.
“All of this without labels. We trained [the model] for one week with hundreds of millions of tweets and we get meaningful representation without one single label,” Florez says.
Multimodal Models Make Gains
State-of-the-art NLP models have made significant progress in their predictive power in the last few years, particularly when compared to common benchmarks.
“The models are getting very close to human performance, when not surpassing them. And they're getting very close to a hundred percent in some tasks, like the CoNLL benchmark,” Neves says.
However, Neves cautions that these kinds of very successful tests should not be mistaken for proof that NLP models are ready for ‘prime time’ on complex contextual datasets like those presented by social media.
“By looking at [results] like this, you might believe that we are getting close to solving this problem, but that is definitely not true,” Neves cautions.
He continues: “There's a big gap between the almost a hundred percent we can get on the CoNLL dataset with the results we can get on social platform tasks.”
However, interest in such models is high. When Neves released his sentiment analysis model to the public, it became the most downloaded model on Hugging Face, with more than 15 million downloads.
“This shows that not only is it a hard domain, but also that a lot of people care about it and are working on this domain,” Neves says.
Florez also thinks that the future of big language models lies with multimodal NLP. And especially for social media, where the relationship between images and text is critical to establish meaning.
“Sometimes it takes an image to understand the words. There is nothing preventing us from actually learning from both [types of] content if you have enough data,” Florez concludes. “The world is multimodal, content is multimodal, machine learning models should be too.”
Discover the full line-up of world-leading AI and NLP experts at our upcoming Conversational AI Summit on 14-15 September 2022.