Audio content is on the rise. What role does artificial intelligence have to play?

We had the pleasure of chatting with Mari Joller, Founder and CEO of Snackable AI, a content discovery engine for the audio-first world, on the RE•WORK Women in AI podcast. Below is a transcript of the discussion (generated via Snackable AI).

Topics explored include:

  • Creating a startup
  • Startup ecosystems
  • Video content & metadata
  • Datasets & challenges faced
  • Changes in consuming content
  • Content accessibility
  • The future of spoken word

🎧 Listen to the podcast in full here.

Chapter 1: Creating a Startup

00:04 - 00:21
Question from Nikita Johnson (RE•WORK): Hi, Mari, thank you for joining the Women in AI podcast today. It's fantastic to have you here with us. So firstly, it would be great if you can tell us a bit about yourself and also introduce us to Snackable.

Mari Joller: Thanks, Nikita, it's great to be here. Thanks for having me on your podcast. So I'm Mari. I'm founder and CEO of Snackable AI. We are a technology startup. We're three years old, venture funded, and we're headquartered in New York City. What Snackable does is transform the way that people discover and listen to webinars, podcasts or other long-form spoken audio video content.

00:40 - 01:15
Personally, I've been in the audio meets content meets AI world for the last six years. So Snackable is actually my second startup in this space, and going further back, I'm originally from Estonia. I came to the U.S. for college. I went to Middlebury College in Vermont studying economics and Chinese, and then I went into consulting. So I worked for a boutique strategy consulting company called Kaiser Associates in Washington, D.C. I wanted to go from working on companies and problems to actually working in companies and creating new products, creating new companies.

01:15 - 01:41
So I went and got my MBA at Harvard and then I went on to what they call the operational side. So I was an intrapreneur before I was an entrepreneur, meaning that I was creating, building, launching, scaling new products and services at places like Nokia and Virgin Mobile, having opportunities like reimagining how people in emerging markets consume and interact with mobile maps.

Chapter 2: Estonia and the startup ecosystem

01:42 - 02:14
Q: And so you mentioned that throughout your career you've been involved in different organizations which develop, create or function through the use of voice technology. So when did you see the shift in NLP and conversational and speech-based technologies becoming much more mainstream? And is that something that you recognise as game changing prior to that wider industry acclaim?

02:14 - 02:44
So I think in the startup world, the wisdom goes that timing is everything. So for me, that was before Snackable. I founded a company called Scarlet in 2015 and we conceived this entirely new kind of a voice assistant that wouldn't force you to ask questions, but proactively would whisper in your ear with the right answers and updates. And so Scarlet was really the first place where we saw the power of short-form audio. And in a way it was the genesis for Snackable. So again, that was 2015. So we could see the growing prevalence of AI technologies, especially voice. That was becoming pretty evident.

02:44 - 03:08
Alexa was launching that year. The team that had built Siri was launching a new company called Viv, and they sold it to Samsung a year later. So we saw the wave of audio building and we saw an opportunity to create a unique platform to deliver content to this voice-first ecosystem. My own draw, actually, to disruptive technologies was introduced much earlier than that.

03:08 - 03:41
So I'm from Estonia and I was fortunate as a kid to grow up at a time when there was a massive transformation taking place. We got our independence from the Soviet Union and with that actually built up the state from scratch, built up the whole country, in fact. So I saw first-hand how technology can be leveraged because Estonia simply didn't have the resources to build up expensive analogue infrastructure. So everything was built digitally. And today, Estonia is a leader in digital government, and it provides a platform for private and public services.

03:41 - 04:13
And we also have one of the most thriving startup ecosystems, I think six unicorns in a country of 1.3 million. So, again, you know, for me, seeing from an early age what technology can do and the transformative power of it really was something that has drawn me to it since. And I've seen it as an opportunity to have a part in building products that can effect this positive change and better people's everyday lives. So Snackable uses artificial intelligence to add structure and metadata into spoken word content.

Chapter 3: Video content and metadata

04:14 - 04:22
Q: Can you share a bit more about the process of that, about how it works and also the benefit that it can have for businesses?

04:22 - 04:57
Sure. I'd like to actually start by giving you an analogy, and that is of a bookstore, because surprisingly, the way that audio video content today is produced is kind of akin to having a book that is entirely wrapped in plastic. So imagine you went to a bookstore and every single book that you picked up was sealed in shrink-wrap. So all you could do is turn the book around in your hand. You'd see the front cover with the title and the author's name. Turn it around, and maybe you see a few things that people have written about the book, a couple of nice reviews. That's it. You can't open it. You can't see the table of contents, you can't browse it. You can't find an interesting part and read it out to your friend. It's essentially just all hidden from you.

05:08 - 05:29
And ironically, that's how audio video is produced today. So what Snackable does is unwrap the content. So we use AI to essentially ingest any audio video recording and we break it down, number one, into logical chapters. So imagine, even in this recording, we'll have a number of different questions. We'll discuss different topics.

05:29 - 05:39
So imagine actually being able to see a table of contents that outlines the summaries of these topics. That's what we do. Then the AI goes in and it finds a couple of interesting highlights. So it's kind of shareable snippets or snacks, if you will, that are good for sharing on social media or just generally serving as previews or little teasers for your content. And of course, we extract all the metadata, the tags to be able to easily search for specific mentions, also to be able to tie the content together across a library, and transcripts for accessibility. And finally, our search engine allows you to find very specific details or mentions from deep within the content repository.

06:13 - 06:34
Why is this important for businesses? The impact is quite significant. What we've seen in our experience is that by unwrapping your audio video content with Snackable, you can drive an engagement lift of more than 30 percent from the target audience.

Q: That's a great way to put it, imagining it in a book sense, and I've used the product and I think it's a great way to try and envision it.
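
The chapters, tags and search described in this chapter can be pictured with a small data model. The Python below is only an illustrative sketch, not Snackable's implementation or API: the `Chapter` class, the `fmt`, `table_of_contents` and `find_mentions` helpers, and the sample data are all hypothetical names invented for this example, and a plain substring match stands in for whatever AI-driven search the product actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Chapter:
    """One logical chapter of a recording (hypothetical structure)."""
    title: str
    start_s: int      # start offset in seconds
    end_s: int        # end offset in seconds
    transcript: str   # text spoken during this chapter
    tags: list[str] = field(default_factory=list)


def fmt(seconds: int) -> str:
    """Format a second offset as MM:SS, like the timestamps in this transcript."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def table_of_contents(chapters: list[Chapter]) -> list[str]:
    """Return one 'start - end  title' line per chapter."""
    return [f"{fmt(c.start_s)} - {fmt(c.end_s)}  {c.title}" for c in chapters]


def find_mentions(chapters: list[Chapter], query: str) -> list[Chapter]:
    """Case-insensitive match against chapter transcripts and tags."""
    q = query.lower()
    return [
        c for c in chapters
        if q in c.transcript.lower() or any(q == t.lower() for t in c.tags)
    ]


# Illustrative data only, loosely based on the first two chapters of this episode.
episode = [
    Chapter("Creating a Startup", 4, 101,
            "We transform the way people discover and listen to webinars and podcasts.",
            tags=["startup", "Snackable"]),
    Chapter("Estonia and the startup ecosystem", 102, 253,
            "Estonia built everything digitally and has a thriving startup ecosystem.",
            tags=["Estonia", "voice"]),
]

for line in table_of_contents(episode):
    print(line)

for hit in find_mentions(episode, "estonia"):
    print(f"'{hit.title}' starts at {fmt(hit.start_s)}")
```

Keeping start and end offsets on each chapter is what lets a listener jump straight to the part that matters instead of scrubbing through the whole recording.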

Chapter 4: Datasets and challenges faced

06:35 - 07:01
Q: And so what are some of the biggest challenges that you've faced at Snackable when building the product?

I think some of the challenges have been not so different from any other AI startup, and I would say number one is actually accessing and generating the right data sets. That's a challenge for any startup like us. So you're starting with no data and trying to figure out where do you get the data?

07:01 - 07:30
How do you train your algorithms, how do you gain the insights? So what we've done at Snackable is that we've used a lot of the available data sets out on the Internet and the ones that we've been able to purchase. We've also adapted those data sets to our use cases. And then finally, we've used unsupervised models to deal with the cold-start problem. And, of course, fortunately, we've been able to partner with some large customers and improve our models with their data while we're delivering value back to them.

07:31 - 07:49
I would say the second problem worth highlighting is one that's more specific to Snackable, which is adapting models. So with the pandemic what we've seen is a kind of an explosion in the volume and variety of data that we can work with. So since the pandemic, you know, all of our conversations have gone digital.

07:49 - 08:11
There is increasingly more data. There are increasingly more recordings that companies and individuals produce. So it's, of course, great for our business, and also for the learning of our algorithms. But at the same time, we've also seen the challenges that it presents in terms of scaling the infrastructure, and also the different data sets that are required and the adaptations to our models.

08:12 - 08:36
And sometimes we have to think of entirely fresh approaches. You know, an example of that is that a conversational podcast is very different from a very structured webinar. So how we think about how to break down the content, what's important from within it, and kind of related topics can be very different and require us to adapt our models or, you know, take an entirely fresh approach to that problem.

Chapter 5: Consuming webinars using AI

08:37 - 08:51
Q: And how do you see voice-based technology disrupting the AI space in the coming years? Do you think it will be seen as a must-have for organizations both in and outside of the AI arena?

08:51 - 09:21
Absolutely. So our customers are actually companies across a variety of industries, and they're not in AI. They are financial services providers, they're media companies, agencies, large technology companies. So it's really any company that communicates with spoken content. In fact, I think you don't have to be an AI company at all to benefit from AI. So these companies need Snackable for growth. So our customers need Snackable both for internal knowledge sharing and also for external communications. And it's in fact the job of AI to automate the tedious work of people, of content creators, and surface insights that weren't possible before to support the business goals. So our technology enables everybody creating content to do so easily and to engage their audiences. And we allow these audiences, in a way, to get smarter, faster.

09:43 - 10:11
So if I could maybe give you an example, one of our customers is a large financial services company. They produce lots of webinars on a daily, weekly, monthly basis that can range from product performance reviews, industry trend discussions and any other thought leadership that they produce. And they do that with a goal of either engaging their current customers, attracting new ones, and ultimately driving assets under management, which for them is the revenues.

10:12 - 10:27
So all of the webinars that they produce go through Snackable and we quickly extract and communicate insights from those webinars. So meaning that their webinars are equipped with readable chapters so the listener can tune in just to the part that matters to them. And I think that's really critical. I was just speaking to the Head of Marketing of a large hedge fund the other day, and she said, "When I send out an hour's worth of content, if I can get five minutes of really deep engagement from the person I'm trying to reach, that's a win for me." And I think that's true for everybody. Because people these days are very strapped for time. They're trying to consume increasing amounts of information in the most efficient way. So serving that need, I think, has to become standard for companies.

Chapter 6: Changes in consuming content

11:02 - 11:36
Q: Definitely. And that was just something I was going to touch on: how much of the advancement in technology and application has been driven by consumers that want to digest their content in a different way than perhaps they did only a few years ago?

Yeah, that's a really interesting question. The way that we think about it is what we call content atomisation, meaning the content is simply getting shorter over time. And that's been happening across every single content format. So in written text, we went from books to blogs to tweets. In video, we went from movies to sitcoms to TikTok.

11:40 - 12:14
So people's habits have been changing to consume content during these micro moments. And last year, the pandemic created an even further shift in that behavior. And at the same time, what was happening while we were having, you know, more and more time in front of screens and less and less time to do other things, is that the content volume was growing. So to me, it's not surprising at all that people are trying to find ways to consume content in a way that's efficient and timesaving and doing so more than ever, even compared to five years ago.

12:15 - 12:37
So, again, you know, this analogy, you know, you missed a webinar and all you've been given is the recording. And that happens a lot. Over 50 percent of people actually don't attend the webinar when it first airs. They consume the content 48 hours later. So when you receive the recording and if you are using Snackable, you can now look inside of it. You can see the highlights, the chapters. You browse it almost like you would just browse a table of contents. What that means is that you get the gist and important learnings from the webinar in one tenth of the time that it would otherwise take you, and that is a significant time saving.

Chapter 7: Content accessibility & tagging

12:52 - 13:06
One of the things I wanted to add to this and what I found very interesting is that looking back to 2020, there are two areas that just saw explosive growth: e-commerce and spoken audio and video. So the analogy there is really interesting to me. So when you compare what happens, let's say I'm going as a consumer and I'm trying to buy a product, let's say I'm trying to buy a pair of headphones. I do my search on the Internet. Let's say I go to Amazon, I look around and ultimately I am landing on a product page, and that's the hardest working page on the Internet. It gives me all the information I need to know about those headphones. I can see what they look like from different angles.

13:36 - 14:00
I can see where they're made, I can see who made them, what they cost, what people have said about them. Essentially, I'm given all the context to inform my decision to click on the buy button. While with audio video files, all I get is a black box. I can't see who was in the recording. Maybe I can see a little bit, but I can't see kind of like what the flow of the conversation was. I can't see what was important in it. I can't see the tags, I can't see the people mentioned.

14:04 - 14:38
So it's really hard for me to make a decision whether I should listen or, if I start listening, where I should start from. And I think this will change very quickly and it will become standard to have much more of these e-commerce analogies, with product pages or episode pages for audio video. So the target audiences can very quickly ascertain, OK, they started talking about, you know, emerging market bonds at minute forty-five. When I click right there, I know you've saved me forty-five minutes of listening to content that perhaps I wasn't interested in. And I think we owe this to people because everybody values their time.

14:39 - 14:59
Q: Definitely, it's just going to become more and more important. And can you share a bit more about any current projects you're working on at Snackable?

Sure. So what we're doing is we're always working on making our AI be a bit smarter. I mean, our mission is to make the world's spoken content both accessible and valuable. So if we think about what that means, it's about extracting the relevant insights. So, for example, what is the best way to quickly get the learnings from a webinar? What should we highlight? Where should you or I go first? What should be the best way for you to share that content? So what we're working on is how to improve those insights and tools to make the product even more useful for both the content creator and for the consumer. And of course, we're growing the team to keep up with the market demand and then scaling our business at the same time.

Chapter 8: The Future of Spoken Word & Snackable

15:33 - 15:59
Q: And finally, where do you see Snackable in the next five years? It's a tricky question with the current times that we've gone through. But is there anywhere that you see yourselves going and that you're working towards?

Absolutely, yes. So reading the crystal ball, what we really are building is the data layer for the spoken Internet. And there are two elements to that.

15:59 - 16:25
One is to have this data layer accessible to individuals and teams so that everybody can easily communicate over audio video. The second is to package this intelligence into a suite of APIs so that organizations can also quickly analyze every new audio and video recording that they make so they can get the insights into their knowledge base of this audio and video communication and to create new experiences.

16:26 - 16:49
What's interesting is that the fastest growing companies today have gotten quite good at knowledge management when it comes to written content. However, what's happening now, and especially since the pandemic, which triggered this massive transformation, is that the spoken Internet or audio video is catching up in size to the written Internet.

16:49 - 17:12
I think a very interesting data point is that if you took an hour's worth of recording and you transcribed it to text, it's 20 pages of double-spaced text. That's really long. It's a lot of content to absorb. So the size of this spoken Internet is growing very rapidly. And with that comes a need to quickly create transparency and drive engagement with spoken word content, to not have it be a black box. So whether we're talking about a training video, podcast, product introduction or thought leadership webinar, these are all the types of spoken content that need more transparency and access.

17:35 - 17:51
So it's also clear to me that we will continue to live and work in this virtual or at most a hybrid world. So I see Snackable in five years as a platform that both companies and consumers use to get useful insights from either one, one hundred or one million recordings.

You can learn more about Snackable here. If you would like to get involved in any of our Women in AI initiatives, then please do get in touch at [email protected].
