This is a Women in AI Podcast transcript. For this interview we have Wendy Gonzalez, CEO at Sama, speaking with us about high-quality training data and what she's getting up to in her current role. We hope you enjoy the episode.
Topics explored include:
- How Do You See AI/Data Science as a Tool Which Can Facilitate Growth in Low-Income Communities?
- The Common Challenges With Being a Leader
- How Can You Secure High-Quality Training Data?
- The Impact of Data Bias in Computer Vision
- The Challenges With the Availability of Data
- Does the Concept of Rapid AI Development Concern You From a Privacy Standpoint?
- How Can We Ensure That Our Quest for AI Improvement Also Ensures Societal Benefit?
- Future Plans at Sama
So today on our Women in AI podcast I'm joined by Wendy Gonzalez, the Interim CEO of Sama, and I'm really excited to speak to her. Hi, Wendy, how are you?
I'm doing well, thank you.
Great. Firstly, for those who don't know already, can you tell us a little about yourself and your current role at Sama?
Sure. I am the [then] Interim CEO at Sama. By way of background, I've been at Sama for just about five years. Prior to that I was leading product for a SaaS company in the Internet of Things space, and before that I worked at a variety of public companies in technology. I started my career in management consulting, where I spent about the first decade helping enterprises and SMEs transform their technology roadmaps.
Fantastic. And how did you first come across AI? And what got you interested in AI technologies?
Well, the thing that is super fascinating about AI is that, basically, we're teaching machines how to see, speak and hear. It's an absolutely fascinating technology in that there are many, many different methods by which we can teach machines these kinds of human behaviours, and a lot of it actually mirrors, in some cases, the way that humans learn themselves. When I joined Sama we were doing a variety of different work, everything from general human judgement services like transcription to data enrichment. But we were also doing training data for different computer vision applications, and it was fascinating to see the level of expertise that our platform and our workers had. There was an incredible opportunity for us to leverage this strong and talented team of people to help some of the world's leading technologies really learn. Since 2015 we've been almost exclusively focused on training data, but we actually started in this space all the way back in 2012, right when the technology was emerging.
Fantastic, that's really interesting. In addition to providing high-quality training data for technology, Sama is known for using the power of the tech industry to help lift people out of poverty. So how do you see AI and data science as a tool which can facilitate growth in low-income communities?
That is a great question. Sama was actually founded in 2008 with a mission to lift people out of poverty by giving work instead of giving aid. What that really means is that work and financial independence are very powerful things: they create both agency and opportunity for people to keep growing, not only in their careers but also by permanently breaking the poverty cycle. That is how we got started. In terms of how that applies to AI, we have been working with many of the world's largest enterprises to train their leading machine learning technologies as they adopt machine learning. It's a technology that has really started to make an impact in the last 10 years, and we're already seeing a massive number of applications in any and every industry. Autonomous vehicles, around which there's so much energy, are just one component; the technology is applicable in pretty much every industry vertical there is. As it gets more and more adopted, the edge cases, the things that really require judgement, and the overall complexity are going to continue to grow, and that need for high-quality training data is going to be there for quite some time. So there's a wonderful opportunity for us to use this model not only to help power these amazing technologies, but to do so in an ethical manner. We can continue to hire and recruit from underserved communities and make a really positive social change while advancing technology.
That’s fantastic. And what are the most common challenges you face as the leader of Sama?
That’s a great question. Sometimes I like to say that we might be the best-kept secret in AI. We're a company that started with more humble roots, and we started doing training data in 2012. One of the things we have really been focused on is making sure that our outstanding technology platform and expert humans in the loop are out there for many companies to see. We have spent a lot of time working in this space, and I think, in part because of our social mission and our origins, we tend to be known a bit more for the social mission than for the work that we do. So we're continuing to make sure that, at the end of the day, it's the product, high-quality training data enabled through a really incredible model, that we are both seen for and known for. The other thing I'd say is that this is an area that is obviously really, really hot. There's a lot of demand for researchers, for PhDs and for data scientists in this space, and we are also investing in these areas, so hiring and recruiting can be challenging. One of the things that is really neat about our model is that our social mission is embedded into the way we do business, so we have a really unique opportunity for people to work at that intersection of AI and impact, and that has been compelling. But nonetheless, the need for data scientists and researchers is high right now.
Yeah, absolutely. Your company's presentation focused on fighting bias in AI and also on how you can secure high-quality training data. Can you give me a few of the key takeaways from the talk?
Yeah, absolutely. We focus on avoiding AI data bias in data annotation in three ways. First, we advise our clients on training data strategy. What's really interesting is that we have many clients with top-notch researchers and large teams, but it isn't always their core expertise to understand how to manage the data pipeline and what the right strategies are to ensure that the data being collected, and the scope of that data, can sufficiently cover all the use cases needed to reach high quality. So we work with our clients to map out that strategy, and through our platform we identify the areas where we are achieving quality and the kind of data we still need to collect and annotate. That leads into the second point, which is data collection or data creation capabilities. These can be really important, because if you don't have a particular angle or type of use case, that missing data can also create a data bias. And last but not least, we have a diverse workforce. We have a purposeful mission of hiring at least 50% women into our organisation, and over 75% from underrepresented ethnic backgrounds, so we think we are in a really unique position when it comes to a diverse workforce.
Absolutely. It's really good to hear that that's going on at Sama. And what is the impact of biased data in computer vision?
At the end of the day, bias can make its way into any part of the data lifecycle, and it commonly presents itself in three areas. One is data set bias: do you have a comprehensive enough set of data to actually achieve the level of quality that you want? There's training bias, which can occur if the way you're training your models overly emphasises one thing; that's where the diverse workforce becomes quite important. And then there's algorithmic bias. In terms of what bias actually means and what it could affect, it can mean that the application you're building doesn't work effectively. Imagine you're building a global safety or global transportation application and your algorithm is not able to identify people of all different sizes, races and colours. That's just one example of where bias results in a flawed model.
Great. For reference, we're going to include the blog post recapping Audrey's AI bias presentation, if anyone listening is interested in hearing more about this topic. The availability of data is a trend we see at many of our summits. Do you see this as the main challenge you're currently facing in your work?
That's a great question. We definitely see data collection, and the availability of diverse sets of data, as a challenge in some organisations. Obviously, the largest companies that have access to data are at a pretty good advantage. But it is really critical for buyers to have access to that information, and there are only so many open-source data sets. So we have a variety of methods by which we can help remove that blocker. That is everything from identifying the data missing from a data set that needs to be collected to bring the algorithm to the required quality, to actually supporting the data capture or data acquisition itself.
Great. And does the concept of rapid AI development concern you from a privacy standpoint?
Yeah, privacy is absolutely critical. One of the hallmarks of our strategy is trust, and that means both creating visibility into anything and everything we're doing, including the human workforce that actually does the annotation in addition to the machine automation, and ensuring that all the appropriate policies and protections are in place. At the end of the day, wherever you collect that data, security and trust are only as strong as the lowest common denominator: if you don't know where your data came from or who touched it, you as an organisation can be liable from a privacy perspective. So we're extremely conscious of that. In terms of our approach, we have full GDPR-compliant data processing and data collection policies in place, ISO certifications at our delivery centres, background checks, and a technology platform built in a fully secure manner, which also means that, as clients wish, we can store data in Europe or the U.S., for example. All of those things are a critical part of how we deliver any of our products. And as we look at things like data acquisition or working with partners, it's absolutely something we have to validate as well: do we know that these data sources came from the right places, with either consent or the appropriate protection measures?
Great, and how can we ensure that our quest for AI improvement also ensures societal benefit?
Yeah, that's a really good question. What we would certainly promote is the model that we've established here. We've spent quite a bit of time speaking with our larger enterprise customers, and also SMBs, that want to incorporate ethical or impact standards into the way they do their procurement. So it's absolutely something we promote and support, and we have shared some frameworks with our clients on the things they should be considering and the sort of evidence and proof they should look for that the partners they're working with are providing an ethical supply chain. We also participate in a number of industry forums, including the Partnership on AI, to bring some of our learnings and best practices to a broader audience.
Great, fantastic. And, Wendy, what does an average week look like for you at Sama?
That is a really good question. Let me pull up the calendar. Actually, an average week is spent not only working with our clients, because I do like to spend a lot of time understanding our clients' challenges in bringing their models to market, but also quite a bit of time with our team. I was just in Montreal last week with our research and development team, working not only on the really interesting ways in which automation can do the heavy lifting, but also on enhancing and improving our analytics platform so it can pinpoint missing data accurately and help remove data bias. We also have a really interesting model in that our workforce is vertically integrated. So a lot of our technology time is focused not only on automation and efficiency, but on how we enhance and improve the quality and engagement of our humans in the loop. So, a lot of time with the R&D teams and a lot of time with our clients. And of course, I'm out at our delivery centres in East Africa on a regular basis as well. So yeah, it's been a full but very, very fun ride.
Yeah, very busy. It sounds really exciting. And following on from that, what are your future plans? What does 2020 have in store for you, and what are you excited to work on?
Oh boy, so many things. I'll try to keep myself contained here. One of the really exciting things in 2020 is that we are doubling our R&D team again, so we're making substantial investments in research. As I touched on, how we build our technology platform to move our humans in the loop up the value chain is something I'm very passionate about. What's really interesting about this technology, and about our model, is that it's not a question of having humans do what they did five years ago, or even two years ago, or one year ago. This technology is so disruptive that simply identifying what is in a picture is long gone. We're continuing to work on more and more highly complex scenarios, and oftentimes now we are doing things like model validation, which is really neat. So it's been great to see us grow in depth and really have that technology strategy. On the human side of things, we are also making more investments for growth in East Africa, which we're really, really thrilled about. From an impact standpoint, we have some really exciting data coming out soon from a randomised controlled trial, which is the holy grail, or gold standard, for proving the impact of an impact model. We're also really focusing on market growth: we continue to grow in areas like autonomous transportation, and we're spending a lot of time in consumer media and entertainment, e-commerce, and hardware and software. We are also digging deep into some new industry verticals, which I think is very exciting, because the number of applications you can apply machine learning models to is absolutely amazing. So, that was a mouthful, but those are just some of the things that we are really excited to work on here in 2020.
Yeah, that sounds amazing, and I wish you all the luck for the rest of the year. Where can we keep up with Sama's work going forward?
Thanks for asking. The best way to keep track of us is at http://sama.com/, and we also have a number of social media channels. You can find us @sama_ai_.
Fantastic. That'll all be included in the description if you're listening and want to know more. Wendy, it's been an absolute pleasure to speak to you today on our podcast, and good luck with everything.
Thank you. Appreciate the time.
As always, we hope that you enjoyed the episode. We'd love to hear which topics you'd most like to hear about from leading women in AI, and if there are any burning questions you'd like us to ask, let us know by dropping me an email at [email protected]. In the meantime, stay safe, and I wish you all a wonderful week.
Are you interested in reading more AI content from RE•WORK and our AI experts? Read these articles below:
- Girl Decoded and Emotion AI - Rana el Kaliouby
- 12 Women in Machine Learning to Watch
- How Chatbots Fail & Conversational AI Supersedes 2020
- Starting a Career in AI and Giving Back to the Community - Diana Murgulet
- An Introduction to Federated Learning
- AI Experts Discuss The Possibility of Another AI Winter
- 20+ Pieces Of Advice From AI Experts To Those Starting Out In The Field
- How Has COVID-19 Accelerated Digital Transformation? - Claire Calmejane
- How can Startups & Enterprise Leverage Each Other? - Shaloo Garg
- How to Overcome Main Challenges in Implementing AI in the Insurance Industry and Improve Claims Management Process
- Why AI Ethics Matter - Kay Firth-Butterfield, Head of AI and ML at WEF
- AI in Loyalty & Subscription Products - Emily Bailey