Curiosity-driven learning is one of the most exciting and promising strategies in deep reinforcement learning: we create agents that generate their own rewards and learn from them. At the Deep Learning Summit in San Francisco this January 24 - 25, Thomas Simonini will be running a workshop where attendees will learn how curiosity-driven learning agents work and the main elements needed to implement them. In this workshop, attendees will learn what curiosity is, how it works, and understand how an agent generates this intrinsic reward, using a trained agent in a video game environment.
Thomas Simonini is a Deep Learning Engineer specialized in Deep Reinforcement Learning. After earning a Bachelor's Degree in French Law and Political Science in 2016, he decided to change careers by learning AI. He graduated from Udacity's Deep Learning Foundations and Artificial Intelligence Nanodegree programs. He founded the Deep Reinforcement Learning Course, a successful series of articles and videos on deep reinforcement learning, from beginner to expert, published on freeCodeCamp and Towards Data Science. Before that, he founded CatDCGAN, an open source AI project that generates realistic pictures of cats. We caught up with Thomas in advance of the summit to hear more about his work:
How did you begin your work in deep learning?
I began my work in deep learning in September 2016, after completing a Bachelor's Degree in French Law and Political Science. At that time I was already passionate about AI and its applications, so I decided not to pursue a Master's Degree in Law and instead spent two years self-studying mathematics, deep learning, and deep reinforcement learning to become a Deep Learning Engineer specialized in Deep Reinforcement Learning and Computer Vision. To do that, I applied to and graduated from Udacity's Deep Learning Foundations and Artificial Intelligence Nanodegree programs.
Tell us a bit more about your current work - how are you using deep learning, and for what results?
I'm currently interviewing for jobs, and I use deep learning to demonstrate my skills.
But I use deep learning a lot in my other activity: the Deep Reinforcement Learning Course. This course, founded in March 2018, is a series of articles and videos where we master the skills and architectures needed to become a deep reinforcement learning expert. We've implemented many agents that learn to play video games, from OpenAI Gym's FrozenLake and Doom to Sonic the Hedgehog.
Some agents achieved good results, such as A2C with Sonic the Hedgehog (Sonic was able to overcome a lot of obstacles). However, I'm currently improving all the implementations with new GPU training runs and new PyTorch implementations.
What is curiosity driven learning, and why is it so promising?
To understand curiosity-driven learning, we must first remember that reinforcement learning is based on the reward hypothesis: the idea that every goal (such as winning the game, or finding the shortest route) can be described as the maximization of rewards. However, the problem with extrinsic rewards (rewards given by the environment) is that the reward function is hand-coded by a human, which does not scale.
The idea of curiosity-driven learning is to build a reward function that is intrinsic to the agent (generated by the agent itself). This makes the agent a self-learner, since it is both the student and its own feedback master.
How do we do that? We use the error in the agent's prediction of the consequences of its own actions: is the agent able to correctly predict the next state, given the current state and the action it takes?
Why? Because the idea of curiosity is to encourage our agent to perform actions that reduce the uncertainty in its ability to predict the consequences of its own actions (uncertainty is higher in areas where the agent has spent less time, or in areas with complex dynamics).
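This prediction-error idea can be sketched in a few lines of Python. In the sketch below, a tiny random-weight linear model stands in for the trained forward model, and all names (`ForwardModel`, `curiosity_reward`) are illustrative, not part of any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)

class ForwardModel:
    """Hypothetical forward model: predicts the next state from (state, action).
    In practice this would be a trained neural network."""
    def __init__(self, state_dim, n_actions):
        self.n_actions = n_actions
        self.W = rng.normal(scale=0.1, size=(state_dim + n_actions, state_dim))

    def predict(self, state, action):
        one_hot = np.zeros(self.n_actions)
        one_hot[action] = 1.0
        return np.concatenate([state, one_hot]) @ self.W

def curiosity_reward(model, state, action, next_state, scale=1.0):
    """Intrinsic reward = prediction error of the forward model."""
    pred = model.predict(state, action)
    return scale * 0.5 * np.sum((pred - next_state) ** 2)

model = ForwardModel(state_dim=4, n_actions=2)
s, s_next = rng.normal(size=4), rng.normal(size=4)
r_int = curiosity_reward(model, s, action=1, next_state=s_next)
print(r_int)
```

The reward is large exactly where the model predicts poorly, i.e. in states the agent has not yet learned to anticipate, which is what pushes it toward novelty.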
That's promising for two reasons. The first is that extrinsic rewards are not scalable. That's not a problem in video game environments, where the extrinsic reward is usually just the score. But what about a real environment (say, a car on a road)? Normally we would have to hand-code a reward function, which does not scale.
The second is that curiosity helps us handle the problem of sparse rewards, i.e. rewards that are zero at most timesteps. That's the case in most video games: you don't receive a reward at every timestep. The problem is that our agent needs feedback to know whether its action was good or not. Using an intrinsic reward like curiosity gets rid of that problem, since a curiosity reward is generated at every timestep.
What are some of the key elements needed to implement curiosity driven learning?
There are different ways to implement an agent with prediction-based rewards, but most of them involve creating an ICM (Intrinsic Curiosity Module): a module that generates prediction-based rewards.
Put simply, this module, composed of two neural networks, generates the error between the predicted next state and the real next state (hence the curiosity), and we use this curiosity as our reward.
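A minimal sketch of such a module might look like the following. Two tiny untrained numpy networks stand in for the trained feature encoder and forward model, and all names (`ICM`, `mlp`, `eta`) are illustrative assumptions, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def mlp(in_dim, out_dim, hidden=16):
    """A tiny untrained two-layer network (stand-in for a trained one)."""
    W1 = rng.normal(scale=0.1, size=(in_dim, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, out_dim))
    return lambda x: np.tanh(x @ W1) @ W2

class ICM:
    """Sketch of an Intrinsic Curiosity Module.

    - encoder: maps raw states to feature vectors
    - forward_model: predicts the next feature vector from (features, action)
    """
    def __init__(self, state_dim, n_actions, feat_dim=8):
        self.n_actions = n_actions
        self.encoder = mlp(state_dim, feat_dim)
        self.forward_model = mlp(feat_dim + n_actions, feat_dim)

    def reward(self, state, action, next_state, eta=0.01):
        phi, phi_next = self.encoder(state), self.encoder(next_state)
        a = np.zeros(self.n_actions)
        a[action] = 1.0
        phi_pred = self.forward_model(np.concatenate([phi, a]))
        # Curiosity = error between predicted and real next-state features.
        return eta * 0.5 * np.sum((phi_pred - phi_next) ** 2)

icm = ICM(state_dim=4, n_actions=3)
s, s_next = rng.normal(size=4), rng.normal(size=4)
print(icm.reward(s, action=0, next_state=s_next))
```

In the full ICM of Pathak et al., an inverse model (predicting the action from consecutive states) is also trained to shape the feature encoder, so that the features only capture what the agent can influence; it is omitted here for brevity.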
In which industry do you see the biggest transformation happening in the coming years, and where will AI have an impact?
AI is currently having an impact in all industries, but I think the biggest transformation in the coming years will be in healthcare: for example, cancer detection (DeepMind Health) that aims to spot cancer before it is perceptible to the human eye, and DeepMind's eye-disease detector, which helps ophthalmologists find what illness a patient suffers from.
How are you using AI for a positive social impact, or how can your work be applied to other industries for social good?
I use AI for a positive social impact in education. I founded the Deep Reinforcement Learning Course in March 2018 with a simple idea: create a complete course, from beginner to expert, with complete implementations. The course has been a success, with a total of 33,000+ claps and 4,000 reads per week.
I made this course free and open-sourced the implementations for two reasons. First, because I believe that innovation comes from sharing knowledge through open education. I believe education should be open to everyone and, as a consequence, must be free or at least accessible.
Second, because I felt I needed to give back to the AI community: it's thanks to people who gave their courses and implementations away for free (e.g. David Silver's Reinforcement Learning course, OpenAI Baselines, Stanford CS231n, MIT OpenCourseWare...) that I have skills in deep learning, reinforcement learning, and mathematics.
Where will we see AI benefit society the most?
Again, I would say in healthcare, because innovations in this domain are fundamental in a world with an aging population. Moreover, if we want society to accept AI, we need to show people how this technology can really improve human life, and health is the best example.
However, I think we must stop assuming that AI will solve everything without the help of politics. How will AI be able to improve human health if people can't afford to go to the hospital because their country does not have a free healthcare system?
What does a typical day look like for you?
My typical day starts at 6:30 am with my morning routine (sport, breakfast…). Then from 7:30 to 9:30 is education time, which I dedicate to reading new deep learning papers and learning new concepts in mathematics and blockchain. After that, I go to a coworking café until 6:30 pm, where I work (updating my course, implementing new agents, working on my various side projects) and/or go to job interviews.
Finally, I go to an afterwork event or a meetup, or have a drink with some friends.
What’s next for you in your work?
My next step is to find a job in the Bay Area. That's why I'll be in Silicon Valley for one month, from January 22nd to February 19th, for interviews.
Want to learn more from Thomas? Join us at the Deep Learning Summit in San Francisco. Additional confirmed speakers include experts from Facebook, Salesforce, AI4ALL, Google, DeepMind and many more.