Andrej Karpathy is a fifth-year PhD student at Stanford University, studying deep learning and its applications in computer vision and natural language processing (NLP). His recent work has focused in particular on image captioning, recurrent neural network language models and reinforcement learning. Before joining Stanford, he completed an undergraduate degree in Computer Science and Physics at the University of Toronto and a Master's degree in Computer Science at the University of British Columbia. He also completed two internships at Google Research (in 2011 and 2013) and one at Google DeepMind in 2015. His work and expertise have received a lot of media attention, including features in Wired, Gizmodo, LiveScience, The Economist, Popular Mechanics and Bloomberg, to name a few. One fun project he created, which uses convolutional neural networks to determine what makes a "good" selfie (image above), was picked up worldwide and sparked discussion of AI, computer science and neural networks everywhere from Elle to NBC News.

At the RE•WORK Deep Learning Summit, Andrej will be discussing 'Visualizing and Understanding Recurrent Networks'. Recurrent neural networks (RNNs), and specifically a variant with long short-term memory (LSTM), are enjoying renewed interest because of successful applications to a wide range of machine learning problems that involve sequential data. However, while LSTMs provide exceptional results in practice, the source of their performance and their limitations remain poorly understood. In this presentation, Andrej aims to bridge this gap and provide a detailed error analysis that suggests areas for further study.

We spoke with Andrej to hear more about why he became interested in deep learning and what developments we can expect to see in the next few years.

What got you interested in deep learning?
My interest was sparked in 2008 in Geoff Hinton's class at the University of Toronto.
I remember that he frequently used terms such as "in the mind of the network", and I found the idea that we're simulating thoughts in a computer intriguing. Coming from a physics background, I also find deep learning models quite aesthetically pleasing: their architectures are simple and homogeneous, and they achieve a lot with only a few concepts.

What are the practical applications of your work, and which sectors are most likely to be affected?
My recent work focuses on the design of neural network architectures that process images, natural language, or both modalities at once via multimodal embeddings. My recent work on image captioning is an example of playing with neural architecture Lego blocks: I composed a convolutional neural network with a recurrent neural network language model to generate captions for images. This work affects any application that combines natural language and images, such as searching image databases with extended text queries.

What are the key factors that have enabled recent advancements in deep learning?
First, deep learning models make weak modeling assumptions, which allows them to be general. The price is that we must instead feed them a large amount of data during training, and data at the required scale has only recently become available. Second, it takes a lot of computation to process that data, and until recently our computers and software were simply not fast enough. The last key factor, I would say, is the community. Recent strong results across multiple application domains have started a virtuous circle. As more people shift their attention to deep learning research and applications, we are collectively accelerating the field's progress, which in turn increases the number of strong results.

What are the main types of problems now being addressed in the deep learning space?
I'll mention a few.
As one example, I'm seeing quite a bit of progress in architectural innovations, including larger and more complex neural networks that incorporate sequential processing in the form of recurrent networks, attention mechanisms and novel memory modules. There's also interest in deep reinforcement learning as a result of recent successes in Atari game playing. More generally, algorithms from this domain can be used to build neural networks with non-differentiable components. I'm also seeing interest in using neural networks for approximate inference in probabilistic graphical models within variational frameworks. Lastly, from an engineering point of view, we still haven't figured out the best software tools and abstractions for three tasks: building and defining these models quickly, visualizing and debugging them during training, and running them at scale.

What developments can we expect to see in deep learning in the next five years?
Instead of describing several interesting on-the-horizon developments at a high level, I'll focus on one in more detail. One trend I'm seeing is that architectures are quickly becoming bigger and more complex. We're building towards large neural systems in which we swap neural components in and out, pretrain parts of the networks on various datasets, add new modules, fine-tune everything jointly, and so on. For example, convolutional networks were once among the largest and deepest neural network architectures. Today, they are abstracted away as a small box in the diagrams of most newer architectures. In turn, many of these architectures tend to become just another small box in the next year's innovations. We're learning what the Lego blocks are, and how to wire and nest them effectively to build large castles.

What advancements excite you most in the field?
I am most excited by the advances in deep reinforcement learning, because its problem setup (an agent interacting with an environment in a loop) is, in my mind, closest to AI.
In most applications we have a fixed dataset and some notion of a loss. In reinforcement learning, by contrast, the agent has the opportunity to interact with the environment, to learn at its own pace, and to inspect, hypothesize, experiment, plan, and think. This framework offers a vast array of interesting challenges from a research perspective and, ultimately, I believe, also the largest payoffs.
Andrej Karpathy will be speaking at the RE•WORK Deep Learning Summit in San Francisco on 28-29 January 2016. Other speakers include Andrew Ng, Baidu; Clement Farabet, Twitter; Naveen Rao, Nervana Systems; Pieter Abbeel, UC Berkeley; and Oriol Vinyals, Google.


The Deep Learning Summit is taking place alongside the Virtual Assistant Summit. For more information and to register, please visit the event website.