Deep Learning with Structure

Charlie Tang is a PhD student in the Machine Learning group at the University of Toronto, working with Geoffrey Hinton and Ruslan Salakhutdinov. His research interests include machine learning, computer vision and cognitive science. More specifically, he has developed various higher-order extensions to generative models in deep learning for vision.

At the Deep Learning Summit in Boston next month, Charlie will present 'Deep Learning with Structure'. Supervised neural networks trained on massive datasets have recently achieved impressive performance in computer vision, speech recognition, and many other tasks. While extremely flexible, neural nets are often criticized because their internal representations are distributed codes and lack interpretability; during his presentation at the summit, Charlie will reveal how we can address some of these concerns.

We had a quick Q&A with Charlie ahead of the Deep Learning Summit to hear more of his thoughts on developments and challenges in deep learning.

What are the key factors that have enabled recent advancements in deep learning?

The three key factors are:
- The steadfast belief and knowledge that supervised neural networks trained with enough labelled data can achieve great test set generalization.
- The availability of high-performance hardware and software, in particular Nvidia's CUDA architecture and SDK. This allowed more experimentation and learning from large-scale data.
- The development of superior models: switching from sigmoid or hyperbolic tangent units to rectified linear hidden units, and the invention of regularization techniques, specifically "Dropout".

What are the main types of problems now being addressed in the deep learning space?

Almost all problems in statistical machine learning are currently being investigated using deep learning techniques. They include visual and speech recognition, reinforcement learning, natural language processing, medical and health applications, financial engineering and many others.

What are the practical applications of your work and what sectors are most likely to be affected?

The deep learning revolution allows models trained on big data to drastically improve accuracy. This means that many artificial intelligence recognition tasks that previously required a human in the loop can now be automated.

What developments can we expect to see in deep learning in the next 5 years?

Deep learning algorithms will be gradually adopted for more tasks and will "solve" more problems. For example, 5 years ago, algorithmic face recognition accuracy was still somewhat worse than human performance. Today, however, super-human performance is reported on the main face recognition dataset (LFW) and the standard image classification dataset (ImageNet). In the next 5 years, increasingly hard problems such as video recognition, medical imaging or text processing will be successfully tackled by deep learning algorithms. We can also expect deep learning algorithms to be ported to commercial products, much like how the face detector was incorporated into consumer cameras in the past 10 years.

What advancements excite you most in the field?

I feel like the most exciting advance is the availability of low-energy mobile hardware that supports deep learning algorithms. This will inevitably lead to many real-time systems and mobile products which will be a part of our daily lives.

The Deep Learning Summit is taking place in Boston on 26-27 May. For more information and to register, please visit the event website.

Join the conversation with the event hashtag #reworkDL