Regularization is the practice of tuning or constraining model complexity so that deep learning models generalize better and make more reliable predictions.

Regularization techniques are applied to machine learning models to make the decision boundary or fitted function smoother, which helps prevent overfitting. Examples include L1 and L2 penalties, dropout, and weight decay in neural networks.
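To make these techniques concrete, here is a minimal sketch, assuming a small PyTorch classifier; the layer sizes and hyperparameter values are illustrative assumptions, not taken from the article. It shows how dropout, L2 regularization (via the optimizer's weight decay) and an explicit L1 penalty are typically applied.

```python
# Minimal sketch: dropout, L2 (weight decay) and L1 regularization in PyTorch.
# Architecture and hyperparameter values are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # dropout: randomly zeroes activations during training
    nn.Linear(256, 10),
)

# L2 regularization is applied through the optimizer's weight_decay term.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch

model.train()                 # enables dropout
optimizer.zero_grad()
loss = criterion(model(x), y)

# An explicit L1 penalty can be added to the loss by hand.
l1_lambda = 1e-5
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())

loss.backward()
optimizer.step()
```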

With every deep learning algorithm comes a set of hyperparameters, and optimizing them is crucial for achieving faster convergence and lower error rates. Most people working in deep learning tune hyperparameters such as learning rates, decay rates and L2 regularization using common heuristics. Recently, researchers have tried to cast hyperparameter optimization as a deep learning problem, but these approaches are often limited by a lack of scalability. At the Deep Learning Summit in Singapore, speaker Ritchie Ng, Deep Learning Research Assistant at the National University of Singapore (NUS), will show how scalable hyperparameter optimization is now possible: it accelerates convergence and can be trained on one problem while enjoying the benefits of transfer learning. This has impact at the industrial level, where deep learning algorithms can be brought to convergence without manual hand-tuning, even for large models.

I caught up with Ritchie ahead of the summit on 27-28 April to learn more about advancing hyperparameter optimization, natural language processing (NLP), and challenges in the deep learning space.

What started your work in deep learning?

I started off with standard machine learning algorithms like Support Vector Machines (SVMs) for computer vision tasks. Subsequently, I observed remarkable performance using Deep Neural Networks (DNNs), and then discovered that Deep Convolutional Neural Networks (CNNs) gave superhuman performance on some computer vision tasks. This marked the start of my deep learning journey. I am a fan of the Turing machine, and coincidentally there were advances in a Turing-complete deep learning architecture, Recurrent Neural Networks (RNNs). This got me excited, and I forayed into RNNs, which are now a main focus of my research. As I ventured deeper, I was plagued by the curse of a growing number of hyperparameters to tune, which I was mainly tuning by hand with established heuristics. This led to my current main research focus, "learning to learn", where I cast hyperparameter optimization as a learning problem.

What are the key factors that have enabled recent advancements in deep learning?

There are two factors, and both are underpinned by the concept of openness. The main factor, in my opinion, is that research in this field is relatively open: many people, from university research labs to private corporate labs, publish their work openly. This gives researchers the ability to learn from and build on one another's work, continually pushing the boundaries of deep learning at a rapid pace. The other factor is the openness of the code published today, which lets researchers build on existing code and share it with others. This openness also brings some homogeneity in the programming languages used, which facilitates collaboration through common languages.

What are the main types of problems now being addressed in the deep learning space?

One example is unsupervised learning, which still requires a lot of work. Learning to learn is also gaining prominence, as is reinforcement learning, possibly combined with game theory for complex multi-agent interactions.
On the application side, healthcare is an area I have an interest in: there is growing attention to using deep learning for medical imaging diagnostics, such as detecting breast cancers, lung nodules, pneumothorax, intracranial bleeding and more.

What developments can we expect to see in NLP in the next 5 years?

There will be better understanding of natural language, with deep learning algorithms better capturing connections among sentences and paragraphs. Another interesting possibility is NLP advancing to more realistic dialogue with humans, moving one step closer to an AI agent being able to pass the Turing test.

What is the impact of your work on advancing hyperparameter optimization?

For many years, the vast majority of people in the deep learning community have been using common heuristics to tune hyperparameters such as learning rates, decay rates and L2 regularization. In recent work, researchers have tried to cast hyperparameter optimization as a deep learning problem, but they are limited by a lack of scalability. My work is a first step in showing that scalable hyperparameter optimization is now possible: it accelerates convergence and can be trained on one problem while enjoying the benefits of transfer learning. This has impact at the industrial level, where deep learning algorithms can be brought to convergence without manual hand-tuning, even for large models.

There's just 1 month to go until the Deep Learning Summit, taking place alongside the Deep Learning in Finance Summit in Singapore on 27-28 April. Explore how deep learning will impact communications, manufacturing, healthcare, transportation and more. View further information here.
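As a rough illustration of the manual heuristic tuning Ritchie describes, here is a minimal sketch, assuming a toy PyTorch model and dummy data (all names, layer sizes and values are illustrative, not from the interview), of a hand-picked grid search over learning rate, decay rate and L2 weight decay. This is the unscalable process that learning-to-learn approaches aim to automate.

```python
# Minimal sketch: hand-tuned grid search over learning rate, decay rate and
# L2 weight decay. Model, data and candidate values are illustrative assumptions.
import itertools
import torch
import torch.nn as nn

def train_and_evaluate(lr, decay_rate, weight_decay, epochs=5):
    """Train a tiny model with the given hyperparameters; return validation loss."""
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate)
    criterion = nn.CrossEntropyLoss()
    x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))        # dummy training data
    x_val, y_val = torch.randn(64, 20), torch.randint(0, 2, (64,))  # dummy validation data
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()  # heuristic: decay the learning rate every epoch
    with torch.no_grad():
        return criterion(model(x_val), y_val).item()

# Exhaustively try hand-picked values: manual, unscalable tuning.
grid = itertools.product([0.1, 0.01], [0.99, 0.9], [0.0, 1e-4])
best = min(grid, key=lambda cfg: train_and_evaluate(*cfg))
print("best (lr, decay_rate, weight_decay):", best)
```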

Confirmed speakers include Jeffrey de Fauw, Research Engineer at DeepMind; Vikramank Singh, Software Engineer at Facebook; Nicolas Papernot, Google PhD Fellow at Penn State University; Brian Cheung, Researcher at Google Brain; Somnath Mukherjee, Senior Computer Vision Engineer at Continental; and Ilija Ilievski, PhD Student at NUS.

Tickets are limited for this event. Register your place now.