Second Order Acceleration: Making Faster Neural Networks, Faster

The emergence of deep learning and AI technologies on mobile and embedded devices has created exciting new possibilities for applications such as cancer detection, self-driving cars and smart homes. However, developing robust Deep Neural Network (DNN) models for everyday devices remains a significant challenge for both human engineers and computers. A single DNN can require billions of expensive floating-point operations to classify each input. This computational overhead limits the applicability of DNNs on low-power embedded platforms and incurs high costs in data centers. Accordingly, techniques that make DNN processing more efficient, improving energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost, are critical to the widespread deployment of DNNs in AI systems. Since raising a seed round with TandemLaunch Inc. and partnering with Brown University in early 2018, our team at Deeplite has been tackling this problem to make artificial intelligence more accessible and affordable.

The adoption of powerful GPUs and cloud computing has created insatiable commercial demand for sophisticated deep learning solutions. Figure 1 shows how deeper models outperform classic techniques in many AI domains like computer vision and natural language processing. Historically, when designing a deep network, developers focused on one metric: accuracy. As a result, many top-performing DNNs require billions of operations for a single input. Compute-expensive operations like convolutions have become ubiquitous: a recent Microsoft study found that 90% of deep learning mobile apps used some form of convolutional neural network (CNN).

Figure 1. Deep learning revolution
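
To see how quickly operation counts add up, the following minimal Python sketch estimates the multiply-accumulate (MAC) operations of a single standard convolution layer. The function name `conv2d_macs` and the layer dimensions are illustrative assumptions, not figures taken from the Microsoft study or from any particular product.

```python
def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Multiply-accumulate count for one standard 2D convolution (bias ignored).

    Each output element needs in_channels * kernel_size**2 multiply-accumulates,
    and there are out_channels * out_height * out_width output elements.
    """
    per_output = in_channels * kernel_size * kernel_size
    num_outputs = out_channels * out_height * out_width
    return per_output * num_outputs


# Hypothetical layer: 3x3 kernel, 64 -> 64 channels, 224x224 output feature map.
macs = conv2d_macs(in_channels=64, out_channels=64, kernel_size=3,
                   out_height=224, out_width=224)
print(f"{macs:,} MACs for one layer and one input")
```

With these assumed dimensions, a single layer already costs roughly 1.85 billion MACs per input, before counting any other layer in the network.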

However, with the emergence of deep learning on embedded and mobile devices, DNN application designers must now deal with stringent power, memory and cost requirements. These requirements often lead to inefficient solutions and can ultimately discourage people from moving to these devices. As a result, prototyping and production deployment of CNN-powered products, particularly for real-time inference and resource-limited devices like smartphones and self-driving cars, remain largely inefficient.

DNNs depend heavily on the choice of hyper-parameters such as the number of hidden layers, the nodes per layer and the activation functions, which have traditionally been tuned manually. Moreover, hardware constraints such as memory and power must be considered to optimize the model effectively. The resulting design space can easily exceed thousands of candidate solutions, making it intractable to find a near-optimal solution manually (Figure 2). Recently, researchers have been developing neural architecture search (NAS) algorithms, which automatically design neural networks for a given dataset and usually outperform handcrafted networks. However, NAS algorithms are computationally very expensive and can require thousands of GPUs, which puts them out of reach for many people and businesses looking to develop deep learning solutions.

Figure 2. Design space exploration
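
A small Python sketch makes the size of such a design space concrete. The hyper-parameter names and value lists below are hypothetical choices for illustration only; real design spaces are usually larger, since choices such as layer width can vary per layer.

```python
from math import prod

# Hypothetical menu of options for each design decision (assumed values).
search_space = {
    "hidden_layers": [2, 4, 8, 16],
    "nodes_per_layer": [32, 64, 128, 256, 512],
    "activation": ["relu", "tanh", "sigmoid"],
    "kernel_size": [1, 3, 5, 7],
    "width_multiplier": [0.25, 0.5, 0.75, 1.0],
    "weight_bits": [8, 16, 32],
}

# Every combination of one option per decision is a distinct candidate design.
num_configs = prod(len(options) for options in search_space.values())
print(f"{num_configs} candidate configurations")
```

Even this modest menu yields 2,880 candidates, each of which would need to be trained and measured against the hardware constraints to be evaluated by hand.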

Deeplite introduces the Neutrino™ optimizer engine (Figure 3), which delivers novel, automated, multi-objective design space exploration with respect to defined constraints. A reinforcement learning-based agent explores the design space for a smaller network that performs similarly to the given network trained on the same task.

There are typically many well-designed architectures, created by humans or by automatic architecture design methods, that already achieve good performance on their target task. Under strict limits on computational resources, neglecting these existing networks and exploring the architecture space from scratch requires huge computation power and is not guaranteed to yield better-performing architectures. A more economical and efficient alternative is to explore the architecture space starting from these successful networks and reuse their knowledge. We take advantage of this to speed up the search and use only a few GPUs in the process.

Neutrino™ can efficiently navigate the design space to yield an architecture that satisfies all the constraints and is ideal for the target hardware. This step aims to reduce the DNN's memory footprint and computational complexity, which are crucial for low-end devices with limited available memory and processing power.

Figure 3. Neutrino™ engine
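
The idea of starting the search from an existing, well-performing network instead of from scratch can be illustrated with a deliberately simple Python sketch. This is not the Neutrino™ algorithm: it swaps the reinforcement learning-based agent for a plain greedy search, and `toy_accuracy` is a made-up stand-in for actually training and evaluating a network. All names, widths and thresholds are assumptions.

```python
def param_count(widths, n_inputs=784, n_outputs=10):
    """Weights plus biases of a fully connected network with these hidden widths."""
    sizes = [n_inputs, *widths, n_outputs]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def toy_accuracy(widths):
    """Hypothetical proxy: narrower layers cost accuracy. NOT a real accuracy model."""
    return 0.99 - sum(1.0 / w for w in widths)


def shrink(widths, evaluate, max_accuracy_drop=0.01, min_width=8):
    """Greedily halve layers of a given network while accuracy stays near its baseline."""
    baseline = evaluate(widths)
    current = list(widths)
    while True:
        # Candidate designs: the current network with exactly one layer halved.
        acceptable = []
        for i, w in enumerate(current):
            if w // 2 >= min_width:
                candidate = current[:i] + [w // 2] + current[i + 1:]
                if evaluate(candidate) >= baseline - max_accuracy_drop:
                    acceptable.append(candidate)
        if not acceptable:
            return current  # no further shrink satisfies the accuracy constraint
        current = min(acceptable, key=param_count)  # keep the smallest valid design


original = [512, 512, 512]  # assumed starting network
smaller = shrink(original, toy_accuracy)
print(smaller, param_count(original), "->", param_count(smaller))
```

Because every step begins from a design that already meets the accuracy constraint, the search only evaluates a handful of neighbours per step instead of sampling the whole design space.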

Neutrino™ aims to help humans create models that save on cloud costs and reduce time-to-revenue for edge AI products. Creating complex CNN models that satisfy our users’ constraints is streamlined through automated, AI-driven design space exploration. Watch how it can be applied here.

Deeplite will present at the Applied AI Summit on automated model design and its impact on advanced AI products. Interested in optimizing your DNN models? Neutrino™ Beta software is available for in-house testing. Get in touch with questions/requests at [email protected] or visit their website.
