If I were to describe a fictional person, place or object right now, the chances are that you would be able to evoke an image of it in your mind's eye. I could also suggest variations of its features, which would cause you to imagine different versions of the object. This seemingly simple task, however, poses several challenges for machines.
At the Deep Learning Summit in London last week, we heard from Andreas Damianou from Amazon who shared his recent work on improving the functionality of machines in scenarios of probability and uncertainty through applications of deep learning.
Standard neural networks learn functions from inputs and outputs, and ‘once they’ve learned, it becomes a deterministic thing. You give inputs and you get all the same outputs.' Why is this a problem? As in the example above, it isn’t possible for a machine to ‘imagine’ something outside of its training. As Andreas explained, ‘there are problems with overfitting in that the network doesn’t know what it doesn’t know’. If the model is deterministic, you aren’t able to generate plausible new data that reflects your uncertainty. As a human, ‘I can understand classes of people, which gives me the possibility to imagine new people’ - if there’s determinism in the model, this becomes a problem!
What does this mean, and why is it important?
If a system is able to imagine scenarios and make accurate predictions, outcomes can be estimated far more reliably. Knowing how certain the system is of a decision affects the decision itself: in an autonomous car, if the system can tell me that ‘the road is clear with 99% certainty, it would make a very different decision than if it was only 50% certain.' Andreas spoke about these systems, saying that a system that knows what it doesn’t know is able to collect the right data when it’s uncertain - if you incorporate uncertainty into your model, it will know where it needs to collect more data.
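To make this concrete, here is a minimal sketch (my own illustration, not Amazon's system) of how predictive uncertainty might drive a decision: a small ensemble of classifiers stands in for an uncertainty-aware model, and the system only acts when it is both confident and certain, otherwise flagging the input for more data collection. The ensemble, features, and thresholds are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical ensemble of 5 "road is clear" classifiers, simulated here as
# logistic models with slightly different weights (stand-ins for
# independently trained networks).
weights = rng.normal(loc=1.0, scale=0.3, size=(5, 3))

def predict_clear(x, w):
    """Probability that the road is clear, for one ensemble member."""
    return 1.0 / (1.0 + np.exp(-x @ w))

x = np.array([0.8, -0.2, 0.5])  # hypothetical sensor features

probs = np.array([predict_clear(x, w) for w in weights])
mean_p, std_p = probs.mean(), probs.std()
print(f"P(road clear) = {mean_p:.2f} +/- {std_p:.2f}")

# Decision rule: proceed only when the model is both confident and certain;
# otherwise slow down and log this example for further data collection.
if mean_p > 0.99 and std_p < 0.05:
    print("Proceed")
else:
    print("Slow down and collect more data for situations like this one")
```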
But how can you introduce uncertainty? Amazon are currently applying a probabilistic, Bayesian treatment to ‘traditional’ deep learning approaches. This allows uncertainty to be introduced in three ways:
- Treating weights as distributions
- Stochasticity in the warping function
- Bayesian non-parametrics applied to DNNs, which can achieve both of the above, e.g. a deep Gaussian process
The result? A Bayesian neural network (BNN).
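As a rough illustration of the first idea - treating weights as distributions - the sketch below builds a tiny network whose weights are Gaussians rather than point estimates, and obtains predictions by Monte Carlo sampling of the weights instead of a single deterministic forward pass. The architecture, the variational means and standard deviations, and the input are placeholders, not Amazon's model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical (approximate) posterior over the weights of a tiny network
# with 2 inputs, 8 hidden units and 1 output: each weight is described by a
# mean and a standard deviation, not a single fixed value.
W1_mu, W1_sigma = rng.normal(size=(2, 8)), 0.1 * np.ones((2, 8))
W2_mu, W2_sigma = rng.normal(size=(8, 1)), 0.1 * np.ones((8, 1))

def forward(x, W1, W2):
    """Deterministic forward pass given one concrete draw of the weights."""
    return np.tanh(x @ W1) @ W2

def predict(x, n_samples=100):
    """Monte Carlo prediction: sample weights, then average the outputs."""
    outputs = []
    for _ in range(n_samples):
        W1 = W1_mu + W1_sigma * rng.normal(size=W1_mu.shape)
        W2 = W2_mu + W2_sigma * rng.normal(size=W2_mu.shape)
        outputs.append(forward(x, W1, W2))
    outputs = np.stack(outputs)
    # Predictive mean and a measure of uncertainty from the weight samples.
    return outputs.mean(axis=0), outputs.std(axis=0)

x = np.array([[0.5, -1.0]])
mean, std = predict(x)
print("prediction:", mean.ravel(), "uncertainty:", std.ravel())
```

The same input no longer gives "all the same outputs": the spread of the sampled predictions is the model's uncertainty.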
Model-wise, nothing needs to change here, but inference-wise it’s not easy - and that’s why Bayesian models are more challenging.
Inference remains a challenge: the posterior is difficult to compute because the weights appear nonlinearly inside the network. To achieve Bayesian inference, the weights need to be integrated out, giving a properly defined posterior over the parameters and allowing Amazon to compare across architectures. Andreas explained how the Bayesian treatment of neural network parameters is ‘an elegant way of avoiding overfitting and "heuristics" in optimization, while providing a solid mathematical grounding’. The family of deep Gaussian process approaches can be seen as non-parametric Bayesian neural networks: Bayesian non-parametrics are brought in to overcome uncertainty about the structure itself. In a deep Gaussian process the prior on the weights is kept, the input/latent space is kernelized, and there is stochasticity in the warping - yet it is still a Bayesian model that integrates out all unknowns.
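To give a feel for the deep Gaussian process view, the sketch below (again my own illustration, not the inference machinery described in the talk) samples a function from a two-layer deep GP prior by stacking Gaussian processes: the output of the first GP becomes the input of the second, which is where the ‘stochasticity in the warping’ comes from. A real deep GP would also integrate out these latent layers during inference; the kernel choices and input grid are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def sample_gp_layer(x, lengthscale=1.0):
    """Draw one random function (evaluated at x) from a zero-mean GP prior."""
    K = rbf_kernel(x, x, lengthscale) + 1e-8 * np.eye(len(x))
    return rng.multivariate_normal(np.zeros(len(x)), K)

x = np.linspace(-3, 3, 50)

# Layer 1: a GP warps the inputs into a latent space.
h = sample_gp_layer(x, lengthscale=1.5)

# Layer 2: a second GP acts on that (random) latent space, giving a
# composition of random functions - one sample from a two-layer deep GP prior.
f = sample_gp_layer(h, lengthscale=1.0)

print("deep GP prior sample (first 5 values):", np.round(f[:5], 3))
```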
Amazon want to be able to predict how something is going to behave in the future. Take for example a model that can walk - how will it learn to run? By conditioning on the walking behaviour learned in the previous step, the model can build on what it already knows.
In recurrent learning, the system uses its internal memory to process sequences of inputs, so uncertainty is especially important in these cases. By combining recurrence with a Gaussian process, you can place a prior on what the underlying functions look like:
‘I managed to give my colleague a lazy eye even though he doesn’t have a lazy eye - I managed to find the data to alter that’ - it’s a way of allowing the model to imagine something based on what it already knows. This is information you waste if you don’t take uncertainty into account - you take a step and transform it, then transform it once more, and that stops you losing information.
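One way to picture the recurrent combination described above: fit a GP to one-step transitions (state at time t to state at time t+1) and then roll the model forward on its own predictions, keeping a variance at every step. The sketch below does this with exact GP regression on a toy sequence; the toy data, kernel and noise level are assumptions for illustration, not the recurrent GP model presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

# Toy observed sequence (e.g. a joint angle while "walking").
t = np.linspace(0, 4 * np.pi, 80)
seq = np.sin(t) + 0.05 * rng.normal(size=t.size)

# One-step transition pairs: predict x_{t+1} from x_t.
X, y = seq[:-1], seq[1:]

noise = 0.05 ** 2
K = rbf(X, X) + noise * np.eye(len(X))
K_inv = np.linalg.inv(K)

def gp_step(x_prev):
    """GP posterior mean and variance for the next state given the current one."""
    k = rbf(np.array([x_prev]), X)          # shape (1, N)
    mean = k @ K_inv @ y
    var = 1.0 - k @ K_inv @ k.T + noise
    return mean.item(), var.item()

# Free simulation: feed the model's own predictions back in as inputs.
state, trajectory = seq[-1], []
for _ in range(10):
    state, var = gp_step(state)
    trajectory.append((round(state, 3), round(var, 4)))

print("rolled-out (mean, variance) pairs:", trajectory)
```

Note that this sketch only feeds the predictive mean back into the next step; properly propagating the predictive uncertainty through every transformation is exactly the information the quote warns against wasting, and it is what the full Bayesian treatment addresses.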
Andreas opened the floor to questions at the end of his presentation and was asked ‘how does approximation affect performance?’, to which he explained that because ‘we don’t use exact numerical methods, we never know how close we are to the actual solution! We use approximations depending on assumptions - sometimes if you’re not careful it can be really bad using approximation, so it’s a trade-off: sometimes it works better, sometimes not. Definitely more research is needed.’
Amazon aren’t the only online retailer leveraging deep learning to optimise their processes. At the Deep Learning in Retail & Advertising Summit in London next March 15 & 16 we will be joined by industry experts leveraging deep learning to improve the retail experience. Previous speakers include CEOs, CTOs, data scientists, machine learning experts and academics from the likes of Trivago, ASOS, Amazon, Zalando, Kohl’s, Gilt, Deepomatic and many more.