Deep Learning & Cognition - A Keynote from Yoshua Bengio

The RE•WORK Deep Learning and Responsible AI Summits were brought to a close on day one with an hour-long keynote from one of the world’s leading experts and pioneers in Deep Learning, Yoshua Bengio. We were delighted to have Yoshua join us again this year in Canada to discuss his current work, referencing both the latest technological breakthroughs and the business applications that have emerged in Deep Learning over the last twelve months.

Yoshua opened by proposing that there are principles giving rise to intelligence, whether machine or animal, which can be described in much the same way as the laws of physics. That is, our intelligence does not come from a big bag of tricks, but from mechanisms used specifically to acquire knowledge. The analogy with physics is that we understand the physical world mostly because we have figured out its laws, not merely because we can describe their consequences.

“We have made more progress than my friends and I expected a few years ago. It is mostly about perception. Some things in natural [language] processing are doing well, but we are far from human capabilities.”

Yoshua further simplified this by explaining that we can draw inspiration for AI from living intelligence. Curriculum learning, cultural evolution, lateral connections, attention, distributed representations and more are all mechanisms that are commonly used in everyday life, perhaps without intention, and that can be applied to the development of future AI algorithms. Professor Bengio went on to suggest that whilst Deep Learning has seen huge leaps in capability this century in computer vision, speech recognition and synthesis, natural language processing and more, we are still incredibly far from human-level AI. He cited sample complexity, the reliance on human-provided labelled data and errors on adversarial examples as some of the key shortcomings currently seen. Yoshua further suggested that much of the talk of ‘what’s next’ is wide of the mark:

“With all the improvements we have seen in recent years, some people think we are done and just need to scale up what we have already learnt to wider topics and problems, however, I think there are many pieces in the puzzle which are missing and I want to brainstorm with you on this”.

Interestingly, Yoshua repeatedly used the example of young children and babies as something on which the next generation of AI could be modelled. The ability of humans to generalise gives us a more powerful understanding of the world than machines currently have. That is, we learn from experience which, to us, is not training data at all. We can understand stories that are fictional; in fact, many listeners can finish a story that someone else starts to tell, even a nonsensical one, because humans have no problem imagining impossible things. As for the next steps for AI, it is simply not good enough to grow data sets, model sizes and computer speeds without taking this into account.

How can we close that gap to human-level AI? Yoshua suggested that the following are currently missing and would be necessary to make that next step:

The ability to generalize faster from fewer examples
The ability to generalize out-of-distribution, with better transfer learning, domain adaptation and reduced catastrophic forgetting in continual learning
Higher-level cognition: system 1 vs system 2
Additional compositionality from reasoning & consciousness
Discovery of causal structure and the potential to exploit it
Exploitation of the agent perspective from RL, including unsupervised exploration

The answer to a majority of the above? Learning multiple levels of abstraction. High-level abstractions would disentangle the factors of variation, allowing for easier generalization, transfer learning, reasoning and language understanding, since these factors are composed to form the observed data. Discovering such disentangled representations is easier said than done: it relies on assumptions such as spatial and temporal scales, marginal independence and simple dependencies between factors, among others.

Yoshua also commented on the two systems of cognitive processing, citing the ‘System 1’ and ‘System 2’ of Daniel Kahneman’s book ‘Thinking, Fast and Slow’. The former encompasses intuitive, fast and automatic perception; the latter covers rational but sequential, slow and logical decision making. There is no magic in consciousness, you see. Whilst it is true that brains are incredibly complex and somewhat stochastic machines, the idea of consciousness can be associated with various computational mechanisms. Bengio went on to name three computational aspects of consciousness: access consciousness, self-consciousness and qualia (subjective perception). Yoshua then illustrated the two systems with an anecdote:

“When you drive to your home you don’t need to think about what you’re doing, that’s system one - you can talk while driving and perform other tasks if you are driving in a place you recognise on your way home, it is completely automatic. System two is panic and not cognitively knowing where you should be going - all of this encompassing conscious thinking and imagining what could go wrong, that is system two.”

It was further explained that humans combine systems one and two on many occasions, as we have the ability to sequentially focus on different aspects and attend to most things we have in our mind at the moment. In fact, it could be that we are overcomplicating what we think machines should understand. We ourselves think about objects and high-level entities in the world in terms of how we interact with them, not necessarily in terms of their shape, colour or texture, yet we expect machines to work with a different level of affordances than we do.

Something which Yoshua credited as key to the future of Deep Learning was the concept of attention. Bengio suggested that attention will help move DL towards high-level, human-like intelligence, much as it allows our consciousness to focus on and highlight one thing at a time. Brains store their own history in a side memory which we are not conscious of, but which we have the ability to play back. Again, Bengio used an anecdote to demonstrate his point:

“Brains store their own history in a side memory and have the ability to play it back. With that, we are able to recall information relevant to our current behaviour and feelings. Instead of playing back your whole day in order to figure out how you thought twelve hours ago, you can just remember pinpointed memories. Say you’re driving and you hear a pop sound, even if you notice the sound but take no action, it goes into your memory and you compute through the options of what it could be. When you stop later and see you have a flat tyre, you can recall back to the moment you thought something happened. When this happens, you can skip through hours of driving, you don’t need to see it all, you just jump through time to remember the exact moment that you heard that pop - you associate it with the image of the flat tyre you’re seeing”
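
The recall described in this anecdote can be loosely illustrated with content-based soft attention over a memory: a present cue (the flat tyre) is compared with every stored moment, and the best-matching moment (the pop) receives almost all of the weight, so there is no need to replay the whole drive. The following is a minimal sketch only and is not taken from the talk. The feature vectors, the `attend` function and the variable names are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention over a small memory.

    Each stored value is weighted by how well its key matches the query.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    readout = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, readout


# Hypothetical "side memory" of a drive, one entry per remembered moment.
# Assumed features: [engine hum, music, sudden pop]
memory_keys = [
    [1.0, 0.0, 0.0],  # moment 0: steady engine noise
    [0.0, 1.0, 0.0],  # moment 1: radio playing
    [0.0, 0.0, 4.0],  # moment 2: a loud pop
    [1.0, 0.0, 0.0],  # moment 3: steady engine noise again
]
# The value stored with each key is simply the index of the moment.
memory_values = [[0.0], [1.0], [2.0], [3.0]]

# Seeing the flat tyre produces a cue that resembles the pop.
flat_tyre_cue = [0.0, 0.0, 4.0]

weights, readout = attend(flat_tyre_cue, memory_keys, memory_values)
best_moment = max(range(len(weights)), key=lambda i: weights[i])
print("attention weights:", [round(w, 4) for w in weights])
print("most relevant moment:", best_moment)
```

Because the weights come from a softmax, the lookup is a soft one: every moment contributes a little, but the one matching the cue dominates the readout.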

Yoshua further suggested that the study of consciousness in neuroscience should be mirrored in Machine Learning.

“Consciousness is a very loaded word. Neuroscientists are starting to put some science behind it and it’s time ML does as well. It’s about different functionalities in the brain, in particular, the ability we have to sequentially focus on different aspects and attend most things we have in our mind at the moment, they become dominant”.

In the latter part of his presentation, Yoshua discussed what Machine Learning currently lacks in order to progress, including the need for generalisation and understanding beyond the training distribution. Current ML models are criticised for poor reuse and poor modularization of knowledge, and learning theory only deals with generalization within the same distribution, whilst the models themselves often do not generalize well at all.

“As with some of the things I have been saying, it is not enough to train our model on one task. We need to build models which understand our world at a higher level, designing algorithms that understand the world's environment. Once a machine can understand the causal structure of the world and produce plans to take advantage of the fact that they are not passive in the environment and can be active and acquire knowledge, we will be getting to a better standard of application. Machines need to get to the point that they recognise that they can do things to purposely gain knowledge and use these for leverage.”

Other topics of note included recurrent independent mechanisms, sample complexity, end-to-end adaptation, multivariate categorical MLP conditionals and more. When summarising his talk, Professor Bengio gave three key points to keep in mind when ‘looking forward’:

We must build a world model which meta-learns causal effects in an abstract space of causal variables. This requires quickly adapting to change and generalizing out-of-distribution by sparsely recombining modules
The necessity to acquire knowledge and encourage exploratory behaviour
The need to bridge the gap between the aforementioned system 1 and system 2 ways of thinking, taking both old neural networks and conscious reasoning into account

Speaker profile:

Yoshua Bengio is recognized as one of the world’s leading experts in artificial intelligence (AI) and a pioneer in deep learning. Since 1993, he has been a professor in the Department of Computer Science and Operational Research at the Université de Montréal. Holder of the Canada Research Chair in Statistical Learning Algorithms, he is also the founder and scientific director of Mila, the Quebec Institute of Artificial Intelligence, which is the world’s largest university-based research group in deep learning. His research contributions have been undeniable. In 2018, Yoshua Bengio collected the largest number of new citations in the world for a computer scientist thanks to his many publications. The following year, he earned the prestigious Killam Prize in computer science from the Canada Council for the Arts and was co-winner of the A.M. Turing Award, which he received jointly with Geoffrey Hinton and Yann LeCun. Concerned about the social impact of AI, he actively contributed to the development of the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Join us at the next edition of the global Deep Learning Summit Series: www.re-work.co/events

In the meantime, view technical presentations and interviews in our extensive AI Video Library or on our YouTube channel.