Should We Be Rethinking Unsupervised Learning?
Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples.
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.
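To make the two definitions concrete, here is a minimal, purely illustrative sketch (the toy data and model choices are my own; scikit-learn's LogisticRegression and KMeans are just convenient stand-ins): the supervised model is fit on inputs together with labels, while the unsupervised model is fit on the inputs alone.

```python
# Illustrative sketch: supervised vs. unsupervised learning on toy data.
# Dataset and model choices are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # input data
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels, available only in the supervised case

# Supervised learning: infer a function from labelled examples (X, y).
clf = LogisticRegression().fit(X, y)
print("predicted labels:", clf.predict(X[:5]))

# Unsupervised learning: draw inferences from X alone, without labelled responses.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print("cluster assignments:", km.labels_[:5])
```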
Most practical deep learning applications have so far been driven by supervised learning and very large labeled datasets. It is understood that larger datasets and more compute power are likely to drive a lot of future progress; however, gathering this data is costly and difficult, and humans are proof that there is a way to develop intelligent behaviour without any large, labelled datasets. We learnt more about use cases of unlabelled data from Roland Memisevic, Assistant Professor at the University of Montreal and Chief Scientist at Twenty Billion Neurons, at the Deep Learning Summit in San Francisco, where he explored ideas on rethinking unsupervised learning and what he feels scientists may have been getting wrong. I asked him a few questions to learn more about using unsupervised learning, research challenges, and future advancements in deep learning.

Tell us more about your research at the University of Montreal.

I am a member of the MILA institute at the University of Montreal, which is a large institute specialized in deep learning. I am one of four faculty members, and there are around 50 students and researchers. It is a very vibrant and active community and there are many different projects going on. My own recent focus has been on building networks that solve computer vision tasks beyond object recognition, building hardware-friendlier neural networks, and improving the training of networks (for example, by orthogonalizing weights and avoiding vanishing-gradient problems).

Can you expand on the idea that we need to rethink unsupervised learning?

At ICLR 2015 last spring I was chatting with Ilya Sutskever about the apparent "failure" of unsupervised learning (at least by comparison to the amazing successes of supervised learning). He made the outrageous statement that maybe unsupervised learning, at least the way we have been pursuing it over the last years, may be completely misguided and wrong. He basically questioned the common practice of training networks to reconstruct their inputs, which is the most common way of doing unsupervised learning. I argued against that, but in hindsight I have to admit that he may be right.

A common argument for unsupervised learning is that humans do not receive the large number of training examples that deep networks get, so they must have some other way of learning. But now I'm also no longer sure whether even this is the right way to state things. It is certainly true that a child gets told "this is a dog" by their mother at most a handful of times in their life. In that sense we get very few labels. But there is an ocean of other kinds of labels that we get day-to-day: "this object breaks when falling", "this object can occlude another one", "this is how my visual input changes when I move my head", etc. On top of that, many simple concepts like "and", "but", "or" and many linguistic categories could be viewed as labels, too. For example, an image that we are told contains "a cat and a dog" allows us to learn about the concept "and" in much the same way in which it allows us to learn about the visual concept "dog". I do not see why we should treat these "labels" conceptually any differently from the labels we have in a large supervised training set. Of course, for humans even simple labels like "dog" do not come as "labels" the way a neural net receives them; they come in the form of an acoustic signal carrying language, which just happens to occur in close temporal proximity to the visual input.
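As a concrete aside on the practice being questioned here (training a network to reconstruct its own inputs), the sketch below shows a tiny NumPy autoencoder trained with plain gradient descent on unlabelled data. The architecture, loss, and hyperparameters are illustrative assumptions of mine, not anything Memisevic advocates.

```python
# Minimal sketch of reconstruction-based unsupervised learning (an autoencoder).
# Pure NumPy; all choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))        # unlabelled inputs only, no targets used

n_in, n_hid, lr = X.shape[1], 5, 0.01
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))   # encoder weights
b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_in))   # decoder weights
b2 = np.zeros(n_in)

for step in range(1000):
    H = np.tanh(X @ W1 + b1)          # encode
    X_hat = H @ W2 + b2               # decode: reconstruct the input
    err = X_hat - X                   # reconstruction error
    loss = np.mean(np.sum(err ** 2, axis=1))

    # Backprop of the mean (per-example) squared reconstruction error.
    d_out = 2 * err / X.shape[0]
    dW2 = H.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * (1 - H ** 2)   # tanh derivative
    dW1 = X.T @ d_hid
    db1 = d_hid.sum(axis=0)

    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g                   # plain gradient descent update

print("reconstruction loss after training:", loss)
```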
So to the degree that we move towards more complicated tasks, like generating language rather than one-of-k labels, and interacting with the world, perhaps using ideas from reinforcement learning, the unsupervised learning mystery may just disappear, and in the end we use something more akin to transfer learning.

What are the key factors that have enabled recent advancements in deep learning?

The use of the right kind of hardware. Period. The one and maybe the only important thing that deep learning is really good at is that it allows us to harness dense, parallel hardware. Parallelization is difficult and messy, and it requires lots of engineering to get to work. As a result it never really worked well in the past. Deep learning now allows us to fully exploit parallel hardware because it learns how to deal with this mess by itself. This is why neural networks are based on matrix multiplications. And it is the reason why deep learning took off only around 2010, when people started using GPUs. Before that, all of us did something incredibly silly in hindsight: we simulated a system whose main strength is its ability to harness parallelization on sequential hardware! And everyone was surprised when it didn't work that well.

Where are there still challenges to progressing research in deep learning?

The biggest challenge in my opinion will be in developing hardware that is more suitable for deep learning. Although GPUs are a huge leap forward, I think there is incredible potential in driving things much further on the hardware side. Even the biggest neural networks are tiny compared to biological brains, and they consume much more energy per operation. One reason for this is that brains have a completely different design: there is no finicky synchronization, there is no complex long-range communication, there are no floating point numbers. If the community figures out how to implement back-prop on such hardware (and there are some early efforts in that direction), that would be a very big deal. Of course, there are many other interesting research directions (like getting beyond supervised learning, DL for robotics, etc.), but these aren't really "challenges" because they are just bound to happen and things will improve over time. Progress in all of those areas will depend, though, on how well we are able to scale things up on the hardware side.

What advancements excite you most in the field?

The most striking change in my mind is the current transition to "Von Neumann-like" computation in neural networks. Now that recurrent networks work, people have started to experiment with memory architectures, complex data structures, certain kinds of reasoning, etc., within a neural network. Deep learning is moving towards what one may call "neural programs": sequential, program-like computations run on a neural network substrate. There is a certain irony in this. In the past we simulated inherently parallel computations (neural networks) on traditional, sequential hardware, and it did not work. Today, we simulate sequential computations and classical computer science concepts on parallel hardware, and it actually seems to work pretty well!
One way to interpret this finding is to say that computer science may have taken a wrong turn in the middle of the last century by focusing too much on these now-classical computer science concepts (like Turing machines), instead of viewing them as merely a useful add-on to a completely different compute paradigm, one based on parallel computation and learning.

Another reason why the current transition to "neural programs" is interesting is that it completely shifts our perspective on many practical tasks, and it may lead to much better solutions. Many tasks can be solved best by operating on a workspace where you create intermediate solutions that you incrementally revise. Most existing neural network solutions do not do this yet, and research into "neural programs" may change this. As an example, consider the task of generating text (for example within a machine translation task). Currently, most recurrent networks would generate text word by word until they hit an end-of-sentence symbol, at which point they stop. I can very well imagine future recurrent networks generating text by iterating over the output text multiple times, generating a few words at a time, then re-reading what they have generated so far before generating more words, and so on. This is how humans generate written text. It is not how humans generate spoken text, of course, but most spoken text is in fact full of errors and on-the-fly revisions. Another example would be the generation of images, where incremental generation based on a workspace is already being pursued to some degree (with DeepMind's DRAW network, for example), but I think we can and will go much further than this in the near future.
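For readers who want to see what the word-by-word generation described above looks like in code, here is a minimal sketch of a greedy decoding loop that stops at an end-of-sentence symbol. The `step` function is a hypothetical stand-in for a trained recurrent model; nothing here comes from Memisevic's work or DeepMind's code.

```python
# Sketch of the standard word-by-word decoding loop described above: a recurrent
# model emits one token at a time and stops at an end-of-sentence symbol.
import numpy as np

vocab = ["<eos>", "the", "cat", "sat", "down"]
rng = np.random.default_rng(0)

def step(state, token_id):
    """Toy recurrent step: returns a new state and scores over the vocabulary."""
    state = np.tanh(state + 0.1 * token_id)
    scores = rng.normal(size=len(vocab)) + state  # placeholder for real logits
    return state, scores

def generate(max_len=20):
    state, token_id, output = 0.0, 1, []      # start from an arbitrary first token
    for _ in range(max_len):
        state, scores = step(state, token_id)
        token_id = int(np.argmax(scores))     # greedy choice of the next word
        if vocab[token_id] == "<eos>":        # stop at the end-of-sentence symbol
            break
        output.append(vocab[token_id])
    return " ".join(output)

print(generate())
```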
Roland Memisevic will be speaking at the Deep Learning Summit in Montreal. Come and join us!