Deep Learning in Genomics: 5 Questions with Google Brain's Jasper Snoek

Google Brain are working to ‘make machines intelligent’ and to ‘improve people’s lives’ by tackling the most challenging problems in computer science. At the Deep Learning Summit in Montreal this October 10 & 11, we will hear from Jasper Snoek, who will be discussing his current work in Deep Learning on DNA at Google Brain, and how some recent breakthroughs in deep learning have allowed him to push forward with groundbreaking research in genomics. Also joining Jasper is Hugo Larochelle, who will be presenting on ‘Generalizing From Few Examples With Meta-Learning’.

We’ve been fortunate enough to chat with Jasper in the run-up to the event, and he explained to us that as a Research Scientist in the Brain team, he’s helping to form the interface between Google and the academic community. It’s an amazing position to be in: he is essentially an academic researcher in machine learning, with access to invaluable resources and people at Google. Most of Jasper’s time is dedicated to trying to solve challenging problems broadly in machine learning.

In advance of his presentation on Deep Learning on DNA, Jasper answered some of our questions surrounding his work and involvement with Google Brain.

Can you give me a short overview of your work at Google Brain?

I like to motivate my research with real applied problems, either within Google or in, for example, genomics. I also spend a lot of time in service to the academic community. In the past six months, I reviewed submitted papers for a variety of conferences and journals and served as an area chair for ICLR and NIPS.

What started your work in deep learning, and more specifically genomics?

In undergrad at the University of Toronto I ended up taking a class called "Introduction to Machine Learning and Neural Networks". The instructor, Geoffrey Hinton, had a way of inspiring students to become excited about the potential of machine learning and neural networks in particular (which were actually kind of dismissed in the field at the time). I originally never intended to go to grad school, but I was hooked. I started my research career applying machine learning to assistive technology, helping elderly and disabled people through developing non-collision wheelchairs, etc. However, there was an air of excitement in the department about the potential of machine learning and it was difficult to not get drawn in. I became really excited about connections between neural nets and particular statistical models and started writing papers on that.

Genomics didn't start until much later when I was a postdoctoral researcher at Harvard. A postdoc from the stem cell biology department, David Kelley, kept showing up to our machine learning group meetings. He impressed upon me the potential of deep learning to unlock some of the mysteries of how our DNA works. I helped David with the details of training deep convolutional networks and in turn he brought me up to speed on how our genes work.

Geoffrey Hinton will also be presenting at the Deep Learning Summit alongside Yann LeCun and Yoshua Bengio. The AI pioneers will be appearing on a panel together at the event.

What are the key factors that have enabled recent advancements in genomics?

Really great science coupled with amazing advances in technology has allowed us to collect data at a scale that we've never had access to before. This is true across the spectrum from molecular biology to genetics, and it allows us to study the statistical relationship between variation in our genomic DNA and an incredible variety of phenotypes at multiple scales, from the RNA abundance levels transcribed off the DNA to the lifespan of the individual. Upcoming datasets are larger than anything studied before, and even add new dimensions to the data. For example, rather than measure the average gene expression profile of the cells in a tissue, we can now measure noisy single-cell expression profiles for hundreds of thousands of cells. Exciting new data like that introduce new computational challenges with great potential to better understand human biology and improve health.

What are the key challenges you are currently facing in your work?

There are so many open questions to be answered and not enough hours in a day. A major challenge is just choosing which of the exciting problems to devote time to. Also, simply staying up to date in multiple rapidly advancing fields of research is a major challenge. New machine learning papers are popping up so rapidly that it's impossible to read all of them. You really have to learn to develop a strong filter. It's also worth noting that many papers simply don't work as well as advertised. This makes it challenging to decide which ideas to implement and build on for our DNA models. Our original code contains loads of commented-out lines of ideas that simply didn't help or made results worse, and each of those lines corresponds to multi-hour or multi-day experiments.

What developments can we expect to see in deep learning and AI in healthcare in general in the next 5 years?

A variety of different areas of machine learning are maturing to the point at which I expect we can see tremendous innovations in healthcare. First and foremost, recent advances in fairness, the prevention of discrimination and bias, and differential privacy in machine learning help ensure that we can apply state-of-the-art methods in an ethical way while maintaining privacy. This, coupled with new techniques for merging disparate data sources and natural language processing, will allow us to analyze medical records, discover symptoms and even predict medical outcomes.

I expect tremendous advances in medical image analysis in the next 5 years, assisting experts to significantly improve morbidity detection rates. Deep learning methods are already enabling analysis that outperforms experts in applications such as the detection of diabetic eye disease and cancerous tumors. I don’t see experts being replaced entirely, but instead being able to achieve higher throughput at higher accuracy with more time to focus on the hardest cases and with a strong second opinion to help minimize human error. I expect this to spread across medical domains to, for example, cardiology where algorithms could automatically analyze EKGs taken at home.

Finally, I hope that machine learning analysis of genomics, at both the individual and the population level, will help us understand genetic mutation and what we can do to prevent genetic disease. I think there will be fantastic inroads here in the next 5 years, driven by the combination of large amounts of data and more powerful machine learning methods.