Earth Day 2017: Tackling Climate Science With Deep Learning
Climate change is one of the most important problems facing humanity in the 21st century. Climate simulations give us a unique opportunity to understand how the climate system evolves under various CO2 emission scenarios, giving scientists the chance to prepare for future problems and find solutions earlier.
However, these simulations produce spatio-temporal, multivariate datasets on the order of 100TB, which makes sophisticated analytics difficult. At the 2017 Machine Intelligence Summit in San Francisco, Prabhat, Group Lead for Data & Analytics at Lawrence Berkeley National Lab, shared his expertise on how deep learning can be used to tackle such problems in climate science, and explored open research challenges for the future.
Looking ahead to Earth Day on 22 April, we caught up with Prabhat to learn more about how artificial intelligence and data analytics are transforming climate science and research.
What motivated you to begin your work in climate science?
I am a computer scientist by training, and I have a broad grasp of several areas in Big Data (Statistics, Machine Learning, Management, Visualization, Workflows) and High Performance Computing. I began working towards a PhD in Climate Science because I wanted to bring all these skills together and apply them to the most important problems in society. Personally, I can’t think of a bigger challenge facing humanity than climate change. Our generation is slowly awakening to the challenge, but it’s really the next generation, our kids, who will face the consequences of our actions (or inactions).
What do you enjoy most about your current roles?
My role at NERSC is unique in that I get to track the frontier of science. I talk to scientists every day, and get to better understand and appreciate problems at all scales: from mapping all stars and galaxies in the universe to particle accelerators and sub-atomic physics. Climate science falls somewhere in between!
Simultaneously, I lead an incredible team of domain scientists and computer scientists to formulate and implement NERSC’s Big Data strategy. NERSC (LBNL’s supercomputing center) has been at the forefront of HPC for over 40 years. As science becomes more data-driven in nature, we are getting the right software and services in place so that scientists can continue to do groundbreaking work.
How are you using AI and machine learning in your work? What benefits does this have?
In climate science, we are applying supervised and semi-supervised learning to detect extreme weather events. Climate datasets are complex and large. A single climate simulation can produce over 100TB of data; some of the larger archives consist of over 5PB of data. It is impossible for humans to manually (and objectively) scan through such datasets for patterns. This is the perfect task for AI.
The benefit of this work is that we are able to quantitatively assess how extreme weather will change in the future (under various carbon emission scenarios). We are able to answer important questions like: Are Category 4/5 storms more likely to make landfall in a warmer world? Will California receive less rain in the future? Do we expect to see more storms in the future?
Image: Climate simulations by Lawrence Berkeley Lab
Outside of artificial intelligence, which emerging technologies have had the biggest impact on climate science in the last few years?
I would say that High Performance Computing has transformed climate science by providing scientists with access to powerful computational capabilities. Systems like Edison and Cori at NERSC have been a big boon to the climate science community, enabling them to run a broad range of climate simulations to better understand and quantify sources of uncertainty.
Which emerging technologies do you think will have the biggest impact on your field in the future?
It is clear that AI (and in particular Deep Learning) is poised to transform science. Recently, we wrote an article on various case studies at LBNL. We believe that as several domain science areas awaken to the paradigm shift and start applying these techniques, they will find radically better results (compared to hand-tuned heuristics). The article also cautions on a few challenges unique to science: domain scientists are not comfortable with the notion of a black box system, and the AI community needs to develop better techniques to interpret the behavior of these networks.
Arguably, one of the better collective achievements of humanity over the past several millennia is the discovery of the fundamental laws of nature. We should try to find a way to constrain the solution space of these networks to subscribe to these laws. That will help in bridging the current, purely data-driven approach to AI/Deep Learning with concepts that are broadly accepted in science.
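One common way to express this idea of constraining a network with a physical law is to add a penalty term to the training loss. The sketch below is purely illustrative and is not a method described in the interview or used at LBNL: the function names, the `weight` parameter, and the choice of a simple conservation law (the predicted values must sum to a known total) are all assumptions made for the example.

```python
# Illustrative sketch only: a data-fit loss plus a penalty for violating
# an assumed conservation law. All names here are hypothetical.

def mse(pred, target):
    """Mean squared error between predictions and observations."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def conservation_penalty(pred, total):
    """Squared violation of the assumed law: sum(pred) should equal total."""
    return (sum(pred) - total) ** 2


def constrained_loss(pred, target, total, weight=1.0):
    """Data-driven term plus a weighted physics-based term.

    A larger `weight` pushes the optimizer toward solutions that respect
    the conservation law, even at some cost in raw data fit.
    """
    return mse(pred, target) + weight * conservation_penalty(pred, total)


if __name__ == "__main__":
    # A prediction that matches the data and conserves the total
    print(constrained_loss([1.0, 2.0], [1.0, 2.0], total=3.0))
    # A prediction that misses the data and breaks conservation
    print(constrained_loss([1.0, 3.0], [1.0, 2.0], total=3.0, weight=2.0))
```

In a real deep learning framework the same structure would apply to tensors and be minimized by gradient descent; the point of the sketch is only that the physical law enters as an explicit term alongside the purely data-driven one.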