We live in a century where access to information is ubiquitous and change is the only constant. Artificial intelligence, once a futuristic concept, is now transforming industries globally. The development of cognitive computing applications with machine learning and deep learning techniques has paved the way for researchers to make major breakthroughs in diverse fields ranging from speech recognition, language translation, and image processing to economics and computational biology.
In particular, bioinformatics, the science of analyzing complex biological data using computational methods, has advanced significantly with the rise of artificial intelligence. Recent research has enabled computers to analyze polymerase chain reactions and classify digital images. Combined with the need to store and analyze digitized clinical images, these techniques open up new possibilities for medical information systems. In addition, the decoding of the human genome and major advances in molecular biology continue to create opportunities in bioinformatics and computational biology.
Recent advancements in artificial intelligence and computer vision can be applied to yield numerous medical benefits. Faster image analysis could power data-driven models for initial screening tests, or future tools that assist pathologists with the early diagnosis of cancer. Convolutional neural networks (CNNs) compact enough to run on mobile devices could significantly assist in low-cost, real-time disease detection, especially in rural areas where testing is not currently available. Combined with the increase in computational power and technology accessibility, advances in deep learning are becoming powerful tools for social good.
In stark contrast with such significant progress in data-driven bioinformatics, potential cures for cancer have remained elusive. The current standard of care consists of broadly toxic chemotherapy, which leads to high patient survival rates for some cancers and low rates for others. Researchers are therefore looking into cancer therapies that are less harmful than chemotherapy or radiation. In particular, research has focused on developing precision medicine approaches for discovering druggable weaknesses in cancer cells, with immunotherapy emerging as a leading contender. Immunotherapy artificially stimulates the immune system to treat cancer and is generally less toxic to the body than chemotherapy. Although the earliest observations of an immunological antitumor response were made long ago, it was only in the past 30 years that immunotherapy emerged as a viable therapeutic option. Recently discovered interactions between the patient's immune system and cancer cells may help predict resistance to therapies as well as opportunities for immunotherapies.
One of the main challenges stymieing the advancement of precision medicine is the complexity of cancer biology. Genome-wide approaches generate lists of hundreds to thousands of genes, so identifying predictive signatures can be incredibly difficult. One strategy for overcoming this challenge is to incorporate prior knowledge obtained from biological experiments, and my research focused on this task.
My research was conducted at UC Santa Cruz's Baskin School of Engineering under the mentorship of Mr. Jacob Pfeil, a doctoral researcher at the UC Santa Cruz Genomics Institute. In particular, we worked to predict how a patient would respond to an immunotherapy based on their gene expression and molecular features. Given an arbitrary patient population, we clustered patients into groups based on their gene expression, thereby enabling us to better predict the likelihood of response to drugs or therapies. To test our algorithm, we used the Anti-PD1 cancer therapy, which was developed by researchers at Johns Hopkins Kimmel Cancer Center. The therapy does not aim to kill cancer cells directly, but instead aims to block a pathway that shields tumor cells from the immune system components able to fight cancer.
To develop our approach, we used an adult cancer dataset from a study in The Cancer Genome Atlas (TCGA) that has had previous success with Anti-PD1 immunotherapy. TCGA is a large database with multi-dimensional maps of key genomic changes in over 30 different types of cancer. In total, TCGA contains 2.5 petabytes of data from over 11,000 patients. We further used two different gene sets, Lyons and Thorssons, to cluster the patients. We curated the available gene expression signatures for profiling immune cells using published data and annotated this list of genes with biological functions from two different databases: GO (Gene Ontology) and MSigDB (Molecular Signatures Database). We entered the genes into each database to find commonalities among the biological processes they were involved in. To quantify these commonalities, we looked at the ratio of the number of genes that were in our dataset to the total number of genes.
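The ratio described above can be illustrated with a short Python sketch. It assumes one particular reading of that ratio: for each biological process, the number of genes from our list that are annotated to the process, divided by the total number of genes annotated to it. The function names, the example signature, and the term-to-gene mappings are all made up for illustration and are not actual GO or MSigDB annotations.

```python
def overlap_ratio(signature_genes, annotated_genes):
    """Share of a term's annotated genes that also appear in the signature."""
    signature = set(signature_genes)
    annotated = set(annotated_genes)
    if not annotated:
        return 0.0  # an empty term cannot overlap with anything
    return len(signature & annotated) / len(annotated)


def rank_terms(signature_genes, term_to_genes):
    """Sort terms from the highest overlap ratio to the lowest."""
    scores = {
        term: overlap_ratio(signature_genes, genes)
        for term, genes in term_to_genes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical immune signature (illustrative only).
signature = ["CD8A", "GZMB", "PRF1", "IFNG", "CD274"]

# Hypothetical annotations; real ones would come from GO or MSigDB.
terms = {
    "T cell mediated cytotoxicity": ["CD8A", "GZMB", "PRF1", "FASLG"],
    "response to interferon-gamma": ["IFNG", "STAT1", "IRF1", "CD274", "JAK2"],
    "DNA repair": ["BRCA1", "RAD51", "ATM"],
}

ranked = rank_terms(signature, terms)
for term, ratio in ranked:
    print(f"{term}: {ratio:.2f}")
```

A raw ratio like this favors small terms, so in practice it is often paired with a statistical enrichment test; that step is left out here to keep the sketch short.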
We developed and used a machine-learning-based model, Hydra, to cluster our patient samples into groups based on similar expression across their genes. Hydra samples many patients in order to identify multi-modal genes, which are needed to determine differential gene expression. For example, if a gene is upregulated in a patient with cancer and downregulated in a healthy patient, the gene could be a factor in causing the cancer. Hydra takes in a patient population and, in an unsupervised way, identifies different components and looks for correlations across genes that share a similar pattern.
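To make the idea of a multi-modal gene concrete, here is a minimal Python sketch. It is not Hydra's implementation: as a simple stand-in, it fits one Gaussian and a two-component Gaussian mixture (via expectation-maximization) to a gene's expression values and calls the gene multi-modal when the mixture has the lower BIC. All function names and the synthetic expression values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def log_normal_pdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def loglik_one_gaussian(values):
    """Maximized log-likelihood of a single Gaussian fit."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-6)
    return sum(log_normal_pdf(v, mean, var) for v in values)


def loglik_two_gaussians(values, n_iter=200):
    """Log-likelihood of a two-component Gaussian mixture fitted with EM."""
    n = len(values)  # needs at least two values
    ordered = sorted(values)
    half = n // 2
    grand_mean = sum(values) / n
    start_var = max(sum((v - grand_mean) ** 2 for v in values) / n, 1e-6)
    # Start the components at the means of the lower and upper halves.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    variances = [start_var, start_var]
    weights = [0.5, 0.5]

    def weighted_logs(v):
        return [
            math.log(weights[k]) + log_normal_pdf(v, means[k], variances[k])
            for k in range(2)
        ]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            logs = weighted_logs(v)
            top = max(logs)
            probs = [math.exp(item - top) for item in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-9)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            spread = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values))
            variances[k] = max(spread / nk, 1e-6)

    total_loglik = 0.0
    for v in values:
        logs = weighted_logs(v)
        top = max(logs)
        total_loglik += top + math.log(sum(math.exp(item - top) for item in logs))
    return total_loglik


def looks_multimodal(values):
    """True when the two-component mixture has a lower BIC than one Gaussian."""
    n = len(values)
    bic_one = 2 * math.log(n) - 2 * loglik_one_gaussian(values)   # 2 parameters
    bic_two = 5 * math.log(n) - 2 * loglik_two_gaussians(values)  # 5 parameters
    return bic_two < bic_one


# Synthetic, deterministic expression values (not real patient data).
base = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
unimodal_gene = [3.0 + q for q in base]
bimodal_gene = list(base) + [6.0 + q for q in base]

print("unimodal gene flagged:", looks_multimodal(unimodal_gene))
print("bimodal gene flagged:", looks_multimodal(bimodal_gene))
```

A gene flagged this way has at least two distinct expression states in the population, which is what makes it a candidate for separating patients into groups.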
Hydra is useful because there is currently no standard way to do differential gene expression analysis, and because it learns the number of clusters using a variational Bayes approach. The method loops repeatedly over the patient population in order to learn the number of clusters that best discriminates between patients. Hydra groups the patients in a way that identifies coordinated expression of genes across each cluster, while converging on the smallest number of accurate clusters.
Researchers generally believe that the presence of immune cells near a tumor indicates an increased likelihood of response to a targeted drug or therapy. Our results did not support this prevailing hypothesis: patients in clusters with downregulated genes responded best to Anti-PD1 therapy. Our research has therefore unearthed evidence that contradicts the general consensus for Anti-PD1 therapy. If corroborated, our work could open up the field to a vast number of plausible immunotherapies for treating cancer.
Today, as we recognize the potential of computer science to solve societal issues, combining biology and genomics with advances in artificial intelligence and machine learning may help us solve problems in ways that improve our quality of life. With patients’ consent, it may be possible to collect vast amounts of data from electronic health records, genomic profiles, and wearable devices. Analyzing this data may allow for the identification of noteworthy patterns, with corresponding notifications and alerts where necessary. Technological advancements have made deciphering the genomic atlas a reality. With our research, we are one step closer to developing new technologies, including data-driven applications, that will change the way patients are diagnosed and diseases are prevented.