The blog below is a transcript of James Cai, Head of Data Science at Roche Innovation Center New York, presenting on the application of AI in the clinical development of drugs, a topic that is extremely prevalent in the current environment.

AI is transforming many industries including healthcare and pharma. Where are the opportunities for AI in the early clinical development of new drugs, where scientific hypotheses first meet real patients in clinical trials? Can AI generate new insights to inform translational research or improve the efficiency of clinical trials? In this talk, I will highlight opportunities created by big data and AI, e.g., digital biomarkers for neurological diseases, and share my thoughts on what it will take to operationalize AI in drug development.

See the full video presentation & complete transcript below

Topics explored include:

  • Clinical Trials in Translational Research
  • Deep Learning in Neurological Diseases
  • Operationalizing AI
  • Can AI Help Us Make Better Predictions?

A full transcript of James's presentation, including timestamps, is hosted below:

[0:09]

Thank you for the introduction. I will be talking about applying AI in the early clinical development of new drugs. If you're working in pharma, clinical trials or clinical development, you probably know many of these types of big data, but this actually was not true 10 years ago. If you think about it, genome sequencing was already available as a technique then, but it was still too expensive and not convenient enough to use in clinical trials. Today we are routinely using genome sequencing of patients in clinical trials to get further insights and to find biomarkers. Single-cell sequencing was not even available at that time, and if you look at all the other data types, they were either not there yet or the technology existed but was not really usable. That includes, for example, real-world data such as EHRs and insurance claims, digital imaging, wearables or sensors, and clinical biomarkers. Today, all of these are available, and many of them are being powered by deep learning or other AI techniques, so in clinical development we can actually do a lot more than we used to.

[01:27]

I want to focus on the early clinical development of new drugs. This is where early translational research happens, and by early I mean phase one and phase two clinical trials. This is where you assess safety and early efficacy, the proof of concept of whether the drug is really going to work, before scaling up in phase three. It's a very unique phase, where you test the hypotheses developed in the pre-clinical stage for the first time in real patients. The success or failure of this translation has an outsized impact on the overall success or failure of your drug, so you can imagine that any technology that improves the success rate will have a really large impact on drug development. The natural question is: can AI improve the success rate of drug development? We believe, and I think you probably all do too, that the answer is yes, and there are two ways this can happen. One is impacting the science side of translational research; the other is the operational side. On the science side, it's mainly about finding better biomarkers to develop more precise medicines.

[2:54]

That is really what precision medicine, and translational research, is about, but we don't want to forget the operational side, where clinical trials can be hugely improved. As you know, clinical trials are very expensive and very inefficient, so anything we can do to make them more efficient and make the data more reliable can improve translational research tremendously. We have been doing AI for the past several years, so where are we in this journey? I would argue that Roche, and maybe other pharma companies as well, have definitely passed the first stage: demonstrating the value and understanding the limitations. In the early stage, what we really wanted to know was whether AI was really going to deliver what people say it can, and there are a number of examples I can show here. The second stage is, once we have demonstrated the value, can we explore further opportunities and find further potential to really enlarge this impact? As someone already alluded to earlier, culture is really a barrier here, because in a pharmaceutical company we value molecules: that's what we patent, that's what we develop into the clinical stage and the market, and eventually that's what helps patients. But I think that culture is slowly changing, and we want to value data as much as a molecule, because data can generate insights, it can make our molecules better, it can make our clinical trials more efficient, and it can do a lot of innovative things we haven't thought about before. The last stage, I think, is to operationalize AI, by which I mean really embedding AI in the whole operations, the science operations, of pharma research and development, and we are not there yet.

[5:00]

For the rest of the talk, I'm going to go through these stages and provide some examples for each of the categories. In my first example, I want to talk about using deep learning in neurological diseases; this is one of our early success stories in demonstrating the power of deep learning. It happened maybe four years ago. As you know, Parkinson's disease is a disease that affects celebrities but can also affect any of us sitting here. Unfortunately, the way to diagnose patients and monitor their symptoms, if they are in a clinical trial, relies on very old and crude methods. This is UPDRS (Unified Parkinson's Disease Rating Scale) testing: basically, you go to a doctor's office and go through a series of behavioural or movement tests. There are two problems with this. One is that it can be very subjective, because different physicians may come up with different answers, and it can be affected by the patient's mood that day or something else that day. The second is that we don't really know what happens between visits to the clinic. In the bottom right, the plot is a representation of a patient's journey over a whole year, so the 365 points there represent one year. And you can see here the two visits in this phase one trial, which we ran for about a year.

[6:47]

So you have two visits, and the red dots represent the severity of the patient's symptoms over the year, but the patient can really only recall maybe a few days or a week of what happened between the two visits. When the doctor asks how your symptoms have been since the last visit, you don't really remember what happened two weeks, three weeks, two months or three months ago. So you're missing a lot of information, and secondly, as I mentioned, it's not that accurate, because there is no molecular biomarker you can use. If you look at a typical curve, this is maybe the ideal situation, where you can see the difference between placebo and the drug.

[7:35]

But in reality you may see a very large error bar, and you also don't know what's happening in between. That doesn't help us understand how the drug is really working. So the question is, can we use digital technology here? We call this a digital biomarker: basically, using the sensor signals from smartphones to answer that kind of question. With the device, there are two kinds of tests. One is the active test, which mimics UPDRS testing: for example, the patient will walk, do a balance test, and do rest tremor tests, and those things are measured. The second part is passive monitoring, where the patient just puts the phone in their pocket and carries it with them during the day, and we stream and collect the data and see what we can learn from it.

[8:28]

This is what the data looks like. The phone generates a lot of data continuously, both gyroscope (rotation) and acceleration signals in three dimensions. This is actually one of our colleagues in Basel, Switzerland; he didn't know he was going to be filmed on the day he was wearing shorts. And this is the deep learning model. A colleague of ours in New York took a publicly available deep learning model and adapted it to the sensor data in our clinical trials. He used 50 hours of public sensor data to train the model, and on the held-out test set it scored a very high accuracy of 99% / 98%. When we applied it to our patients, we used the active test as the validation set, because in the active test we actually know whether the patient is sitting or moving. Between standing and walking, we were able to achieve 99% or 96% accuracy, and in practice we are also predicting multiple classes: whether the patient is lying, standing, sitting, walking, on stairs, and more.
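The activity-recognition step described above can be illustrated with a much simpler stand-in than the adapted deep learning model used in the talk. The sketch below assumes six-channel samples (three accelerometer and three gyroscope axes), slices them into fixed-length windows, summarises each window with a few hand-picked features, and assigns the nearest class centroid. The window size, the features, and the nearest-centroid classifier are all assumptions for illustration, not Roche's method; a deep model would learn its features from the raw windows instead.

```python
import math
from statistics import mean, pstdev

def windows(samples, size, step):
    """Yield consecutive windows of `size` samples, advancing `step` samples each time."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]

def features(window):
    """Summarise one window of (ax, ay, az, gx, gy, gz) samples as four numbers."""
    acc = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az, *_ in window]
    gyr = [math.sqrt(gx * gx + gy * gy + gz * gz) for *_, gx, gy, gz in window]
    return (mean(acc), pstdev(acc), mean(gyr), pstdev(gyr))

def fit_centroids(labelled_windows):
    """Average the feature vectors of each activity class (nearest-centroid training)."""
    by_class = {}
    for label, window in labelled_windows:
        by_class.setdefault(label, []).append(features(window))
    return {label: tuple(mean(col) for col in zip(*vecs))
            for label, vecs in by_class.items()}

def classify(window, centroids):
    """Assign the activity class whose centroid is closest in feature space."""
    f = features(window)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))

# Synthetic example: a phone held still vs. a rhythmic, walking-like signal.
still = [(0.0, 0.0, 9.81, 0.0, 0.0, 0.0)] * 50
walk = [(0.0, 2.0 if i % 2 else -2.0, 9.81, 0.5, 0.0, 0.0) for i in range(50)]
centroids = fit_centroids([("still", still), ("walking", walk)])
print(classify(walk, centroids))
```

In practice, accuracy would be measured on held-out windows whose labels are known, which is the role the active tests played as a validation set in the talk.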

[10:00]

These are some preliminary results from the phase one trial. As you can see, purely based on the phone data, and not on any physician assessment, we can tell the difference between healthy volunteers and Parkinson's disease patients. The patients spend less time walking, when they do stand they have fewer stand-to-sit transitions, and they also invest less power in their turns. These are things you can actually gain from your model. The patients walk less, they invest less power in walking, and they walk more slowly. So that was our first example, and we are now in the phase of expanding this to many different neurological diseases; one example is schizophrenia.
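Once each window has a predicted activity label, group-level measures like the ones in this comparison can be derived by simple counting. The sketch below is a hypothetical illustration: the label names, the 5-second epoch length, and the definition of a stand-to-sit transition as a "standing" window directly followed by a "sitting" window are assumptions; measures such as turn power or walking speed would need the raw signals as well.

```python
from collections import Counter

def activity_summary(labels, epoch_seconds=5.0):
    """Summarise a sequence of per-window activity predictions for one patient.

    labels: predicted class for each consecutive window, e.g. "walking", "sitting".
    epoch_seconds: duration each window represents (an assumed value).
    """
    counts = Counter(labels)
    total = len(labels)
    # A transition is counted when a "standing" window is immediately followed by "sitting".
    stand_to_sit = sum(
        1 for prev, cur in zip(labels, labels[1:])
        if prev == "standing" and cur == "sitting"
    )
    return {
        "fraction_walking": counts["walking"] / total if total else 0.0,
        "minutes_walking": counts["walking"] * epoch_seconds / 60.0,
        "stand_to_sit_transitions": stand_to_sit,
    }

day = ["sitting", "standing", "sitting", "walking",
       "walking", "standing", "sitting", "lying"]
print(activity_summary(day))
```

Comparing such summaries between healthy volunteers and patients would then be a standard two-group comparison.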

[11:04]

He used the same model, trained on different data, in this case data from an actigraphy watch, and similarly it achieved very high accuracy. This is the correlation between the predicted and the actual results. To make a long story short, you can also see correlations between the activity ratio, which is basically a measure of how motivated patients are in their lives, and negative symptoms such as diminished expression. Patients who are more active in their gestures, in the power of their gestures, are actually doing better in terms of their clinical symptoms, and, as you can imagine, this can eventually become a more objective way of measuring patients in the clinic and in many areas of healthcare and drug development. Moving on to another disease area and a different data type: you saw the digital biomarkers, but we could also ask whether AI can be used with traditional molecular biomarkers, for example in oncology. In oncology, we have an abundance of molecular biomarker candidates, usually hundreds if not thousands, in phase one and phase two clinical trials, but validated markers are still very rare.
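The talk does not define the activity ratio precisely, so the sketch below assumes one hypothetical definition (the share of watch epochs whose activity count exceeds a threshold) and computes a Pearson correlation between that ratio and a symptom score across patients. Both the definition and the threshold are assumptions.

```python
import math
from statistics import mean

def activity_ratio(epoch_counts, threshold):
    """Hypothetical definition: share of epochs whose activity count exceeds `threshold`."""
    return sum(c > threshold for c in epoch_counts) / len(epoch_counts)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One activity ratio per patient, paired with that patient's symptom score.
ratios = [activity_ratio(c, threshold=6) for c in ([0, 5, 10, 20], [1, 2, 3, 30], [7, 8, 9, 10])]
scores = [4.0, 5.0, 2.0]  # illustrative values only
print(ratios, pearson(ratios, scores))
```

A negative coefficient between the activity ratio and a negative-symptom severity score would match the pattern described in the talk, where more active patients had better clinical symptoms; with few patients, a rank-based correlation and a significance test would also be worth reporting.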

[12:27]

This is especially true in immunotherapy, where we don't know the mechanism of action that well. Francesca, in our New York team, took all the baseline lab results, including both immune biomarkers and chemistry biomarkers, and used a gradient boosting model to see which of the features are possible biomarkers that can predict patient response: complete response, partial response, stable disease or progressive disease. Since we didn't have that many patients, we couldn't really build a predictive model, but we used the machine learning approach to rank the features by their importance. Here you can see two examples of the features: one is an immune marker, and the other is a chemistry marker.
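Ranking features by gradient boosting importance, as described above, can be sketched without any libraries. The version below is least-squares gradient boosting with single-split decision stumps; each feature's importance is the total squared-error reduction from the splits made on it, normalised to sum to one. Collapsing the four response categories into a binary responder label (complete or partial response vs. stable or progressive disease) is an assumption, and production work would normally use an established library rather than this toy.

```python
def best_stump(X, residuals):
    """Find the single-feature threshold split that most reduces squared error."""
    n = len(X)
    base = sum(r * r for r in residuals) - sum(residuals) ** 2 / n
    best = None  # (gain, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            sse = (sum(r * r for r in left) - sum(left) ** 2 / len(left)
                   + sum(r * r for r in right) - sum(right) ** 2 / len(right))
            gain = base - sse
            if best is None or gain > best[0]:
                best = (gain, j, t, sum(left) / len(left), sum(right) / len(right))
    return best

def boosted_importance(X, y, n_rounds=20, learning_rate=0.3):
    """Least-squares boosting with stumps; returns normalised per-feature importance."""
    pred = [sum(y) / len(y)] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared loss.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None or stump[0] <= 1e-12:
            break
        gain, j, t, left_value, right_value = stump
        importance[j] += gain
        pred = [p + learning_rate * (left_value if row[j] <= t else right_value)
                for row, p in zip(X, pred)]
    total = sum(importance)
    return [v / total for v in importance] if total else importance

# Columns: a hypothetical informative marker and an uninformative one; y = 1 for responders.
X = [[1, 5], [2, 3], [3, 5], [4, 3], [5, 5], [6, 3]]
y = [0, 0, 0, 1, 1, 1]
print(boosted_importance(X, y))
```

With small cohorts, such rankings are best treated as hypotheses to validate elsewhere, which is the approach the talk describes next.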

[13:23]

Each one of them has some predictive power on its own, but by combining them you can see which patients respond and which patients don't. So with two biomarkers you can actually make a much better prediction. You could also ask, are these just random features that were selected? So we used our real-world data, the electronic health records from Flatiron, to see whether we could validate some of them. This is the result for one of the lab chemistry markers: patients with lower activity of this chemistry marker have a higher chance of survival. We actually see this for quite a few of the markers we found. Basically, patients who are healthier in various ways do better: not only do they live longer, they also respond better to the drug. This could be very useful in predicting which patients might respond to certain drugs.
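Checking whether a candidate marker relates to survival in real-world data is typically done with survival curves. The sketch below is a minimal Kaplan-Meier estimator in plain Python; the input layout (follow-up times plus a death indicator, with 0 meaning censored) and the idea of splitting patients into "low" and "high" groups at the marker's median are assumptions rather than details from the talk. A formal comparison would add a log-rank test or a Cox model.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve

def survival_at(curve, t):
    """Step-function lookup of the survival probability at time t."""
    s = 1.0
    for time, prob in curve:
        if time > t:
            break
        s = prob
    return s

# Hypothetical months of follow-up for one marker group.
low_group = kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])
print(low_group, survival_at(low_group, 2.5))
```

Running the same estimator on the "high" group and comparing the two curves at a fixed time, such as one year, gives a quick first look at whether a marker separates outcomes.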

[14:28]

I also mentioned the operational side of translational research. I won't go into too much detail here, but the key point is this: take one example, patient recruitment. This is really the most expensive part of running clinical trials, more than 30% of the total cost, yet it's very inefficient, because, as you see, less than 10% of patients really complete clinical trials. There are multiple points where patients are either disqualified, drop out, or just leave the trial. So, as you can imagine, if we can improve patient recruitment, having the right patients at the right clinical sites, our data can become a lot more reliable and our science can become so much better.
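Attrition in recruitment compounds multiplicatively across stages, which is how completion can fall below 10% even when each stage keeps a sizeable share of patients. The stage names and retention rates below are hypothetical, chosen only to illustrate the arithmetic: 1000 × 0.4 × 0.6 × 0.8 × 0.5 = 96 completers, or 9.6% of those screened.

```python
def recruitment_funnel(n_screened, stages):
    """Propagate a cohort through successive trial stages.

    stages: list of (stage name, fraction of the previous stage retained).
    Counts are rounded to whole patients at each stage.
    """
    remaining = n_screened
    history = [("screened", remaining)]
    for name, retained in stages:
        remaining = round(remaining * retained)
        history.append((name, remaining))
    return history

# Hypothetical retention rates, for illustration only (not figures from the talk).
stages = [("eligible", 0.4), ("enrolled", 0.6), ("randomised", 0.8), ("completed", 0.5)]
history = recruitment_funnel(1000, stages)
for name, count in history:
    print(f"{name:>10}: {count:5d}  ({count / 1000:.1%} of screened)")
```

Because the losses multiply, improving any single stage, for example by matching the right patients to the right sites, raises the final completion count proportionally.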

[15:22]

We have also explored using real-world data to improve patient recruitment, and that actually worked. My time is up, but we are also looking at whether AI can help us make much better predictions, so I'm going to go through the next few slides very quickly. I mentioned culture earlier. In pharma we want to introduce a data culture, and one of the things we did was introduce data challenges, or hackathons, which are very prevalent in the machine learning community but not so much of a practice in pharma. This is the first Roche Advanced Analytics Data challenge. We used Flatiron data, with the goal of predicting patient survival after one year of treatment. I just want to point out that 500 people from 28 Roche sites attended, with 132 teams, which is really great participation. Another example is that our HR is also thinking about how to recruit new kinds of talent, so they put out the Code for Life challenge for the community, and this type of challenge can also be used in recruiting talent.
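A practical detail behind a task like predicting survival one year after treatment is building the label itself: a patient whose records end before one year without a recorded death has an unknown outcome. The sketch below shows one common convention, under assumed field names (treatment start, last follow-up date, death flag) that are hypothetical and not the Flatiron schema.

```python
from datetime import date

def one_year_label(start, last_followup, died, horizon_days=365):
    """Label for 'alive one year after treatment start'.

    Returns 1 (survived the horizon), 0 (died within it), or None when the
    patient was censored before the horizon and the outcome is unknown.
    """
    days = (last_followup - start).days
    if died and days <= horizon_days:
        return 0
    if days >= horizon_days:
        return 1
    return None  # censored early: exclude, or use a survival model instead

print(one_year_label(date(2019, 1, 1), date(2019, 6, 1), died=True))
print(one_year_label(date(2019, 1, 1), date(2019, 6, 1), died=False))
```

Dropping the censored patients is the simplest choice for a classification challenge, but it can bias the cohort toward patients with complete follow-up.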

[16:50]

This is my last slide. Basically, I think we are still not there in operationalizing AI, and there are two major reasons. First, we still lack data: we still lack well-labelled and well-curated data. Second, we don't even have the infrastructure to really make data FAIR, by which we mean data that is findable, accessible, interoperable and reusable. Without that, it's very difficult to do AI at large scale. So, to sum up, I think I have demonstrated that there is tremendous potential and that we are exploring different opportunities. I hope that the new culture is taking shape, and there's a lot of work we still need to do in operationalizing AI. I will stop here, and while you're thinking of questions, I just want to acknowledge quite a few people.
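As a rough illustration of what making data FAIR can mean at the level of a single dataset, the sketch below checks a hypothetical catalogue record for metadata supporting each principle. The record fields and the pass/fail rules are simplifications assumed for this example; the published FAIR guiding principles are more detailed than one check per letter.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry for a clinical dataset."""
    identifier: str = ""   # persistent ID, e.g. a DOI or internal accession
    title: str = ""
    access_url: str = ""   # where and how the data can be requested
    data_format: str = ""  # documented format, e.g. "CSV" or "Parquet"
    vocabularies: list = field(default_factory=list)  # e.g. ["CDISC SDTM", "LOINC"]
    license: str = ""
    provenance: str = ""   # which study or pipeline produced the data

def fair_gaps(record):
    """Return the FAIR principles this record's metadata does not yet support."""
    gaps = []
    if not (record.identifier and record.title):
        gaps.append("findable")
    if not record.access_url:
        gaps.append("accessible")
    if not (record.data_format and record.vocabularies):
        gaps.append("interoperable")
    if not (record.license and record.provenance):
        gaps.append("reusable")
    return gaps

print(fair_gaps(DatasetRecord(identifier="study-001", title="Phase 1 sensor data")))
```

A check like this is only a starting point; the harder infrastructure work the talk refers to is making such metadata exist and stay current across many studies.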


James Professional Bio

Dr. James Cai is the Head of Data Science at Roche Innovation Center New York, responsible for supporting drug discovery and development projects by leveraging big data and advanced analytics. Through exploration and mining of biomedical big data, e.g., those in genomics, electronic health records (EHRs), digital images, text documents and wearable devices, he and his team have made numerous discoveries that impacted drug projects at Roche via new insights and better decision making. James has worked in the pharma industry for the past 18 years. He has been a bioinformatics scientist, software developer, business analyst, project manager, data scientist, and manager at Roche. James has a Ph.D. in Molecular Biology from Cornell University and a Master’s degree in Biomedical Informatics from Columbia University.

