Revolutionising Drug Discovery with Big Data and AI

In the second of the Pharma Integrates Insights series, as terabytes of healthcare data are generated and become accessible to a broader range of stakeholders, Marco Mohwinckel examines whether a new age of AI-driven drug discovery is just around the corner

According to Jackie Hunter, PhD, Board Director of Benevolent AI:

“Around 90% of the world’s information has been created in the past few years. If the top pharmaceutical companies don’t embrace that fact, they might find themselves in a very different position a decade from now.”

Her company is one of the leading players in the AI-driven drug discovery and development space. And, like many bioscience companies, Benevolent AI believes that the key to developing new therapeutics may be hidden in vast swathes of data.

“During the past few years, we’ve built a capability that allows us to mine hundreds of millions of documents and databases to create the world’s biggest biomedical knowledge graph of known relationships,” Hunter explains. This database drives Benevolent AI’s efforts to identify new targets to treat diseases and develop more personalised — thus more effective — treatments. Artificial intelligence (AI), needless to say, sits at the core of this discovery process. Data scientists test hypotheses using AI-driven generative chemistry models to predict the properties of compounds and their activity on novel targets. What’s more, the company is going beyond discovery as it builds up both its own pipeline of novel treatments and drug development capabilities.
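The idea of mining documents into a graph of known relationships and then querying it for new target hypotheses can be illustrated with a toy sketch. Everything below is invented for illustration: the entity names, relation types and the naive "inhibitor of an associated gene" heuristic are assumptions, not Benevolent AI's proprietary methods.

```python
# Toy biomedical "knowledge graph": each edge is a (subject, relation, object)
# triple of the kind that might be mined from the literature.
# All entity names are invented for illustration.
TRIPLES = [
    ("GENE_X", "associated_with", "DISEASE_Y"),
    ("COMPOUND_A", "inhibits", "GENE_X"),
    ("COMPOUND_B", "inhibits", "GENE_X"),
    ("COMPOUND_B", "treats", "DISEASE_Z"),
]

def candidate_compounds(triples, disease):
    """Crude hypothesis generator: a compound that inhibits a gene
    associated with a disease may be worth testing against that disease."""
    genes = {s for s, r, o in triples if r == "associated_with" and o == disease}
    return sorted({s for s, r, o in triples if r == "inhibits" and o in genes})

print(candidate_compounds(TRIPLES, "DISEASE_Y"))
# ['COMPOUND_A', 'COMPOUND_B']
```

A production system works over hundreds of millions of documents and far richer relation types, but the principle is the same: once relationships are encoded as a graph, hypothesis generation becomes a traversal problem.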

The power of AI lies in its potential to “unlock human biology.” Consider that, in the whole history of the pharmaceutical industry, we have only been able to identify a few hundred targets and produce a couple of thousand drugs; it’s sobering to realise that more than half of the 10,000-plus known diseases are still untreated. And that’s just the known unknowns. “It’s a bit like exploring relationships that should be understood,” says Hunter, “that should exist on the basis of the known evidence, but haven’t yet been described. Not to mention the relationships that are unknown, the so-called unknown-unknowns.”

Needles and Haystacks

An increasing amount of healthcare data is generated every day and there is no doubt that AI-driven tools can process information much faster, more cheaply and more accurately than any human ever could. “We’ve now got the ability to collect and store huge amounts of medical and genomic data. And we’re now able to use AI to link these datasets, which our relatively basic human brains are just not capable of,” says Dr Steve Arlington, President of the Pistoia Alliance, a global, not-for-profit consortium that works to lower barriers to innovation in life sciences R&D.

There is value in data and “everyone’s definitely waking up to it; everyone’s mindful of data being the new oil,” states Raymond Barlow, CEO at e-Therapeutics. Barlow’s Oxford-based bioscience company specialises in combining its drug discovery platform with a unique approach to network biology. At the same time, many experts agree that quantity does not necessarily equal quality. “AI is all about giving us new insights,” says Nick Lynch, a founding member of the Pistoia Alliance, but he also adds that: “AI will only ever flourish when it’s had enough breadth and diversity of data to learn from … because the models will only be as good as the data they’re trained on.”

Guess what? It turns out that AI depends on human intelligence and hard-to-codify medical knowledge more than ever. “I think people consistently fail to recognise the effort that goes into curating data,” says Jackie Hunter. “It’s a phenomenal task and one that can only ever succeed when the AI is fed the proper data, whether that’s proteomic, genetic and/or biomedical. That’s why, when we’re working with unstructured data, we bring it in and convert it to our own document format and perform a certain amount of cleaning. It’s really important because a lot of data isn’t good quality.”

Perhaps more importantly, unless we are very clear about what we are looking for, we’ll never find that famous needle in the haystack. Oliver Harrison, CEO of Alpha Health, Telefonica’s Moonshot company, who has been living and breathing Big Data and AI for years, puts it this way: “The starting point should be identifying the question you’re trying to answer. Then, you need to assess what the minimum viable data set would be to give you an answer to that question.”

John Wise, Senior Consultant at the Pistoia Alliance, seems to agree: “One of the biggest tensions I still see with bringing artificial intelligence into both discovery and clinical development is the inability of the business to appropriately frame the problem they’re trying to solve.” Nor should we forget so-called algorithmic bias, warns Oliver Harrison: “If you only focus on white male data from the middle classes, then you can’t generalise to other ethnic groups or different genders.” Perhaps we need smart thinking and smart data as much as, if not more than, just Big Data?
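The kind of bias Harrison describes can often be caught with a simple coverage check before any model is trained. The sketch below is a minimal, hypothetical example: the patient records, field names and 30% threshold are all invented for illustration.

```python
from collections import Counter

# Hypothetical patient records; the fields and values are invented.
records = [
    {"sex": "M", "ethnicity": "white"},
    {"sex": "M", "ethnicity": "white"},
    {"sex": "M", "ethnicity": "white"},
    {"sex": "F", "ethnicity": "white"},
    {"sex": "M", "ethnicity": "black"},
]

def coverage_report(records, field, minimum_share=0.3):
    """For each group, report its share of the dataset and whether it falls
    below the threshold -- a pre-modelling sanity check for skewed data."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: (n / total, n / total < minimum_share)
            for group, n in counts.items()}

print(coverage_report(records, "sex"))
# {'M': (0.8, False), 'F': (0.2, True)}
```

Here female patients make up only 20% of the sample and are flagged, which is exactly the signal that a model trained on this data may not generalise across genders.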

Data Shared is a Problem Halved?

“So, let’s say I’m using AI to help me. What am I going to do with regards to the datasets? We’ve found, without doubt, that our tools work best when a company brings its own data into our database,” observes Andrew Fried, IBM’s Global Life Science Industry Leader. The company’s Watson-derived drug discovery tool is a cloud-based solution that analyses data to discover new drugs and deliver other scientific breakthroughs. “There really is this crossroads around data,” says Fried: “A lot of the AI tools, like ours, work on public data … but the greatest value and biggest challenge is when we combine public and proprietary datasets.”
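Mechanically, combining public and proprietary datasets often comes down to joining records on a shared identifier. The sketch below is a hypothetical illustration: the public fields use real ChEMBL compound IDs, but the proprietary assay value is invented.

```python
# Public compound data, keyed by a shared identifier (here, ChEMBL IDs).
public = {
    "CHEMBL25": {"name": "aspirin", "target": "PTGS1"},
    "CHEMBL192": {"name": "sildenafil", "target": "PDE5A"},
}

# Proprietary in-house results; the IC50 value is invented for illustration.
proprietary = {
    "CHEMBL25": {"ic50_nm": 1200.0},
}

# Enrich the public record with proprietary fields where the IDs match.
merged = {cid: {**pub, **proprietary.get(cid, {})}
          for cid, pub in public.items()}

print(merged["CHEMBL25"])
# {'name': 'aspirin', 'target': 'PTGS1', 'ic50_nm': 1200.0}
```

The technical join is trivial; the real challenges Fried alludes to are governance, consent and competitive sensitivity around the proprietary side of the merge.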

For decades, pharmaceutical companies have shrouded their R&D in mystery and kept their information to themselves; but, a new age of public–private and intercompany sharing may be upon the industry. After all, pieces of clinical and chemical data are almost useless in isolation. Context and quality are everything. It’s when companies collaborate and share their datasets that new and valuable insights are generated.

In fact, in a recent survey of almost 400 life science leaders, the Pistoia Alliance identified “technical expertise” as the main barrier to the adoption of AI and machine learning/natural language processing (ML/NLP), and believes that collaboration between stakeholders on data standards, benchmark sets and data access will be essential for widespread adoption.

The industry has long been engaged in precompetitive collaborations such as the Innovative Medicines Initiative (IMI), but the titans are indeed opening up to more daring types of partnership. For example, in 2014, J&J announced a collaboration with Yale on the Yale University Open Data Access (YODA) Project, granting an independent academic group full decision rights on the release of clinical trial data.

In 2016, AstraZeneca announced a 10-year deal with the genomics technology company Human Longevity to sequence and analyse patient samples from clinical trials; and, in 2017, GSK formed a consortium with Lawrence Livermore National Laboratory, the Frederick National Laboratory for Cancer Research and the University of California San Francisco to combine the groups’ vast databases and improve the chances of discovering new cancer therapeutics. GSK also recently announced a $300 million investment in 23andMe for genetic drug discovery.

“Inevitably, I think a lot of the innovation is going to be outside of those big companies and will be in smaller companies like ours,” says Raymond Barlow. But Jackie Hunter cautions that “adoption patterns vary in different companies. But, once they see others making progress, things will speed up. The companies that don’t adopt in 5 years will be at a real disadvantage.” Big Pharma is indeed turning to more agile and innovative early-stage AI-driven drug discovery companies for answers. Companies such as Benevolent AI, BERG, Exscientia, Deep 6 AI, e-Therapeutics, Atomwise and Freenome are some of the more sought-after partners.

It’s abundantly clear that the future doesn’t belong to one single company. Just how much Big Pharma should be investing in building in-house capabilities required to fully capitalise on AI, and where such capabilities should reside organisationally, is still a matter of debate. Similarly, the jury is out regarding whether the pharmaceutical industry can benefit from more aggressive collaborations with non-traditional healthcare companies. For now, though, let’s just say that the drugs of tomorrow may very well be those discovered by an algorithm mining collective datasets made available through collaborations with other pharmaceutical organisations.

Marco Mohwinckel

Pharma | Digital Health | Health Tech | Strategy & Innovation