Q1: Is the Aria platform based on Graph neural nets or other forms of deep learning?
A2: No, it is not. In fact, Aria’s proprietary AI platform, Symphony™, was specifically built to address some of the limitations of standard deep learning (DL) models, of which graph neural networks are an example. The three prime limitations of standard DL models that Symphony™ addresses are:
The requirement for homogeneous or unimodal input data, which limits the possible understanding of disease biology. Symphony™ is unique in its ability to analyze heterogeneous and unrelated data in their native formats.
The need for very large amounts of input data, which limits which diseases you can readily model. We’ve carefully ensured Symphony™ can make confident predictions with realistic amounts of data; in fact, we estimate we can work in over 1,000 diseases as of today.
The lack of interpretable predictions, a particular concern for AI in biology tasks where interpretability can impact downstream drug development. All predictions coming out of Symphony™ are entirely interpretable, allowing us to know exactly which pieces of evidence led to the identification of a hit.
Q2: Does Aria include a design of experiments module to guide experimentation that feeds into AI analytics?
A2: There are two ways in which our technology could guide experiments.
The first is adding input data for our disease-specific models. This is something we can do today because we can readily identify which data modalities are lacking for a given disease. That said, it is something Aria has strategically decided not to do. Because of the wide swath of diseases our technology can work on, it does not make economic sense to pay for generating more data in disease A when we could instead turn to disease B immediately.
The second way in which our technology can help guide experimentation is preclinical testing of the hits Symphony™ identifies. We know that any preclinical model is, at best, a partial representation of the human disease, and so there is always a risk that a hit that could be efficacious in treating human disease erroneously fails a preclinical model. To address that, our team can first examine the interpretable predictions coming from Symphony™ to understand how that hit’s mechanism will be disease modifying and compare that to how the preclinical model represents the disease to ensure there is overlap. In more recent work, we have also experimented with repurposing our AI technology to build an in silico model of an animal model. We can then compare and contrast the in silico model we built for the human disease to the one built for the animal model and determine where there is overlap or not. This can be very helpful in selecting the best animal model, specifically for our hits.
Q3: What prevents you from finishing all 18 pipelines in 12 weeks?
A3: First, a clarification on where we typically are after 12 weeks or so. That timeline encompasses all steps of our initial discovery process, which results in the identification of which molecules to test preclinically. These are the steps where our AI technology replaces traditional approaches, and they typically take about a month to complete. After this we turn to more traditional methods, starting with preclinical testing. In some instances, for example our work in NASH, an in vitro model is an appropriate first screening step. Regardless, we eventually turn to in vivo models, and it is the completion of those studies and our in silico work that typically takes us through 12 weeks. At that point, or after some targeted follow-up experiments, we identify our lead molecule. We then turn to medicinal chemistry to optimize our initial hit and generate our own chemistry IP before heading toward an IND and ultimately the clinic. This latter process can be variable but takes on the order of 1.5 to 2 years.
As a quick aside, if you were instead asking why we don’t process all 18 (or even more) programs at once, that all comes down to economics. Currently, using a virtual pharma model and our computational discovery process, our core team of fewer than 18 people can actively work on 18 programs, including 3 progressing toward the clinic. Most companies of this size could only work on one, or at most two, programs with the same resources. In Aria’s case, we have advanced 3 programs into optimization.
Q4: Why not weed out the compounds with safety concerns and poorly predicted ADME as the first step? In other words, why screen 50 million compounds when that could be reduced to 30 million? (2 questions like this)
A4: Great question, and a topic I didn’t have time to address in the webinar. Behind our innovative AI is a great deal of data engineering. One small aspect of that data engineering is the processing pipeline we put all our molecules through. A full description is out of scope here, but suffice it to say that we do in fact weed out compounds before building our disease-specific models. As of today, our compound library has over 2 million compounds, but with the data vendors we work with, we could easily pull in tens of millions of compounds. We instead triage those tens of millions of compounds down to the ~2 million which are sufficiently well characterized for our AI technology to make a confident prediction.
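Aria’s actual triage pipeline is proprietary, but the idea of filtering a vendor library down to sufficiently well-characterized compounds can be sketched in a few lines. Everything below (field names, thresholds, sample compounds) is a hypothetical illustration, not Aria’s real schema.

```python
# Hypothetical sketch of a characterization-based triage step.
# REQUIRED_FIELDS and the threshold are illustrative assumptions.

REQUIRED_FIELDS = {"structure", "targets", "assay_data", "literature_refs"}

def is_well_characterized(compound: dict, min_fields: int = 4) -> bool:
    """Keep a compound only if enough key data fields are populated."""
    populated = {f for f in REQUIRED_FIELDS if compound.get(f)}
    return len(populated) >= min_fields

# Two made-up library entries: one richly annotated, one nearly empty.
library = [
    {"id": "CMP-1", "structure": "C1=CC=CC=C1", "targets": ["EGFR"],
     "assay_data": [0.2], "literature_refs": [101]},
    {"id": "CMP-2", "structure": "CCO", "targets": [],
     "assay_data": [], "literature_refs": []},
]

triaged = [c for c in library if is_well_characterized(c)]
print([c["id"] for c in triaged])  # → ['CMP-1']
```

In practice the filter would be richer (data quality scores, per-modality coverage), but the shape is the same: a cheap, well-defined predicate applied once, before any disease-specific modeling.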
We could of course continue to triage further, but to maximize our potential for success, we want to maintain a diversity of chemical structures and mechanisms in our compound library. Doing so increases our chances to discover novel biology even if that discovery comes via a molecule that has, for example, insufficient ADME properties.
In cases where Symphony™ identifies a hit that shows promising efficacy but that hit has other limitations, we have built a discovery process to address those issues. This is dramatically aided by the fact that all predictions from our AI are entirely interpretable. What this means in practice is that we can identify what is driving a molecule’s predicted efficacy and then determine whether there are other molecules, even molecules outside of our compound library, that replicate that mechanism.
Q5: You've mentioned that one of your unique strengths is using purely patient data, why do other people use data from animal models if enough patient data exists?
A5: This all comes down to our core technology strength: our ability to analyze completely unrelated multimodal data in their native formats. Even in the space of multimodal data analysis this is a unique ability. Other multimodal approaches require data to either share some relationship (e.g., all measurements come from the same patients), or the different data types are analyzed separately and the results are overlapped at the very end. Because we are looking at so many different types of raw data simultaneously, any single type of data does not need to be as crisp or abundant as it would if you were relying on that data in isolation. Put another way, to have success, other approaches often need disease models to generate sufficient quantities of clean, homogeneous data. Aria’s solution instead has been to build proprietary technology to make use of reasonable amounts of heterogeneous data.
Q6: You mention you process heterogeneous data types in a way that is different than others because you look at multiple puzzle pieces simultaneously. Can you talk a little more about the impact of taking a multimodal approach? Maybe an example?
A6: We built Symphony™ to analyze unrelated multimodal data because looking at so many different data types simultaneously provides us with the most thorough understanding of disease biology possible. In effect, our AI can identify connections between distinct types of unrelated data that allow us to distinguish aspects of disease biology that you could not otherwise see. The impact of this is best shown with an example: the case of our backup hit in lupus, TXR-712. Across all the individual data sources and analysis methods we used, TXR-712 ranked too low to be identified; it was indistinguishable from thousands of other molecules. However, when our technology brought all of those different data sources and methods together into a single ensemble model, TXR-712 rose right up to the top of our predictions, a result we’ve since confirmed by showing TXR-712 is significantly disease modifying in a preclinical model of SLE.
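Symphony™’s ensemble method is proprietary, but the general phenomenon, where a molecule that looks unremarkable in every individual evidence source rises to the top once independent sources are combined, can be illustrated with a textbook technique: Stouffer’s z-score combination. The numbers below are made up for illustration and are not Aria’s data.

```python
import math

def one_sided_p(z: float) -> float:
    """One-sided p-value for a standard-normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Eight independent evidence sources each give the molecule a modest
# z-score of 1.0 -- unremarkable on its own (p ~ 0.16).
z_scores = [1.0] * 8
individual_ps = [one_sided_p(z) for z in z_scores]

# Stouffer's method: consistent weak evidence from independent sources
# combines into strong evidence.
combined_z = sum(z_scores) / math.sqrt(len(z_scores))
combined_p = one_sided_p(combined_z)

print(round(min(individual_ps), 3))  # each source alone: 0.159
print(round(combined_p, 4))          # combined: 0.0023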
Q7: On slide 8 you estimate there are 1000+ diseases your platform can work in, are there any areas where you’ve seen more success or less success? Are there diseases you don’t work in?
A7: Our technology is largely disease agnostic. We need sufficient amounts of data (which we can measure before we even begin a program), and we need a well-defined disease. As a third requirement, we’ve taken the strategic decision to work only in disease areas where a lack of disease understanding is the limiting step. This leaves many disease areas where we could work, but we find that our biggest advantage is in particularly complex diseases like autoimmune, inflammatory, or metabolic diseases, where understanding all the nuances and details of disease biology is extremely challenging for the human mind.
Q8: You showed your platform could achieve 80% success at Phase 2 milestones, can you explain how you determined that value?
A8: At Aria we have always used our ability to rediscover previously investigated treatments as a measure of the quality of our disease-specific models. While we’re looking for novel treatments for a given disease, we know that if we see previously investigated treatments right alongside our potential hits, we have a predictive model.
More recently, however, we decided to start sharing those results as part of an extensive retrospective study. To do so we built disease-specific models of over 30 diseases and made carefully blinded predictions on more than 420 previously completed Phase 2 trials. These are the exact same models we would use to discover novel treatments, but rather than looking to find new treatments, we examined the molecules other companies had put into the clinic. Specifically, we looked to see what portion of the molecules that we predicted would be efficacious in their disease ended up transitioning to Phase 3. The answer was over 80%.
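The metric itself is simple: among trials whose molecules the model predicted to be efficacious, what fraction transitioned to Phase 3? A toy version, with fabricated trial records chosen so the rate comes out to 80% (the real study used 420+ trials across 30+ diseases):

```python
# Toy illustration of the retrospective success metric.
# These six records are invented for the example, not Aria's data.

trials = [
    {"predicted_efficacious": True,  "reached_phase_3": True},
    {"predicted_efficacious": True,  "reached_phase_3": True},
    {"predicted_efficacious": True,  "reached_phase_3": True},
    {"predicted_efficacious": True,  "reached_phase_3": True},
    {"predicted_efficacious": True,  "reached_phase_3": False},
    {"predicted_efficacious": False, "reached_phase_3": False},
]

# Restrict to predicted-efficacious trials, then count transitions.
predicted = [t for t in trials if t["predicted_efficacious"]]
hits = sum(t["reached_phase_3"] for t in predicted)
success_rate = hits / len(predicted)
print(f"{success_rate:.0%}")  # → 80%
```

Note the denominator: only the model’s positive predictions count, which is what makes this a precision-style measure of the predictions rather than an overall trial statistic.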
Q9: Aria currently only works on small molecules. Can your AI technology work with other drug modalities?
A9: Yes, the technology is agnostic to drug modality and to where the starting compound library comes from. Symphony™ could work with large molecules like biologics, we could examine a partner’s compound library, and we could even make predictions on virtual libraries of never-before-made compounds. We have decided to focus on small molecules for the time being as we see a large opportunity for orally bioavailable treatments for the complex and hard-to-treat diseases we are pursuing.
Q10: You mentioned that your predictions are interpretable. Can you talk about why that is important in drug discovery?
A10: In the AI for biology space, I would argue that an interpretable prediction is mandatory for an optimal outcome. By knowing not just if a molecule is likely to be efficacious, but how it’s likely to be efficacious, you can dramatically improve your downstream drug development. You’ll be able to select models that better reflect the biology your mechanism is modifying.
You’ll be better able to modify that molecule because you know which of its characteristics are important for its efficacy. And even in the clinic, by knowing how your molecule is disease modifying, you can identify biomarkers to help with patient selection, dramatically increasing your chances of success.
Q11: On slide 11 you talked about how in your discovery process you triage molecules really rapidly. Can you talk a little about how you're able to do that?
A11: This is primarily about the unique discovery process we’ve built to harness our AI technology. The way this works is that a research scientist uses our web-based user interface (UI) to build a disease-specific model. This largely consists of targeted and AI-assisted data annotation. The concept is to have scientists make select key decisions and then allow software to perform the heavy lifting of analyzing and integrating data. This whole process usually takes one full-time scientist 1-2 working days. The output is an efficacy score for every molecule in our compound library and the identification of the molecules most likely to be efficacious.
Next we turn to identifying molecules that are not just efficacious but are likely to be good drugs. Starting from the top 2,000 to 3,000 molecules and using additional AI in Symphony™’s UI, a single scientist then filters out molecules that are non-novel, duplicative with other hits, or clearly not safe for the disease. This typically takes an hour or two and results in roughly 90 molecules that are then examined in a more manual diligence process. During that diligence process we further investigate safety and ADME properties, but primarily dig into how a molecule is likely to be disease modifying. This is only possible because our predictions are entirely interpretable. We can quantify which targets, algorithms, and input data led to a hit, and manually verify the veracity of every key data input. The output of this process is typically 10 unique and novel molecules for preclinical testing and is complete about 4 weeks after starting a disease program.
Q12: Can you talk more about how your AI platform is put to use in practice? Are scientists able to interact with it or do you need to be able to write software to make use of technology?
A12: I talked a little about this in the preceding question, but we made the decision early on that we wanted to put our AI directly in the hands of drug discovery researchers. Because of that, Symphony™ is used exclusively via our web-based user interface, and researchers do not need to write any code to create a new disease model. I like to say that if you’re a drug discovery researcher who can make a slide deck, you can use Symphony™ to build a disease model.
Q13: Can you provide some background or history on how the technology was built?
A13: We started on the technology that would become Symphony™ when the company was founded in 2014. In the intervening 8 years, we have iteratively built Symphony™ up over 340 versions of our software. Briefly, our process is a weekly cycle of using data-driven analysis to identify areas for improvement, building solutions (e.g., adding a new data modality, improving an algorithm, or creating a plot to inspect output), and quickly getting those out to researchers to experiment with the new and improved platform, before beginning the cycle again.