PDGrapher: AI Supports Researchers in Faster Drug Discovery
Months of laboratory work studying proteins, genes, chemical reactions, and disease processes in the search for new drugs could be accelerated with the help of a new model. PDGrapher, an artificial intelligence model developed by Harvard professors and partially funded by taxpayer-supported federal grants, is capable of identifying treatments that can reverse disease in cells. The model was presented in the study “Combinatorial prediction of therapeutic perturbations using causally inspired neural networks,” which was published this week in the journal Nature Biomedical Engineering.
Here is how
PDGrapher helps determine which genes should be “targeted” to make sick cells healthy again. It was tested on 11 types of cancer (e.g., lung, breast, prostate, brain). During testing, PDGrapher predicted the correct targets more accurately than other methods, with an improvement of over 30% on the tested samples, and was 25 times faster than similar tools. It also identified important drug-associated genes, such as KDR and TOP2A in lung cancer, which have already been investigated in clinical studies. To clarify, this model runs on a computer using data that laboratories have already collected; it does not perform new experiments on its own.
During testing, two approaches were used. In the random split, the model is tested on samples from the same types of cells it has already seen in another part of the dataset. In the leave-cell-out split, the model is tested on a type of cell it has never seen before, to check whether it can predict results for completely unknown cells.
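For readers who want to see the difference between the two testing approaches, here is a minimal sketch in Python. It is not the PDGrapher code; the toy samples, the cell line labels, and the function names `random_split` and `leave_cell_out_split` are assumptions made only for illustration.

```python
import random

# Hypothetical toy dataset: each sample records the cell line it came from.
# The labels below are illustrative placeholders, not the study's data.
samples = [
    {"id": i, "cell_line": cell}
    for i, cell in enumerate(["A549", "MCF7", "PC3"] * 4)
]


def random_split(samples, test_fraction=0.25, seed=0):
    """Shuffle all samples, then hold out a fraction of them for testing.

    Test samples usually come from cell lines that also appear in training.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def leave_cell_out_split(samples, held_out_cell):
    """Hold out every sample from one cell line for testing.

    The model never sees the held-out cell line during training.
    """
    train = [s for s in samples if s["cell_line"] != held_out_cell]
    test = [s for s in samples if s["cell_line"] == held_out_cell]
    return train, test


train_r, test_r = random_split(samples)
train_l, test_l = leave_cell_out_split(samples, "PC3")

print("random split test cell lines:", sorted({s["cell_line"] for s in test_r}))
print("leave-cell-out test cell lines:", sorted({s["cell_line"] for s in test_l}))
```

In the second split, no sample from the held-out cell line reaches the training set, which is what makes it the harder test of whether a model generalizes to unfamiliar cells.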
This new approach also means faster drug discovery and, as the scientists pointed out, could enable therapies for diseases that have long eluded traditional methods. The drug discovery process is neither easy, short, nor cheap. It requires research and clinical trials, and even when people need a medicine, the process often takes a long time.
For example, in 2024, Bamberg Health hosted the inaugural edition of the Northeast US Healthcare Innovation Summit (USNEHIS) in Boston, Massachusetts. The goal of this gathering of healthcare leaders, technology innovators, and policy makers was to address pressing issues at the forefront of healthcare innovation. As Elad Sharon, a medical oncologist at the Dana-Farber Cancer Institute, noted during the summit, “Clinical trials are extremely expensive, with costs per patient sometimes reaching a hundred thousand dollars.”
That figure shows how costly the process is, which is why innovations like this one, developed under the guidance of scientists, can be of great significance.
The study’s senior author, Marinka Zitnik, an associate professor of biomedical informatics at Harvard Medical School, answered a few questions for Unknown Focus.
“While we cannot reduce the regulatory timeline directly, the tool makes the upstream discovery pipeline far more efficient and less costly.”
Even when new drugs are developed, getting them to the patients who need them can take a long time. Since the model is faster, does that still make a difference even if the process of reaching patients is slower? Is it still faster than the standard testing methods?
Professor Marinka Zitnik: Yes, speed matters even if clinical translation takes time. The discovery stage, where scientists identify which genes or combinations of them to test, is often the bottleneck in therapeutic development. Laboratory testing of candidate targets can take months to years. PDGrapher replaces large parts of that trial-and-error process with computations that run in hours to days, which is at least an order of magnitude faster than experimental screens. By narrowing down to the most promising gene targets and combinations, PDGrapher helps researchers prioritize experiments earlier, shortening the path to actionable insights. While we cannot reduce the regulatory timeline directly, the tool makes the upstream discovery pipeline far more efficient and less costly.
“This opens possibilities for drug repurposing in rare diseases, where existing drugs might be matched to underexplored disease mechanisms.”
How could this help in treating rare diseases? Can it be used for something beyond drug discovery, like additional research? It seems to me that there are many possibilities.
Professor Marinka Zitnik: Rare diseases often lack sufficient funding and patient samples for large-scale experimental screens. PDGrapher can work directly on datasets already collected, such as transcriptomic or proteomic profiles, and nominate candidate genes to target with existing drugs without requiring new experiments upfront. This opens possibilities for drug repurposing in rare diseases, where existing drugs might be matched to underexplored disease mechanisms. Beyond drug discovery, PDGrapher can be used as a research tool: it can highlight causal gene interactions in diseased cells, generate hypotheses about disease biology, and suggest combinatorial perturbations that would be infeasible to exhaustively test in the lab.
What’s next? Could this serve as a foundation or a source of ideas for determining future research directions? Is PDGrapher freely available?
Professor Marinka Zitnik: Yes, PDGrapher is openly available to the research community. We view it as a foundation for drug target discovery studies. Our next steps are twofold: first, to integrate PDGrapher with additional molecular data types, including patient-derived datasets, to improve its generalizability. Second, to work with experimental labs to validate predictions made by PDGrapher in disease models. We also see it as a starting point for developing future AI systems that can reason about interventions across multiple scales, from genes and proteins up to tissues and patients. By making PDGrapher accessible, we hope it will inspire researchers to use it for target discovery and also as a hypothesis-generation engine for new directions in disease biology.