Can combining machine learning with DNA-encoded libraries revolutionize drug discovery?

Learn how drug hunters are leveraging advanced algorithms and DNA barcoding tools to bring targeted medicines to patients sooner

27 Oct 2022
Dora Wells
Clinical Content Editor

Editorial article

Anagenex combines machine learning with biochemical tools for high-throughput analysis

Advances in computing power and the availability of large datasets are fueling a new era of drug discovery in which researchers can gain deep insights into millions of candidate compounds for a fraction of the cost of traditional screening methods. An emerging technique playing a critical role in this trend is the use of a DNA-encoded library (DEL). Using DELs, drug hunters can generate libraries containing billions of compounds, each traceable by a unique DNA tag, to use as a starting point for screening the best candidates against a specific drug target. Now, with the aid of revolutionary machine learning (ML) tools, this process is being further streamlined using algorithms to comb through these vast datasets, predict the best drug compounds, and even design downstream experiments to optimize their chances of clinical success.

One company making strides in this evolving field is Anagenex, where Senior Vice President Joe Franklin is leading the effort to harness the combined benefits of DEL technology and machine learning to identify new drug candidates faster and more efficiently than ever before. In this article, Franklin reveals how this approach is helping to solve long-standing challenges in drug discovery, the importance of aligning screening tools with ML data needs, and what he sees for the future of computer-aided discovery. He also highlights the value of building strong relationships with suppliers of DEL components and shares why LGC Biosearch Technologies became Anagenex’s partner of choice in securing mission-critical oligonucleotides for its discovery platform.

Rapid screening at a lower cost

Anagenex is focused on tackling historically challenging drug targets, such as nucleic acid binding proteins, allosteric binding pockets, and proteins that require highly selective molecules to avoid toxicity. “There are a lot of proteins that have less than 5% total structural homology difference, so if you knock one out, it's highly likely that you're going to knock the other one out too,” explains Franklin. “These are the types of proteins we are looking at, including synthetic lethal cancer targets, where if you can selectively knock out one version and not the other, you can reduce toxicity and the tumor dies.”

To solve these selectivity issues, Franklin’s team employs an integrated computational and wet lab approach that relies upon the use of DNA-encoded library technology to test billions of compounds for binding to a particular drug target. “During DEL synthesis, small molecule compounds are made by combinatorial chemistry and encoded with unique DNA barcodes for each of the chemical synthesis steps,” explains Franklin. “This allows users to create unique labels for any possible set of chemistries and create billions of compounds in diverse libraries at a much lower cost than other screening technologies.”
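To make the encoding idea concrete, here is a minimal Python sketch of how per-step barcodes can label every compound in a combinatorial library. The building-block names, the four-letter barcode sequences, and the three-cycle layout are hypothetical illustrations, not Anagenex's actual chemistry or tag design.

```python
from itertools import product

# Hypothetical building blocks for three synthesis cycles, each paired with
# an illustrative DNA barcode. Real libraries use far more blocks and longer tags.
CYCLES = [
    {"amine_A": "ACGT", "amine_B": "TGCA"},
    {"acid_A": "GATC", "acid_B": "CTAG", "acid_C": "AGTC"},
    {"aldehyde_A": "TTAA", "aldehyde_B": "CCGG"},
]

def build_library(cycles):
    """Map each concatenated barcode to the tuple of blocks it encodes."""
    library = {}
    for combo in product(*(cycle.items() for cycle in cycles)):
        blocks = tuple(name for name, _ in combo)
        tag = "".join(code for _, code in combo)
        library[tag] = blocks
    return library

def decode(tag, cycles, tag_len=4):
    """Recover the building blocks from a sequenced tag, one cycle at a time."""
    blocks = []
    for i, cycle in enumerate(cycles):
        chunk = tag[i * tag_len:(i + 1) * tag_len]
        lookup = {code: name for name, code in cycle.items()}
        blocks.append(lookup[chunk])
    return tuple(blocks)

library = build_library(CYCLES)
print(len(library))                     # number of encoded compounds
print(decode("TGCAAGTCCCGG", CYCLES))   # blocks behind one sequenced tag
```

The library size is the product of the number of blocks in each cycle, so in this toy case 2 x 3 x 2 gives 12 compounds. The same arithmetic explains the scale described above: three cycles of 1,000 blocks each would already yield a billion uniquely tagged compounds.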

All of this happens in just a single test tube. “One well or tube can be used to screen billions of molecules under one defined condition,” he remarks. “If you want to evaluate 25 conditions against your billions of molecules, you only need to set up 25 tubes.”

From an initial curated library of 2 billion possible drug compounds, Anagenex uses multiple parallel biochemical experiments to screen for promising candidates, and the results are used to train proprietary machine learning models to understand what makes the best candidate compounds. These models then design new libraries of 1 million or more compounds to be synthesized and tested. Iterating this loop of experimentation and machine learning generates increasingly robust predictions of drug-like compounds while optimizing candidate molecules to increase the likelihood that they will be successful in the clinic.
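The screen, learn, and redesign loop described above can be sketched in a few lines of Python. This is a toy illustration only: the simulated assay, the per-block averaging model, the library size, and the batch sizes are all assumptions made for the example, and they stand in for wet-lab experiments and proprietary models that the article does not detail.

```python
import random
from itertools import product

random.seed(0)  # fixed seed so the toy run is repeatable

N_BLOCKS, N_CYCLES = 20, 3  # hypothetical library: 20^3 = 8,000 compounds
# Hidden "true" contribution of each block to binding; stands in for the wet lab.
TRUE_EFFECT = [[random.gauss(0, 1) for _ in range(N_BLOCKS)] for _ in range(N_CYCLES)]

def assay(compound):
    """Simulated screen: sum of hidden block effects plus measurement noise."""
    signal = sum(TRUE_EFFECT[c][b] for c, b in enumerate(compound))
    return signal + random.gauss(0, 0.3)

def fit(results):
    """Toy model: average observed score for each block at each cycle."""
    sums = [[0.0] * N_BLOCKS for _ in range(N_CYCLES)]
    counts = [[0] * N_BLOCKS for _ in range(N_CYCLES)]
    for compound, score in results.items():
        for c, b in enumerate(compound):
            sums[c][b] += score
            counts[c][b] += 1
    return [[sums[c][b] / counts[c][b] if counts[c][b] else 0.0
             for b in range(N_BLOCKS)] for c in range(N_CYCLES)]

def predict(model, compound):
    """Predicted score: sum of the learned per-block averages."""
    return sum(model[c][b] for c, b in enumerate(compound))

all_compounds = list(product(range(N_BLOCKS), repeat=N_CYCLES))
results = {c: assay(c) for c in random.sample(all_compounds, 200)}  # initial screen
best_per_round = [max(results.values())]

for _ in range(3):  # three rounds of learn, design, and screen
    model = fit(results)
    untested = [c for c in all_compounds if c not in results]
    # "Design" the next library: the untested compounds the model ranks highest.
    designed = sorted(untested, key=lambda c: predict(model, c), reverse=True)[:200]
    results.update({c: assay(c) for c in designed})
    best_per_round.append(max(results.values()))

print(best_per_round)  # best score seen after each round
```

Each round reuses every measurement collected so far, which is the point of the loop: the model's picture of which building blocks matter improves as data accumulates, and the next library is drawn from where it expects the strongest binders.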

Combining DEL technology and machine learning

Joe Franklin, Senior Vice President at Anagenex, discusses the implementation of DEL technologies and coupling DEL with machine learning in the webinar: Combining DNA encoded libraries with machine learning to accelerate drug discovery

While some drug discovery companies have been quick to apply machine learning tools to existing screening platforms, Anagenex’s platform is unique in that it was designed from the outset to meet the needs of both technologies. “We designed a platform that focuses on what is important to machine learning and what is important to screening from the very beginning,” says Franklin. “We built them to work with each other – we haven't just bolted one on to the other.”

Generating large volumes of high-quality data is crucial to marrying these technologies. “We collect massive quantities of data at all steps of our DEL process, from library validation, library construction, target selections, and follow-up assays,” says Franklin, adding: “For example, every building block that goes into making one of our libraries gets validated against multiple substrates, and we use that information to better guide our machine learning. On the selection side, we look at many more conditions than other companies, conducting competition analyses, blocking active sites, and looking at allosteric sites and the difference between binding in an active site versus other sites on the protein. Our machine learning models have been purpose-built to incorporate and understand all this data to make highly accurate predictions against our targets.”

Sourcing high-quality components

DEL synthesis for drug discovery hinges on the reliable and timely procurement of large sets of chemical building blocks and customized oligonucleotides. Here, Franklin notes the benefits of working closely with trusted and quality-assured suppliers. “We have built relationships with key vendors that more resemble partnerships,” he says, adding: “On the building block side, we are collaborating with vendors to reduce the design, creation, testing, and analysis cycle, and have optimized parameters such as format and quantity to create faster turnaround times.”




When looking for providers of critical DEL oligo components, LGC Biosearch Technologies’ 35 years of experience in nucleic acid synthesis and wide offering of high-quality oligos in both standard and custom formats made it an obvious choice for Anagenex. “On the DNA side, all of our oligos come from LGC Biosearch Technologies,” says Franklin, adding: “We evaluated many different vendors but found that Biosearch Technologies provides us the best oligo quality for our process, and at the speed we need. They also support our technology development with rapid turnaround.”

The future of drug discovery

DNA-encoded libraries and computational technologies are gaining ground in drug discovery, and Franklin sees them becoming even more integral to research efforts in the future. “Historically, datasets have been too small and computational tools too primitive to really make an enormous difference across the pipeline,” he says. “Now, more and more companies are generating large, high-quality datasets, using machine learning to understand those datasets, and then applying models to design new experiments to drive deeper and faster insights into what medicines might be successful.”

With this trend set to continue, the effective integration of computational and experimental approaches will also become increasingly important. “Companies that really understand how to combine lab work and computers effectively will be the most successful drug hunters in the future, and ultimately, this will lead to more and better drugs for patients,” he concludes.

Want to know more on this topic?

Join a live Q&A with Joe Franklin as he discusses the implementation of DEL technologies and coupling DEL with machine learning: Combining DNA encoded libraries with machine learning to accelerate drug discovery
