Deep learning tool boosts drug discovery

Learn how CDD’s AI search tool offers a rapid and secure means to discover and optimize promising drug candidates

28 Jun 2024
Carrie Haslam
Associate Editor

Editorial article


Dr. Peter Gedeck, Senior Data Scientist, Collaborative Drug Discovery

During early drug discovery, identifying compounds with favorable bioactive properties is crucial, but managing the vast amount of molecular data involved, from hit discovery to lead optimization, is a daunting task.

Over the past twenty years, the software-as-a-service (SaaS) provider Collaborative Drug Discovery (CDD) has been at the forefront of addressing this challenge. The company’s flagship product, CDD Vault®, gives biotech researchers a secure platform to manage and analyze chemistry and biology data, supporting chemical registration, visualization of structure-activity relationships (SAR), and real-time data sharing.

Now, the company has integrated into CDD Vault a new deep learning algorithm that can search very large structure databases quickly. The tool could also help generate ideas for designing promising new drug molecules and for optimizing the performance of existing drug candidates. We spoke with Senior Data Scientist Dr. Peter Gedeck to learn more.

Sifting through millions of molecules in seconds

Based on the premise that structurally similar molecules tend to have similar properties, similarity searching has become a pivotal tool in drug discovery. It supports tasks ranging from analog identification and exploration of structure-activity relationships to patent landscape analysis and the prioritization of compounds for synthesis or testing.

How similarity between molecules is calculated has been a subject of extensive research over many years. Among several approaches, two-dimensional similarity methods such as fingerprinting or feature vector counts have become popular due to their simplicity, accuracy, and efficiency.
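
For readers unfamiliar with these 2D methods: fingerprint similarity is most often scored with the Tanimoto coefficient, the number of features two molecules share divided by the number of features present in either. The sketch below is illustrative only and is not CDD's implementation; it assumes binary fingerprints arrive as sets of on-bit indices and feature counts as `Counter` objects, which in practice a cheminformatics toolkit would generate.

```python
from __future__ import annotations

from collections import Counter


def tanimoto_bits(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def tanimoto_counts(fv_a: Counter, fv_b: Counter) -> float:
    """Count-based Tanimoto: sum of per-feature minima over sum of per-feature maxima."""
    features = set(fv_a) | set(fv_b)
    # Counter returns 0 for a feature missing from one of the molecules.
    overlap = sum(min(fv_a[f], fv_b[f]) for f in features)
    union = sum(max(fv_a[f], fv_b[f]) for f in features)
    return overlap / union if union else 1.0


# Hypothetical fingerprints and feature counts, for illustration only.
print(tanimoto_bits({1, 4, 7, 9}, {1, 4, 8}))                 # 0.4
print(tanimoto_counts(Counter(C=3, O=1), Counter(C=2, N=1)))  # 0.4
```

The count-based variant reduces to the binary one when every count is 0 or 1, which is why the two are often described together.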

The research informatics group at CDD has now developed a new approach — underpinned by a novel deep learning model — designed to help medicinal chemists identify similar structures faster, and in a safe and secure environment.


The model bridges two approaches to molecular representation: it first converts a chemical structure into a numerical representation and then couples this with a generative model that can produce simplified molecular-input line-entry system (SMILES) strings. “By coupling these two types of networks in one architecture, we can take the numerical representation resulting from a graph convolution vector and generate structures around it,” explains Gedeck.
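
The encoder half of such an architecture can be illustrated with a toy graph convolution: each atom repeatedly mixes its features with those of its bonded neighbors, and the atom vectors are then summed into one fixed-length molecule vector. This is a minimal sketch under stated assumptions, not CDD's model: the one-hot atom features, the hand-set weight matrix (a trained network would learn these weights), and all function names are hypothetical, and the SMILES-generating decoder is not shown.

```python
from __future__ import annotations


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def graph_conv_layer(
    features: list[list[float]],
    neighbors: list[list[int]],
    weight: list[list[float]],
) -> list[list[float]]:
    """One message-passing step: each atom adds its neighbors' features to its own,
    then applies a linear transform followed by ReLU."""
    updated = []
    for i, h_i in enumerate(features):
        aggregated = list(h_i)
        for j in neighbors[i]:
            aggregated = [a + b for a, b in zip(aggregated, features[j])]
        updated.append([max(0.0, v) for v in matvec(weight, aggregated)])
    return updated


def encode(features, neighbors, weights) -> list[float]:
    """Stack graph-convolution layers, then sum-pool atoms into one molecule vector."""
    h = features
    for weight in weights:
        h = graph_conv_layer(h, neighbors, weight)
    # Sum pooling yields a fixed-length vector regardless of the number of atoms.
    return [sum(column) for column in zip(*h)]


# Ethanol heavy atoms C-C-O with hypothetical one-hot features [is_carbon, is_oxygen].
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [[1], [0, 2], [1]]           # adjacency list: atom index -> bonded atom indices
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights
print(encode(atom_features, bonds, [identity]))  # [5.0, 2.0]
```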

This novel architecture supports several valuable applications, with similarity searching the first to become available in CDD Vault.

“Using the deep learning (DL) tool, we can create a numerical representation of a molecule, store it in a vector database, and then perform a similarity search against that,” adds Gedeck. “This is complementary to the more traditional similarity search methods, such as fingerprints and feature vector counts, but the search itself can be very fast: under a second for several million structures.”
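
A brute-force version of the store-then-search workflow Gedeck describes fits in a few lines. The `VectorStore` class below is hypothetical, and cosine similarity is assumed as the metric. Production vector databases typically rely on approximate nearest-neighbor indexes rather than scanning every entry, which is what makes sub-second searches over millions of vectors practical; the full scan here is only for clarity.

```python
from __future__ import annotations

import heapq
import math


class VectorStore:
    """Toy in-memory vector store that scans every entry (no index) by cosine similarity."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    @staticmethod
    def _normalize(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in v]

    def add(self, compound_id: str, vector: list[float]) -> None:
        # Store unit vectors so cosine similarity reduces to a dot product.
        self._ids.append(compound_id)
        self._vectors.append(self._normalize(vector))

    def search(self, query: list[float], k: int = 5) -> list[tuple[float, str]]:
        """Return the k most similar entries as (cosine similarity, compound_id), best first."""
        q = self._normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), cid)
            for cid, v in zip(self._ids, self._vectors)
        )
        return heapq.nlargest(k, scored)


# Usage with made-up 2D vectors; real embeddings have many more dimensions.
store = VectorStore()
store.add("cmpd-A", [1.0, 0.0])
store.add("cmpd-B", [0.9, 0.1])
store.add("cmpd-C", [0.0, 1.0])
for score, cid in store.search([1.0, 0.0], k=2):
    print(cid, round(score, 3))
```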

The DL tool has initially been integrated into CDD Vault for use with the ChEMBL dataset, allowing users to run similarity searches against more than 2.5 million synthesizable compounds, most of which have been tested against one or more drug targets. This can reveal potential targets or possible side-effect issues for novel screening hits.

According to Gedeck, one key advantage of having the DL tool integrated into CDD Vault is being able to securely conduct searches in public databases with proprietary structures. “All the searches are done in a trusted environment, so you can search in ChEMBL without exposing your structure information on a public server – which is a big win,” he says. “We are also considering incorporating vendor databases so that users have a way of expanding their structure-activity relationship (SAR) knowledge by ordering additional compounds.”

Beyond ChEMBL, the team is also in the process of expanding the integration to SureChEMBL, which will give users access to structures that are already patented, as well as links to the associated patents. “This is important because if researchers come across a hit in a high-throughput screening, they need to know whether this hit is covered by any other intellectual property (IP) or not,” Gedeck adds.

Using AI to tweak drug structures

In addition to extending similarity search to a patent database and potentially to vendor structures, CDD is exploring the use of its deep learning algorithm to suggest bioisosteric replacements. Bioisosteres are atoms or groups of atoms that can replace one another because they have similar physical or chemical properties, with the aim of conserving or enhancing a molecule’s biological activity.

“When we validated our system, we found that consistent structural changes, for example changing a methyl group to a phenyl group, lead to similar changes in the latent vector space,” explains Gedeck. “We also validated our latent vector representation using a bioisostere database, and it works well, so we’re working to integrate it into CDD Vault as an idea generator.”
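
One way to test whether a consistent structural change produces a consistent shift in latent space is to embed matched molecular pairs (the same change applied to different parent molecules) and compare their difference vectors. The sketch assumes the embeddings are already available as lists of floats; the function names, the example vectors, and the mean-cosine score are illustrative choices, not CDD's published validation procedure.

```python
from __future__ import annotations

import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def transformation_consistency(pairs: list[tuple[list[float], list[float]]]) -> float:
    """Mean pairwise cosine similarity between latent-space shifts (after - before)
    of matched molecular pairs that share one structural change.

    Values near 1 mean the change moves every parent in roughly the same direction.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two matched pairs to compare")
    shifts = [[y - x for x, y in zip(before, after)] for before, after in pairs]
    sims = [cosine(s1, s2) for s1, s2 in combinations(shifts, 2)]
    return sum(sims) / len(sims)


# Hypothetical (parent, methyl-to-phenyl analog) embeddings for three parents.
pairs = [
    ([0.0, 0.0, 1.0], [1.0, 0.5, 1.0]),
    ([2.0, 1.0, 0.0], [3.1, 1.4, 0.0]),
    ([0.5, -1.0, 2.0], [1.4, -0.5, 2.1]),
]
print(round(transformation_consistency(pairs), 3))
```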

This will allow researchers to generate several ideas around a structure by swapping out fragments for others with a similar numerical representation.
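
Such an idea generator could work roughly as follows, under several assumptions: fragments have their own latent vectors, candidate replacements are ranked by cosine similarity to the fragment being swapped out, and the molecule's vector is shifted by the difference between the new and old fragment vectors before being passed to a decoder (not shown). The function name, fragment names, and vectors are made up for illustration; the carboxylic acid to tetrazole swap is used only because it is a textbook bioisosteric replacement.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def suggest_swaps(mol_vec, old_fragment, fragment_library, k=3):
    """Rank library fragments by latent similarity to old_fragment and return, for each,
    (name, similarity, molecule vector shifted by new fragment minus old fragment).

    Each shifted vector would be handed to a generative decoder (not shown).
    """
    old_vec = fragment_library[old_fragment]
    ranked = sorted(
        (
            (cosine(old_vec, vec), name)
            for name, vec in fragment_library.items()
            if name != old_fragment
        ),
        reverse=True,
    )[:k]
    suggestions = []
    for score, name in ranked:
        shift = [n - o for n, o in zip(fragment_library[name], old_vec)]
        suggestions.append((name, score, [m + s for m, s in zip(mol_vec, shift)]))
    return suggestions


# Made-up 3D fragment embeddings, for illustration only.
library = {
    "carboxylic acid": [1.0, 0.0, 0.2],
    "tetrazole": [0.9, 0.1, 0.25],
    "methyl": [0.0, 1.0, 0.0],
}
for name, score, vec in suggest_swaps([0.5, 0.5, 0.5], "carboxylic acid", library, k=2):
    print(name, round(score, 3), [round(x, 2) for x in vec])
```

Keeping the shift relative to the original molecule vector is what should keep suggestions close to the starting structure, in line with the advantage Gedeck describes.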

“The advantage is it can create structures that are reasonable and not completely different from what you already have, whereas often the problem with generative models is they create something very different and that would be hard to synthesize,” Gedeck affirms.

Deep dive into SAR space to improve drug candidates

A final application of CDD's deep learning architecture, still under development, aims to support the creation of quantitative structure-activity relationship (QSAR) models. Here, the numerical representations can be used to build DL models that predict the activity or chemical properties of molecules, helping researchers better understand how drug structure relates to drug-target interactions.
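
At its core, the QSAR step is supervised regression on the latent vectors. As a deliberately simple stand-in for the deep learning models described here, the sketch fits a linear model by batch gradient descent on mean squared error; the learning rate, epoch count, function names, and data are assumptions for illustration.

```python
from __future__ import annotations


def fit_linear_qsar(X, y, lr=0.1, epochs=2000):
    """Fit activity ≈ w·z + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [sum(wi * zi for wi, zi in zip(w, z)) + b - t for z, t in zip(X, y)]
        # Gradient of (1/n) * sum(error^2) with respect to each weight and the bias.
        grad_w = [2.0 / n * sum(e * z[j] for e, z in zip(errors, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, z):
    """Predicted activity for latent vector z."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


# Hypothetical 2D latent vectors with activities generated from 2*z0 - z1 + 0.5.
latent = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
activity = [0.5, 2.5, -0.5, 1.5, 3.5]
w, b = fit_linear_qsar(latent, activity)
print([round(wi, 3) for wi in w], round(b, 3))
```

A deep model would replace the linear function with a neural network, but the workflow (latent vectors in, predicted activity out) stays the same.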


“We can also do something called inverse QSAR, where you have a numerical representation that, according to the QSAR model, should have better activity and you want to find the structure that represents it,” Gedeck enthuses. “By coupling the numerical representation with the generative model, we can create representative structures that can then be assessed.”
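
For a linear QSAR model, the inverse problem has a closed-form answer: the smallest move in latent space that reaches a target predicted activity lies along the model's weight vector, with length (target − f(z)) / ‖w‖. The sketch assumes such a linear model (weights `w`, bias `b`) and adds a hypothetical `max_shift` limit so the proposed vector stays close to the starting structure; turning the result into a SMILES string would require the generative decoder, which is not shown.

```python
from __future__ import annotations

import math


def predict(w, b, z):
    """Linear QSAR model: predicted activity = w·z + b."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def inverse_qsar_linear(w, b, z_start, target, max_shift):
    """Return the latent vector closest to z_start whose predicted activity equals target,
    or None if reaching it needs a move longer than max_shift."""
    w_norm_sq = sum(wi * wi for wi in w)
    if w_norm_sq == 0.0:
        raise ValueError("model has no dependence on the latent vector")
    # The minimum-norm solution of w·delta = target - f(z_start) is a multiple of w.
    step = (target - predict(w, b, z_start)) / w_norm_sq
    z_new = [zi + step * wi for zi, wi in zip(z_start, w)]
    if math.sqrt(sum((a - c) ** 2 for a, c in zip(z_new, z_start))) > max_shift:
        return None  # too far from the starting structure to trust the decoder
    return z_new


# Hypothetical model and starting vector.
w, b = [2.0, -1.0], 0.5
z_better = inverse_qsar_linear(w, b, [0.0, 0.0], target=1.5, max_shift=1.0)
print(z_better, predict(w, b, z_better))
```

Nonlinear (deep) QSAR models have no such closed form, so the same idea would usually be pursued by gradient-based search in the latent space.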

A two-dimensional representation of the SAR dataset can also be generated, with the results of the predictive model displayed as a heatmap. “The heatmap highlights unexplored areas, and by clicking on any spot, we can map this back to a latent vector and then generate a new structure for it,” he explains.
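
The map-and-click workflow can be sketched with a linear projection: compounds are projected onto two orthonormal axes (for example, the top two principal components of the latent vectors, assumed here to be precomputed), the plane is divided into grid cells, and the center of an empty cell is mapped back to a latent vector. The function names and grid scheme are hypothetical; the article does not say which 2D embedding CDD uses, and a nonlinear embedding would need a different back-mapping. Coloring cells by the QSAR prediction would complete the heatmap.

```python
from __future__ import annotations

import math


def project(z, mean, axes):
    """Coordinates of latent vector z on orthonormal axes after centering on mean."""
    centered = [zi - mi for zi, mi in zip(z, mean)]
    return tuple(sum(c * a for c, a in zip(centered, axis)) for axis in axes)


def back_project(point, mean, axes):
    """Map 2D coordinates back to the latent vector in the plane spanned by the axes."""
    z = list(mean)
    for coord, axis in zip(point, axes):
        z = [zi + coord * ai for zi, ai in zip(z, axis)]
    return z


def empty_cell_centers(points, lo, hi, n_bins):
    """Centers of square grid cells on [lo, hi] x [lo, hi] that contain no compound."""
    width = (hi - lo) / n_bins
    occupied = set()
    for x, y in points:
        if not (lo <= x <= hi and lo <= y <= hi):
            continue  # ignore compounds outside the plotted window
        i = min(math.floor((x - lo) / width), n_bins - 1)
        j = min(math.floor((y - lo) / width), n_bins - 1)
        occupied.add((i, j))
    return [
        (lo + (i + 0.5) * width, lo + (j + 0.5) * width)
        for i in range(n_bins)
        for j in range(n_bins)
        if (i, j) not in occupied
    ]


# Hypothetical 3D latent vectors and precomputed orthonormal axes.
mean = [0.0, 0.0, 0.0]
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
compounds = [[-0.5, -0.5, 0.3], [0.5, -0.5, 0.1], [-0.5, 0.5, 0.0]]
points = [project(z, mean, axes) for z in compounds]
for center in empty_cell_centers(points, lo=-1.0, hi=1.0, n_bins=2):
    print(center, back_project(center, mean, axes))  # latent vector to pass to a decoder
```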

A new era of drug discovery

Artificial intelligence has the potential to revolutionize drug discovery, signaling a new era in pharmaceutical research. Through CDD’s current and upcoming deep learning integrations within CDD Vault, biotech researchers can access the computational power to analyze vast datasets at unprecedented speed, accurately predict molecular properties, and discover potential drug candidates using new, more cost-efficient methods.

