Open Access Databases and Datasets for Drug Discovery. Edition No. 1. Methods & Principles in Medicinal Chemistry


Book
352 Pages
November 2023
John Wiley and Sons Ltd
ID: 5835953

Open Access Databases and Datasets for Drug Discovery

Timely resource discussing the future of data-driven drug discovery and the growing number of open-source databases

With an overview of 90 freely accessible databases and datasets on all aspects of drug design, development, and discovery, Open Access Databases and Datasets for Drug Discovery is a comprehensive guide to the vast amount of “free data” available to today’s pharmaceutical researchers. The applicability of open-source data for drug discovery and development is analyzed, and their usefulness in comparison with commercially available tools is evaluated.

The most relevant databases for small molecules, drugs and druglike substances, ligand design, protein 3D structures (both experimental and calculated), and human drug targets are described in depth, including practical examples of how to access and work with the data. The first part is focused on databases for small molecules, followed by databases for macromolecular targets and diseases. The final part shows how to integrate various open-source tools into the academic and industrial drug discovery and development process.

Contributed to and edited by experts with long-standing experience in the field, Open Access Databases and Datasets for Drug Discovery includes information on:

An extensive listing of open access databases and datasets for computer-aided drug design
PubChem as a chemical database for drug discovery, DrugBank Online, and bioisosteric replacement for drug discovery supported by the SwissBioisostere database
The Protein Data Bank (PDB) and macromolecular structure data supporting computer-aided drug design, and the SWISS-MODEL repository of 3D protein structures and models
PDB-REDO in computer-aided drug design (CADD), and using Pharos/TCRD for discovering druggable targets

Unmatched in scope and thoroughly reviewing small and large open data sources relevant for rational drug design, Open Access Databases and Datasets for Drug Discovery is an essential reference for medicinal and pharmaceutical chemists and for any scientist involved in drug discovery and development.

Series Editors' Preface xiii

Raimund Mannhold - A Personal Obituary from the Series Editors xvii

A Personal Foreword xxi

1 Open Access Databases and Datasets for Computer-Aided Drug Design. A Short List Used in the Molecular Modelling Group of the SIB 1
Antoine Daina, María José Ojeda-Montes, Maiia E. Bragina, Alessandro Cuozzo, Ute F. Röhrig, Marta A.S. Perez, and Vincent Zoete

References 30

Part I Small Molecules 39

2 PubChem: A Large-Scale Public Chemical Database for Drug Discovery 41
Sunghwan Kim and Evan E. Bolton

2.1 Introduction 41

2.2 Data Content and Organization 42

2.3 Tools and Services 45

2.3.1 PubChem Search 45

2.3.2 Summary Pages 48

2.3.3 Literature Knowledge Panel 49

2.3.4 2D and 3D Neighbors 50

2.3.5 Classification Browser 51

2.3.6 Identifier Exchange Service 52

2.3.7 Programmatic Access 52

2.3.8 PubChem FTP Site and PubChemRDF 53

2.4 Drug- and Lead-Likeness of PubChem Compounds 54

2.5 Bioactivity Data in PubChem 56

2.6 Comparison with Other Databases 57

2.7 Use of PubChem Data for Drug Discovery 58

2.8 Summary 59

Acknowledgments 60

References 60

3 DrugBank Online: A How-to Guide 67
Christen M. Klinger, Jordan Cox, Denise So, Teira Stauth, Michael Wilson, Alex Wilson, and Craig Knox

3.1 Introduction 67

3.2 DrugBank 68

3.2.1 Overview of DrugBank 68

3.2.2 DrugBank Datasets 69

3.2.2.1 Drug Cards: An Overview and Navigation Guide 70

3.2.2.2 Identification 70

3.2.2.3 Pharmacology 71

3.2.2.4 Categories 73

3.2.2.5 Properties 73

3.2.2.6 Targets, Enzymes, Carriers, and Transporters 73

3.2.2.7 References 77

3.3 Protocols 77

3.3.1 General Workflows 77

3.3.1.1 Using DrugBank Online’s Search Functionality 77

3.3.1.2 Using DrugBank Online’s Advanced Search Functionality 80

3.3.1.3 Browsing Drugs Using DrugBank Online’s Drug Categories 83

3.3.2 Identifying Chemicals and Relevant Sequences 86

3.3.2.1 Searching Using Chemical Structure Search 86

3.3.2.2 Using Sequence Search to Find Similar Targets 89

3.3.3 Extracting DrugBank Datasets for ML 93

3.4 Research Using DrugBank 94

3.5 Discussion and Conclusions 95

References 96

4 Bioisosteric Replacement for Drug Discovery Supported by the SwissBioisostere Database 101
Antoine Daina, Alessandro Cuozzo, Marta A.S. Perez, and Vincent Zoete

4.1 Introduction 101

4.1.1 Concept of Isosterism and Bioisosterism 101

4.1.2 Classical vs. Non-classical Bioisostere and Further Molecular Replacements 102

4.1.3 Bioisosteric Replacement in Drug Discovery 105

4.2 Construction and Dissemination of SwissBioisostere 106

4.2.1 Intention and Requirements 106

4.2.2 Bioactivity Data 107

4.2.3 Nonsupervised Matched Molecular Pair Analysis 108

4.2.4 Database 108

4.2.5 Web Interface 109

4.3 Content of SwissBioisostere 111

4.3.1 Global Content 111

4.3.2 Biological and Chemical Contexts 112

4.3.3 Fragment Shape Diversity 113

4.4 Usage of SwissBioisostere 115

4.4.1 Website Usage 115

4.4.2 Most Frequent Requests 117

4.4.3 Examples Related to Drug Discovery 117

4.4.3.1 Use Cases 117

4.4.3.2 Replacing Unwanted Chemical Groups 118

4.4.3.3 Optimization of Passive Absorption and Blood-Brain Barrier Diffusion 122

4.4.3.4 Reduction of Flexibility 124

4.4.3.5 Reduction of Aromaticity/Escape from Flatland 128

4.5 Conclusive Remarks 133

Acknowledgment 133

References 133

Part II Macromolecular Targets and Diseases 139

5 The Protein Data Bank (PDB) and Macromolecular Structure Data Supporting Computer-Aided Drug Design 141
David Armstrong, John Berrisford, Preeti Choudhary, Lukas Pravda, James Tolchard, Mihaly Varadi, and Sameer Velankar

5.1 Introduction 141

5.2 Small Molecule Data in Protein Data Bank (PDB) Entries 142

5.2.1 What Data are in the PDB Archive? 142

5.2.2 Definition of Small Molecules in OneDep 145

5.3 Small Molecule Dictionaries 146

5.3.1 wwPDB Chemical Component Dictionary (CCD) 146

5.3.2 The Peptide Reference Dictionary 147

5.4 Additional Ligand Annotations in the PDB Archive 148

5.4.1 Linkage Information 148

5.4.2 Carbohydrates 149

5.5 Validation of Ligands in the Worldwide Protein Data Bank (wwPDB) 150

5.5.1 Various Criteria and Software Used for Validating Ligands in Validation Reports 150

5.5.2 Identification of Ligand of Interest (LOI) 151

5.5.3 Geometric and Conformational Validation 152

5.5.4 Ligand Fit to Experimental Electron Density Validation 152

5.5.5 Accessing wwPDB Validation Reports from PDBe Entry Pages 154

5.5.6 Other Planned Improvements to Enhance Ligand Validation 154

5.6 PDBe Tools for Ligand Analysis 155

5.6.1 Ligand Interactions 155

5.6.1.1 Classifying Ligand Interactions 155

5.6.1.2 Data Availability 156

5.6.2 Ligand Environment Component 156

5.6.3 Chemistry Process and FTP 158

5.6.4 PDBeChem Pages 158

5.7 Ligand-Related Annotations in the PDBe-KB 158

5.7.1 Introduction to PDBe-KB 158

5.7.2 Data Access Mechanisms for Ligand-Related Annotations 160

5.7.3 Ligand-Related Annotations on the Aggregated Views of Proteins 162

5.8 Case Study: Using PDB Data to Support Drug Discovery 164

5.9 Conclusions and Outlook 165

5.9.1 Upcoming Features and Improvements 166

References 167

6 The SWISS-MODEL Repository of 3D Protein Structures and Models 175
Xavier Robin, Andrew Mark Waterhouse, Stefan Bienert, Gabriel Studer, Leila T. Alexander, Gerardo Tauriello, Torsten Schwede, and Joana Pereira

6.1 Introduction 175

6.2 SMR Database Content and Model Providers 176

6.2.1 PDB 177

6.2.2 SWISS-MODEL 177

6.2.3 AlphaFold Database 179

6.2.4 ModelArchive 180

6.3 Protein Feature Annotation and Cross-References to Computational Resources 181

6.3.1 Structural Features, Ligands, and Oligomers 181

6.3.2 SWISS-MODEL Associated Tools 182

6.3.3 Web and API Access 183

6.4 Quality Estimates and Benchmarking 188

6.5 Binding Site Conformational States 189

6.6 SMR and Computer-Aided Structure-based Drug Design 190

6.7 Conclusion and Outlook 191

References 193

7 PDB-REDO in Computational-Aided Drug Design (CADD) 201
Ida de Vries, Anastassis Perrakis, and Robbie P. Joosten

7.1 History and Concepts 201

7.1.1 X-ray Structure Models 201

7.1.2 PDB-REDO Development 202

7.1.2.1 First Uniformity 203

7.1.2.2 Automatic Rebuilding of Protein Backbone and Side Chains 203

7.1.2.3 Automated Model Completion Approaches 204

7.1.2.4 Systematic Integration of Structural Knowledge 205

7.1.2.5 Overview of PDB-REDO Pipeline 205

7.2 Structure Improvements by PDB-REDO 206

7.2.1 Parametrization and Rebuilding Effects on Small Molecule Ligands 206

7.2.1.1 Re-refinement Improves Ligand Conformation 206

7.2.1.2 Side Chain Rebuilding Improves Ligand Binding Sites 207

7.2.1.3 Histidine Flip and Improved Ligand Parameterization 208

7.2.2 Building of Protein Loops and Ligands into Protein Structure Models 210

7.2.2.1 Loop Building Completes a Binding Site Region 210

7.2.2.2 Loop Building Results in Improved Binding Sites 211

7.2.2.3 Building New Compounds into Density 212

7.2.3 Nucleic Acid Improvements by PDB-REDO 213

7.2.4 Glycoprotein Structure Model Rebuilding 214

7.2.5 Metal Binding Sites 214

7.2.6 Limitations of the PDB-REDO Databank 216

7.3 Access the PDB-REDO Databank and Metadata 218

7.3.1 Downloading and Inspecting Individual PDB-REDO Entries 218

7.3.2 Data Available in PDB-REDO Entries 220

7.3.3 Usage of the Uniform and FAIR Validation Data 220

7.3.4 Creating Datasets from the PDB-REDO Databank 222

7.3.5 Submitting Structure Models to the PDB-REDO Pipeline 223

7.4 Conclusions 223

Acknowledgments and Funding 224

List of Abbreviations and Symbols 224

References 225

8 Pharos and TCRD: Informatics Tools for Illuminating Dark Targets 231
Keith J. Kelleher, Timothy K. Sheils, Stephen L. Mathias, Dac-Trung Nguyen, Vishal Siramshetty, Ajay Pillai, Jeremy J. Yang, Cristian G. Bologa, Jeremy S. Edwards, Tudor I. Oprea, and Ewy Mathé

8.1 Introduction 231

8.2 Methods 233

8.2.1 Data Organization 233

8.2.1.1 Target Alignment 234

8.2.1.2 Disease Alignment 234

8.2.1.3 Ligand Alignment 234

8.2.1.4 Data and UI Updates 235

8.2.2 Programmatic Access and Data Download 235

8.2.3 UI Organization 235

8.2.3.1 List Pages 236

8.2.3.2 Details Pages 236

8.2.3.3 Search 238

8.2.3.4 Tutorials 240

8.2.4 Analysis Methods Within Pharos 240

8.2.4.1 Searching for Ligands 240

8.2.4.2 Finding Targets by Amino Acid Sequence 241

8.2.4.3 Finding Targets with Similar Annotations 241

8.2.4.4 Finding Targets with Predicted Activity 241

8.2.4.5 Enrichment Scores for Filter Values 241

8.3 Use Cases 242

8.3.1 Hypothesizing the Role of a Dark Target 242

8.3.1.1 Primary Documentation 242

8.3.1.2 List Analysis 247

8.3.1.3 Downloading Data 251

8.3.1.4 Variations on this Use Case 251

8.3.2 Characterizing a Novel Chemical Compound 251

8.3.2.1 Finding Predicted Targets 252

8.3.2.2 Analyzing Similar Ligands 254

8.3.2.3 Ligand Details Pages 256

8.3.2.4 Variations on this Use Case 257

8.3.3 Investigating Diseases 260

8.4 Discussion 262

Funding 264

References 264

Part III Users’ Points of View 269

9 Mining for Bioactive Molecules in Open Databases 271
Guillem Macip, Júlia Mestres-Truyol, Pol Garcia-Segura, Bryan Saldivar-Espinoza, Santiago Garcia-Vallvé, and Gerard Pujadas

9.1 Introduction 271

9.2 Main Tools for Virtual Screening 272

9.2.1 ADMET and PAINS Filtering 272

9.2.2 Protein-Ligand Docking 274

9.2.3 Pharmacophore Search 275

9.2.4 Shape/Electrostatic Similarity 276

9.2.5 Protein-Structure Databases 277

9.2.6 The Protein Data Bank 278

9.2.7 The PDB-REDO Databank 278

9.2.8 The SWISS-MODEL Repository 279

9.2.9 The AlphaFold Protein Structure Database 279

9.3 Validating Binding Site and Ligand Coordinates in Three-Dimensional Protein Complexes 280

9.4 Databases for Searching New Drugs 281

9.4.1 COCONUT 281

9.4.2 GDBs 282

9.4.3 ZINC20 282

9.5 Databases of Bioactive Molecules 282

9.5.1 The BindingDB Database 283

9.5.2 PubChem 283

9.5.3 ChEMBL 284

9.6 Databases of Inactive/Decoy Molecules 285

9.6.1 Collecting Experimentally Inactive Compounds from PubChem 285

9.6.2 Collecting Presumed Inactive Compounds from Decoy Databases 285

9.6.3 Building Custom-Based Decoy Sets 286

9.7 Main Metrics for Evaluating the Success of a Virtual Screening 286

9.8 Concluding Remarks 288

References 289

10 Open Access Databases - An Industrial View 299
Michael Przewosny

10.1 Academic vs. Industrial Research 299

10.2 Scaffold-Hopping 310

10.3 Virtual-Screening 311

Abbreviations 312

References 313

Index 317

Authors

Antoine Daina, SIB Swiss Institute of Bioinformatics.
Michael Przewosny, RWTH Aachen, Germany.
Vincent Zoete, SIB Swiss Institute of Bioinformatics; University of Lausanne.
