Perform genome analysis on sequencing data with Amazon Web Services
Genomics in the AWS Cloud: Analyzing Genetic Code Using Amazon Web Services enables readers with moderate familiarity with the AWS Cloud to perform full genome analysis and research. Using the information in this book, you'll be able to take a FASTQ file of raw data from a lab, or a BAM file from a service provider, and perform genome analysis on it. You'll also be able to identify potentially pathogenic gene sequences.
- Get an introduction to Whole Genome Sequencing (WGS)
- Make sense of WGS on AWS
- Master AWS services for genome analysis
A key advantage of using AWS for genomic analysis is the wide choice of compute services that researchers can use to process diverse datasets in analysis pipelines. The genomic sequencers that generate raw data files are located in on-premises labs, and AWS provides solutions that make it easy to transfer these files to AWS reliably and securely. Storing genomic and medical (e.g., imaging) data at its different stages requires enormous amounts of storage at a manageable cost. Amazon Simple Storage Service (Amazon S3), Amazon Glacier, and Amazon Elastic Block Store (Amazon EBS) provide the solutions needed to securely store, manage, and scale genomic file storage. These storage services also interface with various AWS compute services that process the files.
Whether you're just getting started or are already analyzing genomics data in the AWS Cloud, this book provides the information you need to use AWS services and features in the ways that make the most sense for your genomic research.
Table of Contents
Introduction
Chapter 1 Why Do Genome Analysis Yourself When Commercial Offerings Exist?
Chapter 2 A Crash Course in Molecular Biology
Chapter 3 Obtaining Your Genome
Chapter 4 The Bioinformatics Workflow
Chapter 5 AWS Services for Genome Analysis
Chapter 6 Building Your Environment in the AWS Cloud
Chapter 7 Linux and AWS Command-Line Basics for Genomics
Chapter 8 Processing the Sequencing Data
Chapter 9 Visualizing the Genome
Chapter 10 Containerizing Your Workflow on the Desktop
Chapter 11 Variants and Applications
Chapter 12 Cancer Genomics
Index