The Informed Company. How to Build Modern Agile Data Stacks that Drive Winning Insights. Edition No. 1


Book
256 Pages
October 2021
John Wiley and Sons Ltd
ID: 5836200

Learn how to manage a modern data stack and get the most out of data in your organization!

Thanks to the emergence of new technologies and the explosion of data in recent years, we need new practices for managing and getting value out of data. In the modern, data-driven competitive landscape, the "best guess" approach - reading blog posts here and there and patching together data practices without any real visibility - is no longer going to hack it. The Informed Company provides definitive direction on how best to leverage the modern data stack, including cloud computing, columnar storage, cloud ETL tools, and cloud BI tools. You'll learn how to work with Agile methods and set up processes that are right for your company to use your data as a key weapon for your success . . . You'll discover best practices for every stage, from querying production databases at a small startup all the way to setting up data marts for different business lines of an enterprise.

In their work at Chartio, authors Fowler and David have learned that most businesspeople are almost completely self-taught when it comes to data. If they are using resources at all, those resources are outdated, so they're missing out on the latest cloud technologies and advances in data analytics. This book will firm up your understanding of data and bring you up to date on what works and what doesn't.

Discover the data stack strategies that are working for today's successful small, medium, and enterprise companies
Learn the different Agile stages of data organization, and the right one for your team
Learn how to maintain Data Lakes and Data Warehouses for effective, accessible data storage
Gain the knowledge you need to architect Data Warehouses and Data Marts
Understand your business's level of data sophistication and the steps you can take to "level up" your data

The Informed Company is the definitive data book for anyone who wants to work faster and more nimbly, armed with actionable decision-making data.

About This Book xiii

Foreword xxi

Introduction xxv

Stage 1 Source (aka Siloed Data) 1

Chapter 1 Starting with Source Data 3

Common Options for Analyzing Source Data 4

Chapter 2 The Need to Replicate Source Data 11

Replicate Sources 12

Create Read-Only Access 14

Chapter 3 Source Data Best Practices 15

Keep a Complexity Wiki Page 15

Snippet Dictionary 16

Use a BI Product 17

Double Check Results 18

Keep Short Dashboards 19

Design Before Building 20

Stage 2 Data Lake (aka Data Combined) 23

Chapter 4 Why Build a Data Lake? 25

What Is a Data Lake? 26

Reasons to Build a Data Lake Summarized 27

Chapter 5 Choosing an Engine for the Data Lake 33

Modern Columnar Warehouse Engines 35

Modern Warehouse Engine Products 38

Database Engines 41

Recommendation 42

Chapter 6 Extract and Load (EL) Data 45

ETL versus ELT 46

EL/ETL Vendors 48

Extract Options 49

Load Options 51

Multiple Schemas 52

Other Extract and Load Routes 53

Chapter 7 Data Lake Security 55

Access in Central Place 56

Permission Tiers 57

Chapter 8 Data Lake Maintenance 59

Why SQL? 60

Data Sources 61

Performance 64

Upgrade Snippets to Views 68

Stage 3 Data Warehouse (aka the Single Source of Truth) 69

Chapter 9 The Power of Layers and Views 75

Make Readable Views 77

Layer Views on Views 78

Start with a Single View 81

Chapter 10 Staging Schemas 83

Orient to the Schemas 84

Pick a Table and Clean It 85

Other Staging Modeling Considerations 98

Building on Top of Staging Schemas 106

Chapter 11 Model Data with dbt 111

Version Control 111

Modularity and Reusability 112

Package Management 112

Organizing Files 113

Macros 113

Incremental Tables 114

Testing 115

Chapter 12 Deploy Modeling Code 119

Branch Using Version Control Software 119

Commit Message 120

Test Locally 120

Code Review 121

Schedule Runs 122

Chapter 13 Implementing the Data Warehouse 123

Manage Dependencies 124

Combine Tables Within Schemas 126

Combine Tables Across Schemas 128

Keep the Grain Consistent 130

Create Business Metrics 131

Keeping Accurate History 133

Chapter 14 Managing Data Access 135

How to Secure Sensitive Data in the Data Warehouse 137

How to Secure Sensitive Data in a BI Tool 140

Chapter 15 Maintaining the Source of Truth 143

Track New Metrics 144

Deprecate Old Metrics 147

Deprecate Old Schemas 149

Resolve Conflicting Numbers 150

Handling Ongoing Requests and Ongoing Feedback 151

Updating Modeling Code 152

Manage Access 153

Tuning to Optimize 156

Code Review All Modeling 157

Maintenance Checklist 158

Stage 4 Data Marts (aka Data Democratized) 161

Chapter 16 Data Mart Implementation 167

Views on the Data Warehouse 167

Segment Tables 168

Access Update 169

Chapter 17 Data Mart Maintenance 171

Educate Team 172

Identify Issues 172

Identify New Needs 176

Help Track Success 176

Chapter 18 Modern versus Traditional Data Stacks: What’s Changed? 177

What’s Changed? 177

Chapter 19 Row- versus Column-Oriented Database 181

Row-Oriented Databases 182

Column-Oriented Databases 184

Summary 190

Chapter 20 Style Guide Example 191

Simplify 192

Clean 194

Naming Conventions 195

Share It 197

Chapter 21 Building an SST Example 199

First Attempt - Same Tables with Prefixes 199

Second Attempt - Operational Schema (Source Agnostic) 205

Third Attempt - Application Separate, Other Sources Smashed 207

Less Planning, More Implementing 209

Acknowledgments and Contributions 211

Index 213

Authors

Dave Fowler, Matthew C. David
