The essential guide for data scientists and for leaders who must get more from their data science teams
The Economist boldly claims that data are now "the world's most valuable resource." But, as Kenett and Redman so richly describe, unlocking that value requires far more than technical excellence. The Real Work of Data Science explores understanding the problems, dealing with quality issues, building trust with decision makers, putting data science teams in the right organizational spots, and helping companies become data-driven. This is the work that spells the difference between a good data scientist and a great one, between a team that makes marginal contributions and one that drives the business, between a company that gains some value from its data and one in which data truly is "the most valuable resource."
"These two authors are world-class experts on analytics, data management, and data quality; they've forgotten more about these topics than most of us will ever know. Their book is pragmatic, understandable, and focused on what really counts. If you want to do data science in any capacity, you need to read it."
- Thomas H. Davenport, Distinguished Professor, Babson College and Fellow, MIT Initiative on the Digital Economy
"I like your book. The chapters address problems that have faced statisticians for generations, updated to reflect today's issues, such as computational Big Data."
- Sir David Cox, Warden of Nuffield College and Professor of Statistics, Oxford University
"Data science is critical for competitiveness, for good government, for correct decisions. But what is data science? Kenett and Redman give, by far, the best introduction to the subject I have seen anywhere. They address the critical questions of formulating the right problem, collecting the right data, doing the right analyses, making the right decisions, and measuring the actual impact of the decisions. This book should become required reading in statistics and computer science departments, business schools, analytics institutes and, most importantly, by all business managers."
- A. Blanton Godfrey, Joseph D. Moore Distinguished University Professor, Wilson College of Textiles, North Carolina State University
Table of Contents
About the Authors
Preface
About the Companion Website
1 A Higher Calling
The Life‐Cycle View
Problem Elicitation: Understand the Problem
Goal Formulation: Clarify the Short‐term and Long‐term Goals
Data Collection: Identify Relevant Data Sources and Collect the Data
Data Analysis: Use Descriptive, Explanatory, and Predictive Methods
Formulation of Findings: State Results and Recommendations
Operationalization of Findings: Suggest Who, What, When, and How
Communication of Findings: Communicate Findings, Decisions, and Their Implications to Stakeholders
Impact Assessment: Plan and Deploy an Assessment Strategy
The Organizational Ecosystem
Organizational Structure
Organizational Maturity
Once Again, Our Goal
2 The Difference Between a Good Data Scientist and a Great One
Implications
3 Learn the Business
The Annual Report
SWOTs and Strategic Analysis
The Balanced Scorecard and Key Performance Indicators
The Data Lens
Build Your Network
Implications
4 Understand the Real Problem
A Telling Example
Understanding the Real Problem
Implications
5 Get Out There
Understand Context and Soft Data
Identify Sources of Variability
Selective Attention
Memory Bias
Implications
6 Sorry, but You Can’t Trust the Data
Most Data Is Untrustworthy
Dealing with Immediate Issues
Getting in Front of Tomorrow’s Data Quality Issues
Implications
7 Make It Easy for People to Understand Your Insights
First, Get the Basics Right
Presentations Get Passed Around
The Best of the Best
Implications
8 When the Data Leaves Off and Your Intuition Takes Over
Modes of Generalization
Implications
9 Take Accountability for Results
Practical Statistical Efficiency
Using Data Science to Perform Impact Analysis
Implications
10 What It Means to Be “Data‐driven”
Data‐driven Companies and People
Traits of the Data‐driven
Traits of the Antis
Implications
11 Root Out Bias in Decision‐making
Understand Why It Occurs
Take Control on a Personal Level
Solid Scientific Footings
Problem 1
Problem 2
Implications
12 Teach, Teach, Teach
The Rope Exercise
The “Roll Your Own” Exercise
The Starter Kit of Questions to Ask Data Scientists
Implications
13 Evaluating Data Science Outputs More Formally
Assessing Information Quality
A Hands‐On Information Quality Workshop
Phase I: Individual Work
Phase II: Teamwork
Phase III: Group Presentation
Implications
14 Educating Senior Leaders
Covering the Waterfront
Companies Need a Data and Data Science Strategy
Organizations Are “Unfit for Data”
Get Started with Data Quality
Implications
15 Putting Data Science, and Data Scientists, in the Right Spots
The Need for Senior Leadership
Building a Network of Data Scientists
Implications
16 Moving Up the Analytics Maturity Ladder
Implications
17 The Industrial Revolutions and Data Science
The First Industrial Revolution: From Craft to Repetitive Activity
The Second Industrial Revolution: The Advent of the Factory
The Third Industrial Revolution: Enter the Computer
The Fourth Industrial Revolution: The Industry 4.0 Transformation
Implications
18 Epilogue
Strong Foundations
A Bridge to the Future
Appendix A: Skills of a Data Scientist
Appendix B: Data Defined
Appendix C: Questions to Help Evaluate the Outputs of Data Science
Appendix D: Ethical Considerations and Today’s Data Scientist
Appendix E: Recent Technical Advances in Data Science
References
A List of Useful Links
Index