An essential guide to two burgeoning topics in machine learning: classification trees and ensemble learning
Ensemble Classification Methods with Applications in R introduces the concepts and principles of ensemble classifier methods and includes a review of the most commonly used techniques. This important resource shows how ensemble classification has become an extension of the individual classifiers. The text emphasizes two areas of machine learning: classification trees and ensemble learning. The authors explore the basic characteristics of ensemble classification methods and explain the types of problems that can emerge in their application.
Written by a team of noted experts in the field, the text is divided into two main sections. The first section outlines the theoretical underpinnings of the topic, and the second section presents examples of practical applications. The book contains a wealth of illustrative cases drawn from business failure prediction, zoology, ecology, and other fields. This vital guide:
- Offers an important text that has been tested both in the classroom and at tutorials at conferences
- Contains authoritative information written by leading experts in the field
- Presents a comprehensive text that can be applied to courses in machine learning, data mining and artificial intelligence
- Combines in one volume two of the most intriguing topics in machine learning: ensemble learning and classification trees
Written for researchers from many fields, such as biostatistics, economics, environmental science, and zoology, as well as students of data mining and machine learning, Ensemble Classification Methods with Applications in R puts the focus on two topics in machine learning: classification trees and ensemble learning.
Table of Contents
List of Contributors
List of Tables
List of Figures
Preface
1 Introduction
Esteban Alfaro, Matías Gámez, and Noelia García
1.1 Introduction
1.2 Definition
1.3 Taxonomy of Supervised Classification Methods
1.4 Estimation of the Accuracy of a Classification System
1.4.1 The Apparent Error Rate
1.4.2 Estimation of the True Error Rate
1.4.3 Error Rate Estimation Methods
1.4.4 The Standard Error
1.5 Classification Trees
1.5.1 Classification Tree Building
1.5.2 Splitting Rule
1.5.3 Splitting Criteria
1.5.4 Goodness of a Split
1.5.5 The Impurity of a Tree
1.5.6 Stopping Criteria
1.5.7 Overfitting in Classification Trees
1.5.8 Pruning Rules
2 Limitation of the Individual Classifiers
Esteban Alfaro, Matías Gámez, and Noelia García
2.1 Introduction
2.2 Error Decomposition: Bias and Variance
2.3 Study of Classifier Instability
2.4 Advantages of Ensemble Classifiers
2.5 Bayesian Perspective of Ensemble Classifiers
3 Ensemble Classifiers Methods
Esteban Alfaro, Matías Gámez, and Noelia García
3.1 Introduction
3.2 Taxonomy of Ensemble Methods
3.2.1 Non-Generative Methods
3.2.2 Generative Methods
3.3 Bagging
3.4 Boosting
3.4.1 AdaBoost Training Error
3.4.2 AdaBoost and the Margin Theory
3.4.3 Other Boosting Versions
3.4.4 Comparing Bagging and Boosting
3.5 Random Forests
4 Classification with Individual and Ensemble Trees in R
Esteban Alfaro, Matías Gámez, and Noelia García
4.1 Introduction
4.2 adabag: An R Package for Classification with Boosting and Bagging
4.2.1 The bagging, predict.bagging, and bagging.cv Functions
4.2.2 The boosting, predict.boosting, and boosting.cv Functions
4.2.3 The margins, plot.margins, errorevol and plot.errorevol Functions
4.2.4 The MarginOrderedPruning.Bagging Function
4.3 The “German Credit” Example
4.3.1 Classification Tree
4.3.2 Combination using Bagging
4.3.3 Combination using Boosting
4.3.4 Combination using Random Forest
4.3.5 Cross-Validation Comparison
5 Bankruptcy Prediction Through Ensemble Trees
Esteban Alfaro, Matías Gámez, and Noelia García
5.1 Introduction
5.2 Problem Description
5.3 Applications
5.3.1 The Dichotomous Case
5.3.2 The Three-Class Case
5.4 Conclusions
6 Experiments with Adabag in Biology Classification Tasks
M. Fernández-Delgado, E. Cernadas, and M. Pérez-Ortiz
6.1 Classification of Color Texture Feature Patterns Extracted From Cells in Histological Images of Fish Ovary
6.2 Direct Kernel Perceptron: Ultra-Fast Kernel ELM-Based Classification with Non-Iterative Closed-Form Weight Calculation
6.3 Do We Need Hundreds of Classifiers to Solve Real-World Classification Problems?
6.4 On the use of Nominal and Ordinal Classifiers for the Discrimination of Stages of Development in Fish Oocytes
7 Generalization Bounds for Ranking Algorithms
W. Rejchel
7.1 Introduction
7.2 Assumptions, Main Theorem, and Application
7.3 Experiments
7.4 Conclusions
8 Classification and Regression Trees for Analyzing Irrigation Decisions
S. Andriyas and M. McKee
8.1 Introduction
8.2 Theory
8.3 Case Study and Methods
8.3.1 Study Site and Data Available
8.3.2 Model, Specifications, and Performance Evaluation
8.4 Results and Discussion
8.5 Conclusions
9 Boosted Rule Learner and its Properties
M. Kubus
9.1 Introduction
9.2 Separate-and-Conquer
9.3 Boosting in Rule Induction
9.4 Experiments
9.5 Conclusions
10 Credit Scoring with Individuals and Ensemble Trees
M. Chrzanowska, E. Alfaro, and D. Witkowska
10.1 Introduction
10.2 Measures of Accuracy
10.3 Data Description
10.4 Classification of Borrowers Applying Ensemble Trees
10.5 Conclusions
11 An Overview of Multiple Classifier Systems Based on Generalized Additive Models
K.W. De Bock, K. Coussement, and D. Cielen
11.1 Introduction
11.2 Multiple Classifier Systems Based on GAMs
11.2.1 Generalized Additive Models
11.2.2 GAM-Based Multiple Classifier Systems
11.2.3 GAMensPlus: Extending GAMens for Advanced Interpretability
11.3 Experiments and Applications
11.3.1 A Multi-Domain Benchmark Study of GAM-Based Ensemble Classifiers
11.3.2 Benchmarking GAM-Based Ensemble Classifiers in Predictive Customer Analytics
11.3.3 A Case Study of GAMensPlus used for Customer Churn Prediction in Financial Services
11.4 Software Implementation in R: the GAMens Package
11.5 Conclusions
References
Index