
Applied Modeling Techniques and Data Analysis 1. Computational Data Analysis Methods and Tools. Edition No. 1

  • Book

  • 304 Pages
  • May 2021
  • John Wiley and Sons Ltd
  • ID: 5839786
BIG DATA, ARTIFICIAL INTELLIGENCE AND DATA ANALYSIS SET Coordinated by Jacques Janssen

Data analysis is a scientific field that has grown enormously over the last few decades, driven by rapid growth in the tech industry, the wide applicability of computational techniques and new advances in analytic tools. Modeling enables data analysts to identify relationships, make predictions, and understand, interpret and visualize the extracted information more strategically.

This book includes the most recent advances on this topic, meeting increasing demand from wide circles of the scientific community. Applied Modeling Techniques and Data Analysis 1 is a collective work by a number of leading scientists, analysts, engineers, mathematicians and statisticians working at the forefront of data analysis and modeling applications. The chapters cover a cross section of current concerns and research interests in these scientific areas. The collected material is divided into sections that provide the reader with both theoretical and applied information on data analysis methods, models and techniques, along with appropriate applications.

Table of Contents

Preface xi
Yiannis DIMOTIKALIS, Alex KARAGRIGORIOU, Christina PARPOULA and Christos H. SKIADAS

Part 1. Computational Data Analysis 1

Chapter 1. A Variant of Updating PageRank in Evolving Tree Graphs 3
Benard ABOLA, Pitos Seleka BIGANDA, Christopher ENGSTRÖM, John Magero MANGO, Godwin KAKUBA and Sergei SILVESTROV

1.1. Introduction 3

1.2. Notations and definitions 5

1.3. Updating the transition matrix 5

1.4. Updating the PageRank of a tree graph 10

1.4.1. Updating the PageRank of tree graph when a batch of edges changes 12

1.4.2. An example of updating the PageRank of a tree 15

1.5. Maintaining the levels of vertices in a changing tree graph 17

1.6. Conclusion 21

1.7. Acknowledgments 21

1.8. References 21

Chapter 2. Nonlinearly Perturbed Markov Chains and Information Networks 23
Benard ABOLA, Pitos Seleka BIGANDA, Sergei SILVESTROV, Dmitrii SILVESTROV, Christopher ENGSTRÖM, John Magero MANGO and Godwin KAKUBA

2.1. Introduction 23

2.2. Stationary distributions for Markov chains with damping component 26

2.2.1. Stationary distributions for Markov chains with damping component 26

2.2.2. The stationary distribution of the Markov chain X_{0,n} 28

2.3. A perturbation analysis for stationary distributions of Markov chains with damping component 29

2.3.1. Continuity property for stationary probabilities 29

2.3.2. Rate of convergence for stationary distributions 29

2.3.3. Asymptotic expansions for stationary distributions 30

2.3.4. Results of numerical experiments 32

2.4. Coupling and ergodic theorems for perturbed Markov chains with damping component 39

2.4.1. Coupling for regularly perturbed Markov chains with damping component 39

2.4.2. Coupling for singularly perturbed Markov chains with damping component 41

2.4.3. Ergodic theorems for perturbed Markov chains with damping component in the triangular array mode 42

2.4.4. Numerical examples 43

2.5. Acknowledgments 51

2.6. References 51

Chapter 3. PageRank and Perturbed Markov Chains 57
Pitos Seleka BIGANDA, Benard ABOLA, Christopher ENGSTRÖM, Sergei SILVESTROV, Godwin KAKUBA and John Magero MANGO

3.1. Introduction 57

3.2. PageRank of the first-order perturbed Markov chain 59

3.3. PageRank of the second-order perturbed Markov chain 60

3.4. Rates of convergence of PageRanks of first- and second-order perturbed Markov chains 70

3.5. Conclusion 72

3.6. Acknowledgments 72

3.7. References 72

Chapter 4. Doubly Robust Data-driven Distributionally Robust Optimization 75
Jose BLANCHET, Yang KANG, Fan ZHANG, Fei HE and Zhangyi HU

4.1. Introduction 75

4.2. DD-DRO, optimal transport and supervised machine learning 79

4.2.1. Optimal transport distances and discrepancies 80

4.3. Data-driven selection of optimal transport cost function 81

4.3.1. Data-driven cost functions via metric learning procedures 81

4.4. Robust optimization for metric learning 83

4.4.1. Robust optimization for relative metric learning 83

4.4.2. Robust optimization for absolute metric learning 86

4.5. Numerical experiments 88

4.6. Discussion and conclusion 89

4.7. References 89

Chapter 5. A Comparison of Graph Centrality Measures Based on Lazy Random Walks 91
Collins ANGUZU, Christopher ENGSTRÖM and Sergei SILVESTROV

5.1. Introduction 91

5.1.1. Notations and abbreviations 93

5.1.2. Linear systems and the Neumann series 94

5.2. Review on some centrality measures 95

5.2.1. Degree centrality 95

5.2.2. Katz status and β-centralities 95

5.2.3. Eigenvector and cumulative nomination centralities 96

5.2.4. Alpha centrality 97

5.2.5. PageRank centrality 98

5.2.6. Summary of the centrality measures as steady state, shifted and power series 99

5.3. Generalizations of centrality measures 99

5.3.1. Priors to centrality measures 99

5.3.2. Lazy variants of centrality measures 100

5.3.3. Lazy α-centrality 100

5.3.4. Lazy Katz centrality 102

5.3.5. Lazy cumulative nomination centrality 103

5.4. Experimental results 104

5.5. Discussion 106

5.6. Conclusion 109

5.7. Acknowledgments 109

5.8. References 110

Chapter 6. Error Detection in Sequential Laser Sensor Input 113
Gwenael GATTO and Olympia HADJILIADIS

6.1. Introduction 113

6.2. Data description 114

6.3. Algorithms 116

6.3.1. Algorithm for consecutive changes in mean 118

6.3.2. Algorithm for burst detection 120

6.4. Results 125

6.5. Acknowledgments 127

6.6. References 127

Chapter 7. Diagnostics and Visualization of Point Process Models for Event Times on a Social Network 129
Jing WU, Anna L. SMITH and Tian ZHENG

7.1. Introduction 129

7.2. Background 131

7.2.1. Univariate point processes 131

7.2.2. Network point processes 132

7.3. Model checking for time heterogeneity 134

7.3.1. Time rescaling theorem 134

7.3.2. Residual process 136

7.4. Model checking for network heterogeneity and structure 138

7.4.1. Kolmogorov-Smirnov test 138

7.4.2. Structure score based on the Pearson residual matrix 141

7.5. Summary 143

7.6. Acknowledgments 144

7.7. References 144

Part 2. Data Analysis Methods and Tools 147

Chapter 8. Exploring the Distribution of Conditional Quantile Estimates: An Application to Specific Costs of Pig Production in the European Union 149
Dominique DESBOIS

8.1. Introduction 150

8.2. Conceptual framework and methodological aspects 150

8.2.1. The empirical model for estimating the specific production costs 151

8.2.2. The procedures for estimating and testing conditional quantiles 152

8.2.3. Symbolic PCA of the specific cost distributions 154

8.2.4. Symbolic clustering analysis of the specific cost distributions 162

8.3. Results 165

8.3.1. The SO-PCA of specific cost estimates 167

8.3.2. The divisive hierarchy of specific cost estimates 170

8.4. Conclusion 171

8.5. References 172

Chapter 9. Maximization Problem Subject to Constraint of Availability in Semi-Markov Model of Operation 175
Franciszek GRABSKI

9.1. Introduction 175

9.2. Semi-Markov decision process 176

9.3. Semi-Markov decision model of operation 177

9.3.1. Description and assumptions 177

9.3.2. Model construction 177

9.4. Optimization problem 178

9.4.1. Linear programming method 179

9.5. Numerical example 182

9.6. Conclusion 184

9.7. References 185

Chapter 10. The Impact of Multicollinearity on Big Data Multivariate Analysis Modeling 187
Kimon NTOTSIS and Alex KARAGRIGORIOU

10.1. Introduction 187

10.2. Multicollinearity 188

10.3. Dimension reduction techniques 191

10.3.1. Beale et al. 192

10.3.2. Principal component analysis 192

10.4. Application 194

10.4.1. The modeling of PPE 194

10.4.2. Concluding remarks 200

10.5. Acknowledgments 200

10.6. References 200

Chapter 11. Weak Signals in High-Dimensional Poisson Regression Models 203
Orawan REANGSEPHET, Supranee LISAWADI and Syed Ejaz AHMED

11.1. Introduction 203

11.2. Statistical background 204

11.3. Methodologies 205

11.3.1. Predictor screening methods 205

11.3.2. Post-screening parameter estimation methods 206

11.4. Numerical studies 208

11.4.1. Simulation settings and performance criteria 208

11.4.2. Results 209

11.5. Conclusion 217

11.6. Acknowledgments 218

11.7. References 218

Chapter 12. Groundwater Level Forecasting for Water Resource Management 221
Andrea ZIRULIA, Alessio BARBAGLI and Enrico GUASTALDI

12.1. Introduction 221

12.2. Materials and methods 222

12.2.1. Study area 222

12.2.2. Forecast method 222

12.3. Results 224

12.4. Conclusion 230

12.5. References 230

Chapter 13. Phase I Non-parametric Control Charts for Individual Observations: A Selective Review and Some Results 233
Christina PARPOULA

13.1. Introduction 234

13.1.1. Background 234

13.1.2. Univariate non-parametric process monitoring 235

13.2. Problem formulation 237

13.3. A comparative study 239

13.3.1. The existing methodologies 239

13.3.2. Simulation settings 240

13.3.3. Simulation-study results 242

13.4. Concluding remarks 247

13.5. References 247

Chapter 14. On Divergence and Dissimilarity Measures for Multiple Time Series 249
Konstantinos MAKRIS, Alex KARAGRIGORIOU and Ilia VONTA

14.1. Introduction 249

14.2. Classical measures 250

14.3. Divergence measures 252

14.4. Dissimilarity measures for ordered data 254

14.4.1. Standard dissimilarity measures 254

14.4.2. Advanced dissimilarity measures 256

14.5. Conclusion 259

14.6. References 259

List of Authors 261

Index 265

Authors

Yiannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula, Christos H. Skiadas