The book elaborates in detail on the current needs of data mining and machine learning and promotes mutual understanding among researchers in different disciplines, thereby facilitating research development and collaboration.
Data, the latest currency of today’s world, is the new gold. In this new form of gold, the most beautiful jewels are data analytics and machine learning. Data mining and machine learning are considered interdisciplinary fields. Data mining is a subset of data analytics, and machine learning involves the use of algorithms that automatically improve through experience based on data.
Massive datasets can be classified and clustered to obtain accurate results, and classification and clustering methods are the most commonly used techniques. Algorithms such as support vector machines and neural networks trained with forward and backward propagation are applied to regression, classification, and clustering tasks, and their accuracy and error rates are calculated to assess the results. Applications include fraud detection, image processing, medical diagnosis, weather prediction, e-commerce, and so forth.
The book features:
- A review of the state of the art in data mining and machine learning
- A review and description of the learning methods in human-computer interaction
- Implementation strategies and future research directions for meeting the long-term design and application requirements of several modern and real-time applications
- The scope and implementation of a majority of data mining and machine learning strategies
- A discussion of real-time problems
Audience
Industry and academic researchers, scientists, and engineers in information technology, data science and machine and deep learning, as well as artificial intelligence more broadly.
Table of Contents
Preface xvii
1 Introduction to Data Mining 1
Santosh R. Durugkar, Rohit Raja, Kapil Kumar Nagwanshi and Sandeep Kumar
1.1 Introduction 1
1.1.1 Data Mining 1
1.2 Knowledge Discovery in Database (KDD) 2
1.2.1 Importance of Data Mining 3
1.2.2 Applications of Data Mining 3
1.2.3 Databases 4
1.3 Issues in Data Mining 6
1.4 Data Mining Algorithms 7
1.5 Data Warehouse 9
1.6 Data Mining Techniques 10
1.7 Data Mining Tools 11
1.7.1 Python for Data Mining 12
1.7.2 KNIME 13
1.7.3 RapidMiner 17
References 18
2 Classification and Mining Behavior of Data 21
Srinivas Konda, Kavitarani Balmuri and Kishore Kumar Mamidala
2.1 Introduction 22
2.2 Main Characteristics of Mining Behavioral Data 23
2.2.1 Mining Dynamic/Streaming Data 23
2.2.2 Mining Graph & Network Data 24
2.2.3 Mining Heterogeneous/Multi-Source Information 25
2.2.3.1 Multi-Source and Multidimensional Information 26
2.2.3.2 Multi-Relational Data 26
2.2.3.3 Background and Connected Data 27
2.2.3.4 Complex Data, Sequences, and Events 27
2.2.3.5 Data Protection and Morals 27
2.2.4 Mining High Dimensional Data 28
2.2.5 Mining Imbalanced Data 29
2.2.5.1 The Class Imbalance Issue 29
2.2.6 Mining Multimedia Data 30
2.2.6.1 Common Applications of Multimedia Data Mining 31
2.2.6.2 Multimedia Data Mining Utilizations 31
2.2.6.3 Multimedia Database Management 32
2.2.7 Mining Scientific Data 34
2.2.8 Mining Sequential Data 35
2.2.9 Mining Social Networks 36
2.2.9.1 Social-Media Data Mining Reasons 39
2.2.10 Mining Spatial and Temporal Data 40
2.2.10.1 Utilizations of Spatial and Temporal Data Mining 41
2.3 Research Method 44
2.4 Results 48
2.5 Discussion 49
2.6 Conclusion 50
References 51
3 A Comparative Overview of Hybrid Recommender Systems: Review, Challenges, and Prospects 57
Rakhi Seth and Aakanksha Sharaff
3.1 Introduction 58
3.2 Related Work on Different Recommender Systems 60
3.2.1 Challenges in RS 65
3.2.2 Research Questions and Architecture of This Paper 66
3.2.3 Background 68
3.2.3.1 The Architecture of Hybrid Approach 69
3.2.4 Analysis 78
3.2.4.1 Evaluation Measures 78
3.2.5 Materials and Methods 81
3.2.6 Comparative Analysis With Traditional Recommender System 85
3.2.7 Practical Implications 85
3.2.8 Conclusion & Future Work 94
References 94
4 Stream Mining: Introduction, Tools & Techniques and Applications 99
Naresh Kumar Nagwani
4.1 Introduction 100
4.2 Data Reduction: Sampling and Sketching 101
4.2.1 Sampling 101
4.2.2 Sketching 102
4.3 Concept Drift 103
4.4 Stream Mining Operations 105
4.4.1 Clustering 105
4.4.2 Classification 106
4.4.3 Outlier Detection 107
4.4.4 Frequent Itemsets Mining 108
4.5 Tools & Techniques 109
4.5.1 Implementation in Java 110
4.5.2 Implementation in Python 116
4.5.3 Implementation in R 118
4.6 Applications 120
4.6.1 Stock Prediction in Share Market 120
4.6.2 Weather Forecasting System 121
4.6.3 Finding Trending News and Events 121
4.6.4 Analyzing User Behavior in Electronic Commerce Site (Click Stream) 121
4.6.5 Pollution Control Systems 122
4.7 Conclusion 122
References 122
5 Data Mining Tools and Techniques: Clustering Analysis 125
Rohit Miri, Amit Kumar Dewangan, S.R. Tandan, Priya Bhatnagar and Hiral Raja
5.1 Introduction 126
5.2 Data Mining Task 129
5.2.1 Data Summarization 129
5.2.2 Data Clustering 129
5.2.3 Classification of Data 129
5.2.4 Data Regression 130
5.2.5 Data Association 130
5.3 Data Mining Algorithms and Methodologies 131
5.3.1 Data Classification Algorithm 131
5.3.2 Prediction 132
5.3.3 Association Rule 132
5.3.4 Neural Network 132
5.3.4.1 Data Clustering Algorithm 133
5.3.5 In-Depth Study of Gathering Techniques 134
5.3.6 Data Partitioning Method 134
5.3.7 Hierarchical Method 134
5.3.8 Framework-Based Method 136
5.3.9 Model-Based Method 136
5.3.10 Thickness-Based Method 136
5.4 Clustering the Nearest Neighbor 136
5.4.1 Fuzzy Clustering 137
5.4.2 K-Means Algorithm 137
5.5 Data Mining Applications 138
5.6 Materials and Strategies for Document Clustering 140
5.6.1 Features Generation 142
5.7 Discussion and Results 143
5.7.1 Discussion 146
5.7.2 Conclusion 149
References 149
6 Data Mining Implementation Process 151
Kamal K. Mehta, Rajesh Tiwari and Nishant Behar
6.1 Introduction 151
6.2 Data Mining Historical Trends 152
6.3 Processes of Data Analysis 153
6.3.1 Data Attack 153
6.3.2 Data Mixing 153
6.3.3 Data Collection 153
6.3.4 Data Conversion 154
6.3.4.1 Data Mining 154
6.3.4.2 Design Evaluation 154
6.3.4.3 Data Illustration 154
6.3.4.4 Implementation of Data Mining in the Cross-Industry Standard Process 154
6.3.5 Business Understanding 155
6.3.6 Data Understanding 156
6.3.7 Data Preparation 158
6.3.8 Modeling 159
6.3.9 Evaluation 160
6.3.10 Deployment 161
6.3.11 Contemporary Developments 162
6.3.12 An Assortment of Data Mining 162
6.3.12.1 Using Computational & Connectivity Tools 163
6.3.12.2 Web Mining 163
6.3.12.3 Comparative Statement 163
6.3.13 Advantages of Data Mining 163
6.3.14 Drawbacks of Data Mining 165
6.3.15 Data Mining Applications 165
6.3.16 Methodology 167
6.3.17 Results 169
6.3.18 Conclusion and Future Scope 171
References 172
7 Predictive Analytics in IT Service Management (ITSM) 175
Sharon Christa I.L. and Suma V.
7.1 Introduction 176
7.2 Analytics: An Overview 178
7.2.1 Predictive Analytics 180
7.3 Significance of Predictive Analytics in ITSM 181
7.4 Ticket Analytics: A Case Study 186
7.4.1 Input Parameters 188
7.4.2 Predictive Modeling 188
7.4.3 Random Forest Model 189
7.4.4 Performance of the Predictive Model 191
7.5 Conclusion 191
References 192
8 Modified Cross-Sell Model for Telecom Service Providers Using Data Mining Techniques 195
K. Ramya Laxmi, Sumit Srivastava, K. Madhuravani, S. Pallavi and Omprakash Dewangan
8.1 Introduction 196
8.2 Literature Review 198
8.3 Methodology and Implementation 200
8.3.1 Selection of the Independent Variables 200
8.4 Data Partitioning 203
8.4.1 Interpreting the Results of Logistic Regression Model 203
8.5 Conclusions 204
References 205
9 Inductive Learning Including Decision Tree and Rule Induction Learning 209
Raj Kumar Patra, A. Mahendar and G. Madhukar
9.1 Introduction 210
9.2 The Inductive Learning Algorithm (ILA) 212
9.3 Proposed Algorithms 213
9.4 Divide & Conquer Algorithm 214
9.4.1 Decision Tree 214
9.5 Decision Tree Algorithms 215
9.5.1 ID3 Algorithm 215
9.5.2 Separate and Conquer Algorithm 217
9.5.3 RULE EXTRACTOR-1 226
9.5.4 Inductive Learning Applications 226
9.5.4.1 Education 226
9.5.4.2 Making Credit Decisions 227
9.5.5 Multidimensional Databases and OLAP 228
9.5.6 Fuzzy Choice Trees 228
9.5.7 Fuzzy Choice Tree Development From a Multidimensional Database 229
9.5.8 Execution and Results 230
9.6 Conclusion and Future Work 231
References 232
10 Data Mining for Cyber-Physical Systems 235
M. Varaprasad Rao, D. Anji Reddy, Anusha Ampavathi and Shaik Munawar
10.1 Introduction 236
10.1.1 Models of Cyber-Physical System 238
10.1.2 Statistical Model-Based Methodologies 239
10.1.3 Spatial-and-Transient Closeness-Based Methodologies 240
10.2 Feature Recovering Methodologies 240
10.3 CPS vs. IT Systems 241
10.4 Collections, Sources, and Generations of Big Data for CPS 242
10.4.1 Establishing Conscious Computation and Information Systems 243
10.5 Spatial Prediction 243
10.5.1 Global Optimization 244
10.5.2 Big Data Analysis CPS 245
10.5.3 Analysis of Cloud Data 245
10.5.4 Analysis of Multi-Cloud Data 247
10.6 Clustering of Big Data 248
10.7 NoSQL 251
10.8 Cyber Security and Privacy Big Data 251
10.8.1 Protection of Big Computing and Storage 252
10.8.2 Big Data Analytics Protection 252
10.8.3 Big Data CPS Applications 256
10.9 Smart Grids 256
10.10 Military Applications 258
10.11 City Management 259
10.12 Clinical Applications 261
10.13 Calamity Events 262
10.14 Data Streams Clustering by Sensors 263
10.15 The Flocking Model 263
10.16 Calculation Depiction 264
10.17 Initialization 265
10.18 Representative Maintenance and Clustering 266
10.19 Results 267
10.20 Conclusion 268
References 269
11 Developing Decision Making and Risk Mitigation: Using CRISP-Data Mining 281
Vivek Parganiha, Soorya Prakash Shukla and Lokesh Kumar Sharma
11.1 Introduction 282
11.2 Background 283
11.3 Methodology of CRISP-DM 284
11.4 Stage One - Determine Business Objectives 286
11.4.1 What Are the Ideal Yields of the Venture? 287
11.4.2 Evaluate the Current Circumstance 288
11.4.3 Realizes Data Mining Goals 289
11.5 Stage Two - Data Sympathetic 290
11.5.1 Portray Data 291
11.5.2 Investigate Facts 291
11.5.3 Confirm Data Quality 292
11.5.4 Data Excellence Description 292
11.6 Stage Three - Data Preparation 292
11.6.1 Select Your Data 294
11.6.2 The Data Is Processed 294
11.6.3 Data Needed to Build 294
11.6.4 Combine Information 295
11.7 Stage Four - Modeling 295
11.7.1 Select Displaying Strategy 296
11.7.2 Produce an Investigation Plan 297
11.7.3 Fabricate Ideal 297
11.7.4 Evaluation Model 297
11.8 Stage Five - Evaluation 298
11.8.1 Assess Your Outcomes 299
11.8.2 Survey Measure 299
11.8.3 Decide on the Subsequent Stages 300
11.9 Stage Six - Deployment 300
11.9.1 Plan Arrangement 301
11.9.2 Plan Observing and Support 301
11.9.3 Produce the Last Report 302
11.9.4 Audit Venture 302
11.10 Data on ERP Systems 302
11.11 Usage of CRISP-DM Methodology 304
11.12 Modeling 306
11.12.1 Association Rule Mining (ARM) or Association Analysis 307
11.12.2 Classification Algorithms 307
11.12.3 Regression Algorithms 308
11.12.4 Clustering Algorithms 308
11.13 Assessment 310
11.14 Distribution 310
11.15 Results and Discussion 310
11.16 Conclusion 311
References 314
12 Human-Machine Interaction and Visual Data Mining 317
Upasana Sinha, Akanksha Gupta, Samera Khan, Shilpa Rani and Swati Jain
12.1 Introduction 318
12.2 Related Researches 320
12.2.1 Data Mining 323
12.2.2 Data Visualization 323
12.2.3 Visual Learning 324
12.3 Visual Genes 325
12.4 Visual Hypotheses 326
12.5 Visual Strength and Conditioning 326
12.6 Visual Optimization 327
12.7 The Vis 09 Model 327
12.8 Graphic Monitoring and Contact With Human-Computer 328
12.9 Mining HCI Information Using Inductive Deduction Viewpoint 332
12.10 Visual Data Mining Methodology 334
12.11 Machine Learning Algorithms for Hand Gesture Recognition 338
12.12 Learning 338
12.13 Detection 339
12.14 Recognition 340
12.15 Proposed Methodology for Hand Gesture Recognition 340
12.16 Result 343
12.17 Conclusion 343
References 344
13 MSDTrA: A Boosting-Based Transfer Learning Approach for Class Imbalanced Skin Lesion Dataset for Melanoma Detection 349
Lokesh Singh, Rekh Ram Janghel and Satya Prakash Sahu
13.1 Introduction 349
13.2 Literature Survey 352
13.3 Methods and Material 353
13.3.1 Proposed Methodology: Multi Source Dynamic TrAdaBoost Algorithm 355
13.4 Experimental Results 357
13.5 Libraries Used 357
13.6 Comparing Algorithms Based on Decision Boundaries 357
13.7 Evaluating Results 358
13.8 Conclusion 361
References 361
14 New Algorithms and Technologies for Data Mining 365
Padma Bonde, Latika Pinjarkar, Korhan Cengiz, Aditi Shukla and Maguluri Sudeep Joel
14.1 Introduction 366
14.2 Machine Learning Algorithms 368
14.3 Supervised Learning 368
14.4 Unsupervised Learning 369
14.5 Semi-Supervised Learning 369
14.6 Regression Algorithms 371
14.7 Case-Based Algorithms 371
14.8 Regularization Algorithms 372
14.9 Decision Tree Algorithms 372
14.10 Bayesian Algorithms 373
14.11 Clustering Algorithms 374
14.12 Association Rule Learning Algorithms 375
14.13 Artificial Neural Network Algorithms 375
14.14 Deep Learning Algorithms 376
14.15 Dimensionality Reduction Algorithms 377
14.16 Ensemble Algorithms 377
14.17 Other Machine Learning Algorithms 378
14.18 Data Mining Assignments 378
14.19 Data Mining Models 381
14.20 Non-Parametric & Parametric Models 381
14.21 Flexible vs. Restrictive Methods 382
14.22 Unsupervised vs. Supervised Learning 382
14.23 Data Mining Methods 384
14.24 Proposed Algorithm 387
14.24.1 Organization Formation Procedure 387
14.25 The Regret of Learning Phase 388
14.26 Conclusion 392
References 392
15 Classification of EEG Signals for Detection of Epileptic Seizure Using Restricted Boltzmann Machine Classifier 397
Sudesh Kumar, Rekh Ram Janghel and Satya Prakash Sahu
15.1 Introduction 398
15.2 Related Work 400
15.3 Material and Methods 401
15.3.1 Dataset Description 401
15.3.2 Proposed Methodology 403
15.3.3 Normalization 404
15.3.4 Preprocessing Using PCA 404
15.3.5 Restricted Boltzmann Machine (RBM) 406
15.3.6 Stochastic Binary Units (Bernoulli Variables) 407
15.3.7 Training 408
15.3.7.1 Gibbs Sampling 409
15.3.7.2 Contrastive Divergence (CD) 409
15.4 Experimental Framework 410
15.5 Experimental Results and Discussion 412
15.5.1 Performance Measurement Criteria 412
15.5.2 Experimental Results 412
15.6 Discussion 414
15.7 Conclusion 418
References 419
16 An Enhanced Security of Women and Children Using Machine Learning and Data Mining Techniques 423
Nanda R. Wagh and Sanjay R. Sutar
16.1 Introduction 424
16.2 Related Work 424
16.2.1 WoSApp 424
16.2.2 Abhaya 425
16.2.3 Women Empowerment 425
16.2.4 Nirbhaya 425
16.2.5 Glympse 426
16.2.6 Fightback 426
16.2.7 Versatile-Based 426
16.2.8 RFID 426
16.2.9 Self-Preservation Framework for Women With Area Following and SMS Alarming Through GSM Network 426
16.2.10 Safe: A Women Security Framework 427
16.2.11 Intelligent Safety System For Women Security 427
16.2.12 A Mobile-Based Women Safety Application 427
16.2.13 Self-Salvation - The Women’s Security Module 427
16.3 Issue and Solution 427
16.3.1 Inspiration 427
16.3.2 Issue Statement and Choice of Solution 428
16.4 Selection of Data 428
16.5 Pre-Preparation Data 430
16.5.1 Simulation 431
16.5.2 Assessment 431
16.5.3 Forecast 434
16.6 Application Development 436
16.6.1 Methodology 436
16.6.2 AI Model 437
16.6.3 Innovations Used 437
16.7 Use Case For The Application 437
16.7.1 Application Icon 437
16.7.2 Enlistment Form 438
16.7.3 Login Form 439
16.7.4 Misconduct Place Detector 439
16.7.5 Help Button 440
16.8 Conclusion 443
References 443
17 Conclusion and Future Direction in Data Mining and Machine Learning 447
Santosh R. Durugkar, Rohit Raja, Kapil Kumar Nagwanshi and Ramakant Chandrakar
17.1 Introduction 448
17.2 Machine Learning 451
17.2.1 Neural Network 452
17.2.2 Deep Learning 452
17.2.3 Three Activities for Object Recognition 453
17.3 Conclusion 457
References 457
Index 461