E-book
Author Fernández, Alberto, author

Title Learning from imbalanced data sets / Alberto Fernández, Salvador García, Mikel Galar, Ronaldo C. Prati, Bartosz Krawczyk and Francisco Herrera
Published Cham, Switzerland : Springer, [2018]
Online access available from:
Springer eBooks

Description 1 online resource (xviii, 377 pages)
Contents Intro; Preface; Contents; Acronyms; 1 Introduction to KDD and Data Science; 1.1 Introduction; 1.2 A Definition of Data Science; 1.3 The Data Science Process; 1.3.1 Selection of the Data; 1.3.2 Data Preprocessing; 1.3.2.1 Why Is Preprocessing Required?; 1.3.3 Stages of the Data Preprocessing Phase; 1.3.3.1 Selection of Data; 1.3.3.2 Exploration of Data; 1.3.3.3 Transformation of Data; 1.4 Standard Data Science Problems; 1.4.1 Descriptive Problems; 1.4.2 Predictive Problems; 1.5 Classical Data Mining Techniques; 1.6 Non-standard Data Science Problems; 1.6.1 Derivative Problems
1.6.1.1 Imbalanced Learning; 1.6.1.2 Multi-instance Learning; 1.6.1.3 Multi-label Classification; 1.6.1.4 Data Stream Learning; 1.6.2 Hybrid Problems; 1.6.2.1 Semi-supervised Learning; 1.6.2.2 Subgroup Discovery; 1.6.2.3 Ordinal Classification/Regression; 1.6.2.4 Transfer Learning; References; 2 Foundations on Imbalanced Classification; 2.1 Formal Description; 2.2 Applications; 2.2.1 Engineering; 2.2.2 Information Technology; 2.2.3 Bioinformatics; 2.2.4 Medicine; 2.2.4.1 Quality Control; 2.2.4.2 Medical Diagnosis; 2.2.4.3 Medical Prognosis; 2.2.5 Business Management; 2.2.6 Security
2.2.7 Education; 2.3 Case Studies on Imbalanced Classification; References; 3 Performance Measures; 3.1 Introduction; 3.2 Nominal Class Predictions; 3.3 Scoring Predictions; 3.4 Probabilistic Predictions; 3.5 Summarizing Comments; References; 4 Cost-Sensitive Learning; 4.1 Introduction; 4.2 Obtaining the Cost Matrix; 4.3 MetaCost; 4.4 Cost-Sensitive Decision Trees; 4.4.1 Direct Approach with Cost-Sensitive Splitting; 4.4.2 Meta-learning Approach with Instance Weighting; 4.5 Other Cost-Sensitive Classifiers; 4.5.1 Support Vector Machines; 4.5.2 Artificial Neural Networks; 4.5.3 Nearest Neighbors
4.6 Hybrid Cost-Sensitive Approaches; 4.7 Summarizing Comments; References; 5 Data Level Preprocessing Methods; 5.1 Introduction; 5.2 Undersampling and Oversampling Basics; 5.3 Advanced Undersampling Techniques; 5.3.1 Evolutionary Undersampling; 5.3.1.1 ACOSampling; 5.3.1.2 IPADE-ID; 5.3.1.3 CBEUS: Cluster-Based Evolutionary Undersampling; 5.3.2 Undersampling by Cleaning Data; 5.3.2.1 Weighted Sampling; 5.3.2.2 IHT: Instance Hardness Threshold; 5.3.2.3 Hybrid Undersampling; 5.3.3 Ensemble Based Undersampling; 5.3.3.1 IRUS: Inverse Random Undersampling
5.3.3.2 OligoIS: Oligarchic Instance Selection; 5.3.4 Clustering Based Undersampling; 5.3.4.1 ClusterOSS; 5.3.4.2 DSUS: Diversified Sensitivity Undersampling; 5.4 Synthetic Minority Oversampling TEchnique (SMOTE); 5.5 Extensions of SMOTE; 5.5.1 Borderline-SMOTE; 5.5.2 Adjusting the Direction of the Synthetic Minority ClasS Examples: ADOMS; 5.5.3 ADASYN: Adaptive Synthetic Sampling Approach; Input; Procedure; 5.5.4 ROSE: Random Oversampling Examples; 5.5.5 Safe-Level-SMOTE; 5.5.6 DBSMOTE: Density-Based SMOTE; 5.5.7 MWMOTE: Majority Weighted Minority Oversampling TEchnique; Input; Procedure
Summary This book provides a general and comprehensible overview of imbalanced learning. It contains a formal description of the problem and focuses on its main features and the most relevant proposed solutions. Additionally, it considers the different scenarios in Data Science in which imbalanced classification can create a real challenge. This book stresses the gap with standard classification tasks by reviewing the case studies and ad-hoc performance metrics that are applied in this area. It also covers the different approaches that have traditionally been applied to address the binary skewed class distribution. Specifically, it reviews cost-sensitive learning, data-level preprocessing methods and algorithm-level solutions, also taking into account those ensemble-learning solutions that embed any of the former alternatives. Furthermore, it focuses on the extension of the problem to multi-class problems, where the former classical methods can no longer be applied in a straightforward way. This book also focuses on the intrinsic data characteristics that, added to the uneven class distribution, truly hinder the performance of classification algorithms in this scenario. Then, some notes on data reduction are provided in order to understand the advantages related to the use of this type of approach. Finally, this book introduces some novel areas of study that are attracting growing attention in relation to the imbalanced data issue. Specifically, it considers the classification of data streams, non-classical classification problems, and the scalability related to Big Data. Examples of software libraries and modules to address imbalanced classification are provided. This book is highly suitable for technical professionals and for senior undergraduate and graduate students in the areas of data science, computer science and engineering.
It will also be useful for scientists and researchers to gain insight into the current developments in this area of study, as well as future research directions.
Bibliography Includes bibliographical references
Notes Online resource; title from PDF title page (EBSCO, viewed October 31, 2018)
Subject Artificial intelligence -- Data processing.
Big data.
Machine learning.
Form Electronic book
Author Galar, Mikel, author
García, Salvador, author
Herrera, Francisco, author
Krawczyk, Bartosz, author
Prati, Ronaldo C., author
ISBN 3319980742 (electronic bk.)
3319980750
9783319980744 (electronic bk.)
9783319980751 (print)