Author Bouveyron, Charles, 1979- author.

Title Model-based clustering and classification for data science : with applications in R / Charles Bouveyron, Gilles Celeux, T. Brendan Murphy, Adrian E. Raftery
Published Cambridge : Cambridge University Press, 2019

Description 1 online resource (xvii, 427 pages)
Series Cambridge series in statistical and probabilistic mathematics ; 50
Cambridge series on statistical and probabilistic mathematics ; 50.
Contents Cover; Half-title; Series information; Title page; Copyright information; Dedication; Contents; Expanded Contents; Preface; 1 Introduction; 1.1 Cluster Analysis; 1.1.1 From Grouping to Clustering; 1.1.2 Model-based Clustering; 1.2 Classification; 1.2.1 From Taxonomy to Machine Learning; 1.2.2 Model-based Discriminant Analysis; 1.3 Examples; 1.4 Software; 1.5 Organization of the Book; 1.6 Bibliographic Notes; 2 Model-based Clustering: Basic Ideas; 2.1 Finite Mixture Models; 2.2 Geometrically Constrained Multivariate Normal Mixture Models; 2.3 Estimation by Maximum Likelihood; 2.4 Initializing the EM Algorithm; 2.4.1 Initialization by Hierarchical Model-based Clustering; 2.4.2 Initialization Using the smallEM Strategy; 2.5 Examples with Known Number of Clusters; 2.6 Choosing the Number of Clusters and the Clustering Model; 2.7 Illustrative Analyses; 2.7.1 Wine Varieties; 2.7.2 Craniometric Analysis; 2.8 Who Invented Model-based Clustering?; 2.9 Bibliographic Notes; 3 Dealing with Difficulties; 3.1 Outliers; 3.1.1 Outliers in Model-based Clustering; 3.1.2 Mixture Modeling with a Uniform Component for Outliers; 3.1.3 Trimming Data with tclust; 3.2 Dealing with Degeneracies: Bayesian Regularization; 3.3 Non-Gaussian Mixture Components and Merging; 3.4 Bibliographic Notes; 4 Model-based Classification; 4.1 Classification in the Probabilistic Framework; 4.1.1 Generative or Predictive Approach; 4.1.2 An Introductory Example; 4.2 Parameter Estimation; 4.3 Parsimonious Classification Models; 4.3.1 Gaussian Classification with EDDA; 4.3.2 Regularized Discriminant Analysis; 4.4 Multinomial Classification; 4.4.1 The Conditional Independence Model; 4.4.2 An Illustration; 4.5 Variable Selection; 4.6 Mixture Discriminant Analysis; 4.7 Model Assessment and Selection; 4.7.1 The Cross-validated Error Rate; 4.7.2 Model Selection and Assessing the Error Rate; 4.7.3 Penalized Log-likelihood Criteria; 5 Semi-supervised Clustering and Classification; 5.1 Semi-supervised Classification; 5.1.1 Estimating the Model Parameters through the EM Algorithm; 5.1.2 A First Experimental Comparison; 5.1.3 Model Selection Criteria for Semi-supervised Classification; 5.2 Semi-supervised Clustering; 5.2.1 Incorporating Must-link Constraints; 5.2.2 Incorporating Cannot-link Constraints; 5.3 Supervised Classification with Uncertain Labels; 5.3.1 The Label Noise Problem; 5.3.2 A Model-based Approach for the Binary Case; 5.3.3 A Model-based Approach for the Multi-class Case; 5.4 Novelty Detection: Supervised Classification with Unobserved Classes; 5.4.1 A Transductive Model-based Approach; 5.4.2 An Inductive Model-based Approach; 5.5 Bibliographic Notes; 6 Discrete Data Clustering; 6.1 Example Data; 6.2 The Latent Class Model for Categorical Data; 6.2.1 Maximum Likelihood Estimation; 6.2.2 Parsimonious Latent Class Models; 6.2.3 The Latent Class Model as a Cluster Analysis Tool; 6.2.4 Model Selection
Summary Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.
Bibliography Includes bibliographical references (pages 386-414) and index
Notes Vendor-supplied metadata
Subject Cluster analysis.
Mathematical statistics.
Statistics -- Classification
R (Computer program language)
Statistics
Genre/Form Classification
Form Electronic book
Author Celeux, Gilles, author.
Murphy, T. Brendan, 1972- author.
Raftery, Adrian E., author.
ISBN 9781108644181
110864418X