Format Book
Author Mohri, Mehryar, author

Title Foundations of machine learning / Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar
Edition Second edition
Published Cambridge, Massachusetts : The MIT Press, [2018]
©2018

Copies

Location  Call no.         Vol.  Availability
MELB      006.31 Moh/Fom         DUE 03-05-24
Description xv, 486 pages ; 24 cm
Series Adaptive computation and machine learning
Contents Machine generated contents note: 1.Introduction -- 1.1.What is machine learning? -- 1.2.What kind of problems can be tackled using machine learning? -- 1.3.Some standard learning tasks -- 1.4.Learning stages -- 1.5.Learning scenarios -- 1.6.Generalization -- 2.The PAC Learning Framework -- 2.1.The PAC learning model -- 2.2.Guarantees for finite hypothesis sets --- consistent case -- 2.3.Guarantees for finite hypothesis sets --- inconsistent case -- 2.4.Generalities -- 2.4.1.Deterministic versus stochastic scenarios -- 2.4.2.Bayes error and noise -- 2.5.Chapter notes -- 2.6.Exercises -- 3.Rademacher Complexity and VC-Dimension -- 3.1.Rademacher complexity -- 3.2.Growth function -- 3.3.VC-dimension -- 3.4.Lower bounds -- 3.5.Chapter notes -- 3.6.Exercises -- 4.Model Selection -- 4.1.Estimation and approximation errors -- 4.2.Empirical risk minimization (ERM) -- 4.3.Structural risk minimization (SRM) -- 4.4.Cross-validation -- 4.5.n-Fold cross-validation --
Contents note continued: 4.6.Regularization-based algorithms -- 4.7.Convex surrogate losses -- 4.8.Chapter notes -- 4.9.Exercises -- 5.Support Vector Machines -- 5.1.Linear classification -- 5.2.Separable case -- 5.2.1.Primal optimization problem -- 5.2.2.Support vectors -- 5.2.3.Dual optimization problem -- 5.2.4.Leave-one-out analysis -- 5.3.Non-separable case -- 5.3.1.Primal optimization problem -- 5.3.2.Support vectors -- 5.3.3.Dual optimization problem -- 5.4.Margin theory -- 5.5.Chapter notes -- 5.6.Exercises -- 6.Kernel Methods -- 6.1.Introduction -- 6.2.Positive definite symmetric kernels -- 6.2.1.Definitions -- 6.2.2.Reproducing kernel Hilbert space -- 6.2.3.Properties -- 6.3.Kernel-based algorithms -- 6.3.1.SVMs with PDS kernels -- 6.3.2.Representer theorem -- 6.3.3.Learning guarantees -- 6.4.Negative definite symmetric kernels -- 6.5.Sequence kernels -- 6.5.1.Weighted transducers -- 6.5.2.Rational kernels -- 6.6.Approximate kernel feature maps -- 6.7.Chapter notes --
Contents note continued: 6.8.Exercises -- 7.Boosting -- 7.1.Introduction -- 7.2.AdaBoost -- 7.2.1.Bound on the empirical error -- 7.2.2.Relationship with coordinate descent -- 7.2.3.Practical use -- 7.3.Theoretical results -- 7.3.1.VC-dimension-based analysis -- 7.3.2.L1-geometric margin -- 7.3.3.Margin-based analysis -- 7.3.4.Margin maximization -- 7.3.5.Game-theoretic interpretation -- 7.4.L1-regularization -- 7.5.Discussion -- 7.6.Chapter notes -- 7.7.Exercises -- 8.On-Line Learning -- 8.1.Introduction -- 8.2.Prediction with expert advice -- 8.2.1.Mistake bounds and Halving algorithm -- 8.2.2.Weighted majority algorithm -- 8.2.3.Randomized weighted majority algorithm -- 8.2.4.Exponential weighted average algorithm -- 8.3.Linear classification -- 8.3.1.Perceptron algorithm -- 8.3.2.Winnow algorithm -- 8.4.On-line to batch conversion -- 8.5.Game-theoretic connection -- 8.6.Chapter notes -- 8.7.Exercises -- 9.Multi-Class Classification -- 9.1.Multi-class classification problem --
Contents note continued: 9.2.Generalization bounds -- 9.3.Uncombined multi-class algorithms -- 9.3.1.Multi-class SVMs -- 9.3.2.Multi-class boosting algorithms -- 9.3.3.Decision trees -- 9.4.Aggregated multi-class algorithms -- 9.4.1.One-versus-all -- 9.4.2.One-versus-one -- 9.4.3.Error-correcting output codes -- 9.5.Structured prediction algorithms -- 9.6.Chapter notes -- 9.7.Exercises -- 10.Ranking -- 10.1.The problem of ranking -- 10.2.Generalization bound -- 10.3.Ranking with SVMs -- 10.4.RankBoost -- 10.4.1.Bound on the empirical error -- 10.4.2.Relationship with coordinate descent -- 10.4.3.Margin bound for ensemble methods in ranking -- 10.5.Bipartite ranking -- 10.5.1.Boosting in bipartite ranking -- 10.5.2.Area under the ROC curve -- 10.6.Preference-based setting -- 10.6.1.Second-stage ranking problem -- 10.6.2.Deterministic algorithm -- 10.6.3.Randomized algorithm -- 10.6.4.Extension to other loss functions -- 10.7.Other ranking criteria -- 10.8.Chapter notes --
Contents note continued: 10.9.Exercises -- 11.Regression -- 11.1.The problem of regression -- 11.2.Generalization bounds -- 11.2.1.Finite hypothesis sets -- 11.2.2.Rademacher complexity bounds -- 11.2.3.Pseudo-dimension bounds -- 11.3.Regression algorithms -- 11.3.1.Linear regression -- 11.3.2.Kernel ridge regression -- 11.3.3.Support vector regression -- 11.3.4.Lasso -- 11.3.5.Group norm regression algorithms -- 11.3.6.On-line regression algorithms -- 11.4.Chapter notes -- 11.5.Exercises -- 12.Maximum Entropy Models -- 12.1.Density estimation problem -- 12.1.1.Maximum Likelihood (ML) solution -- 12.1.2.Maximum a Posteriori (MAP) solution -- 12.2.Density estimation problem augmented with features -- 12.3.Maxent principle -- 12.4.Maxent models -- 12.5.Dual problem -- 12.6.Generalization bound -- 12.7.Coordinate descent algorithm -- 12.8.Extensions -- 12.9.L2-regularization -- 12.10.Chapter notes -- 12.11.Exercises -- 13.Conditional Maximum Entropy Models --
Contents note continued: 13.1.Learning problem -- 13.2.Conditional Maxent principle -- 13.3.Conditional Maxent models -- 13.4.Dual problem -- 13.5.Properties -- 13.5.1.Optimization problem -- 13.5.2.Feature vectors -- 13.5.3.Prediction -- 13.6.Generalization bounds -- 13.7.Logistic regression -- 13.7.1.Optimization problem -- 13.7.2.Logistic model -- 13.8.L2-regularization -- 13.9.Proof of the duality theorem -- 13.10.Chapter notes -- 13.11.Exercises -- 14.Algorithmic Stability -- 14.1.Definitions -- 14.2.Stability-based generalization guarantee -- 14.3.Stability of kernel-based regularization algorithms -- 14.3.1.Application to regression algorithms: SVR and KRR -- 14.3.2.Application to classification algorithms: SVMs -- 14.3.3.Discussion -- 14.4.Chapter notes -- 14.5.Exercises -- 15.Dimensionality Reduction -- 15.1.Principal component analysis -- 15.2.Kernel principal component analysis (KPCA) -- 15.3.KPCA and manifold learning -- 15.3.1.Isomap -- 15.3.2.Laplacian eigenmaps --
Contents note continued: 15.3.3.Locally linear embedding (LLE) -- 15.4.Johnson-Lindenstrauss lemma -- 15.5.Chapter notes -- 15.6.Exercises -- 16.Learning Automata and Languages -- 16.1.Introduction -- 16.2.Finite automata -- 16.3.Efficient exact learning -- 16.3.1.Passive learning -- 16.3.2.Learning with queries -- 16.3.3.Learning automata with queries -- 16.4.Identification in the limit -- 16.4.1.Learning reversible automata -- 16.5.Chapter notes -- 16.6.Exercises -- 17.Reinforcement Learning -- 17.1.Learning scenario -- 17.2.Markov decision process model -- 17.3.Policy -- 17.3.1.Definition -- 17.3.2.Policy value -- 17.3.3.Optimal policies -- 17.3.4.Policy evaluation -- 17.4.Planning algorithms -- 17.4.1.Value iteration -- 17.4.2.Policy iteration -- 17.4.3.Linear programming -- 17.5.Learning algorithms -- 17.5.1.Stochastic approximation -- 17.5.2.TD(0) algorithm -- 17.5.3.Q-learning algorithm -- 17.5.4.SARSA -- 17.5.5.TD(λ) algorithm -- 17.5.6.Large state space --
Contents note continued: 17.6.Chapter notes -- Conclusion -- A.Linear Algebra Review -- A.1.Vectors and norms -- A.1.1.Norms -- A.1.2.Dual norms -- A.1.3.Relationship between norms -- A.2.Matrices -- A.2.1.Matrix norms -- A.2.2.Singular value decomposition -- A.2.3.Symmetric positive semidefinite (SPSD) matrices -- B.Convex Optimization -- B.1.Differentiation and unconstrained optimization -- B.2.Convexity -- B.3.Constrained optimization -- B.4.Fenchel duality -- B.4.1.Subgradients -- B.4.2.Core -- B.4.3.Conjugate functions -- B.5.Chapter notes -- B.6.Exercises -- C.Probability Review -- C.1.Probability -- C.2.Random variables -- C.3.Conditional probability and independence -- C.4.Expectation and Markov's inequality -- C.5.Variance and Chebyshev's inequality -- C.6.Moment-generating functions -- C.7.Exercises -- D.Concentration Inequalities -- D.1.Hoeffding's inequality -- D.2.Sanov's theorem -- D.3.Multiplicative Chernoff bounds --
Contents note continued: D.4.Binomial distribution tails: Upper bounds -- D.5.Binomial distribution tails: Lower bound -- D.6.Azuma's inequality -- D.7.McDiarmid's inequality -- D.8.Normal distribution tails: Lower bound -- D.9.Khintchine-Kahane inequality -- D.10.Maximal inequality -- D.11.Chapter notes -- D.12.Exercises -- E.Notions of Information Theory -- E.1.Entropy -- E.2.Relative entropy -- E.3.Mutual information -- E.4.Bregman divergences -- E.5.Chapter notes -- E.6.Exercises -- F.Notation
Summary This book is a general introduction to machine learning that can serve as a textbook for graduate students and a reference for researchers. It covers fundamental modern topics in machine learning while providing the theoretical basis and conceptual tools needed for the discussion and justification of algorithms. It also describes several key aspects of the application of these algorithms. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics. Foundations of Machine Learning is unique in its focus on the analysis and theory of algorithms. The first four chapters lay the theoretical foundation for what follows; subsequent chapters are mostly self-contained. Topics covered include the Probably Approximately Correct (PAC) learning framework; generalization bounds based on Rademacher complexity and VC-dimension; Support Vector Machines (SVMs); kernel methods; boosting; on-line learning; multi-class classification; ranking; regression; algorithmic stability; dimensionality reduction; learning automata and languages; and reinforcement learning. Each chapter ends with a set of exercises. Appendixes provide additional material, including a concise probability review. This second edition offers three new chapters, on model selection, maximum entropy models, and conditional maximum entropy models. New material in the appendixes includes a major section on Fenchel duality, expanded coverage of concentration inequalities, and an entirely new entry on information theory. More than half of the exercises are new to this edition. -- Provided by publisher
Bibliography Includes bibliographical references and index
Subject Machine learning
Computer algorithms
Author Rostamizadeh, Afshin, author
Talwalkar, Ameet, author
LC no. 2018022812
ISBN 9780262039406 (hardcover : alkaline paper)
0262039400 (hardcover : alkaline paper)