E-book
Author Testas, Abdelaziz, author

Title Distributed machine learning with Pyspark : migrating effortlessly from Pandas and Scikit-Learn / Abdelaziz Testas
Published Berkeley, CA : Apress L. P., [2023]

Description 1 online resource (500 p.)
Contents Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Acknowledgments -- Introduction -- Chapter 1: An Easy Transition -- PySpark and Pandas Integration -- Similarity in Syntax -- Loading Data -- Selecting Columns -- Aggregating Data -- Filtering Data -- Joining Data -- Saving Data -- Modeling Steps -- Pipelines -- Summary -- Chapter 2: Selecting Algorithms -- The Dataset -- Selecting Algorithms with Cross-Validation -- Scikit-Learn -- PySpark -- Bringing It All Together -- Scikit-Learn -- PySpark -- Summary
Chapter 3: Multiple Linear Regression with Pandas, Scikit-Learn, and PySpark -- The Dataset -- Multiple Linear Regression -- Multiple Linear Regression with Scikit-Learn -- Multiple Linear Regression with PySpark -- Summary -- Chapter 4: Decision Tree Regression with Pandas, Scikit-Learn, and PySpark -- The Dataset -- Decision Tree Regression -- Decision Tree Regression with Scikit-Learn -- The Modeling Steps -- Decision Tree Regression with PySpark -- The Modeling Steps -- Bringing It All Together -- Scikit-Learn -- PySpark -- Summary
Chapter 5: Random Forest Regression with Pandas, Scikit-Learn, and PySpark -- The Dataset -- Random Forest Regression -- Random Forest with Scikit-Learn -- Random Forest with PySpark -- Bringing It All Together -- Scikit-Learn -- PySpark -- Summary -- Chapter 6: Gradient-Boosted Tree Regression with Pandas, Scikit-Learn, and PySpark -- The Dataset -- Gradient-Boosted Tree (GBT) Regression -- GBT with Scikit-Learn -- GBT with PySpark -- Bringing It All Together -- Scikit-Learn -- PySpark -- Summary -- Chapter 7: Logistic Regression with Pandas, Scikit-Learn, and PySpark -- The Dataset
Logistic Regression -- Logistic Regression with Scikit-Learn -- Logistic Regression with PySpark -- Putting It All Together -- Scikit-Learn -- PySpark -- Summary -- Chapter 8: Decision Tree Classification with Pandas, Scikit-Learn, and PySpark -- The Dataset -- Decision Tree Classification -- Scikit-Learn and PySpark Similarities -- Decision Tree Classification with Scikit-Learn -- Decision Tree Classification with PySpark -- Bringing It All Together -- Scikit-Learn -- PySpark -- Summary -- Chapter 9: Random Forest Classification with Scikit-Learn and PySpark -- Random Forest Classification
Scikit-Learn and PySpark Similarities for Random Forests -- Random Forests with Scikit-Learn -- Random Forests with PySpark -- Bringing It All Together -- Scikit-Learn -- PySpark -- Summary -- Chapter 10: Support Vector Machine Classification with Pandas, Scikit-Learn, and PySpark -- The Dataset -- Support Vector Machine Classification -- Linear SVM with Scikit-Learn -- Linear SVM with PySpark -- Bringing It All Together -- Scikit-Learn -- PySpark -- Summary -- Chapter 11: Naive Bayes Classification with Pandas, Scikit-Learn, and PySpark -- The Dataset -- Naive Bayes Classification
Summary Migrate from pandas and scikit-learn to PySpark to handle vast amounts of data and achieve faster data processing time. This book will show you how to make this transition by adapting your skills and leveraging the similarities in syntax, functionality, and interoperability between these tools. Distributed Machine Learning with PySpark offers a roadmap to data scientists considering transitioning from small data libraries (pandas/scikit-learn) to big data processing and machine learning with PySpark. You will learn to translate Python code from pandas/scikit-learn to PySpark to preprocess large volumes of data and build, train, test, and evaluate popular machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, Naive Bayes, and neural networks. After completing this book, you will understand the foundational concepts of data preparation and machine learning and will have the skills necessary to apply these methods using PySpark, the industry standard for building scalable ML data pipelines. You will: Master the fundamentals of supervised learning, unsupervised learning, NLP, and recommender systems; Understand the differences between PySpark, scikit-learn, and pandas; Perform linear regression, logistic regression, and decision tree regression with pandas, scikit-learn, and PySpark; Distinguish between the pipelines of PySpark and scikit-learn
Bibliography Includes bibliographical references and index
Notes Naive Bayes with Scikit-Learn
Description based on online resource; title from digital title page (viewed on December 12, 2023)
Subject Machine learning.
Genre/Form Electronic books
Form Electronic book
ISBN 9781484297513
1484297512