E-book
Author Mehrotra, Shrey

Title Apache Spark Quick Start Guide : Quickly Learn the Art of Writing Efficient Big Data Applications with Apache Spark
Published Birmingham : Packt Publishing Ltd, 2019

Description 1 online resource (150 pages)
Contents Cover; Title Page; Copyright and Credits; About Packt; Contributors; Table of Contents; Preface; Chapter 1: Introduction to Apache Spark; What is Spark?; Spark architecture overview; Spark language APIs; Scala; Java; Python; R; SQL; Spark components; Spark Core; Spark SQL; Spark Streaming; Spark machine learning; Spark graph processing; Cluster manager; Standalone scheduler; YARN; Mesos; Kubernetes; Making the most of Hadoop and Spark; Summary; Chapter 2: Apache Spark Installation; AWS elastic compute cloud (EC2); Creating a free account on AWS; Connecting to your Linux instance
Configuring Spark; Prerequisites; Installing Java; Installing Scala; Installing Python; Installing Spark; Using Spark components; Different modes of execution; Spark sandbox; Summary; Chapter 3: Spark RDD; What is an RDD?; Resilient metadata; Programming using RDDs; Transformations and actions; Transformation; Narrow transformations; map(); flatMap(); filter(); union(); mapPartitions(); Wide transformations; distinct(); sortBy(); intersection(); subtract(); cartesian(); Action; collect(); count(); take(); top(); takeOrdered(); first(); countByValue(); reduce(); saveAsTextFile(); foreach()
Types of RDDs; Pair RDDs; groupByKey(); reduceByKey(); sortByKey(); join(); Caching and checkpointing; Caching; Checkpointing; Understanding partitions; repartition() versus coalesce(); partitionBy(); Drawbacks of using RDDs; Summary; Chapter 4: Spark DataFrame and Dataset; DataFrames; Creating DataFrames; Data sources; DataFrame operations and associated functions; Running SQL on DataFrames; Temporary views on DataFrames; Global temporary views on DataFrames; Datasets; Encoders; Internal row; Creating custom encoders; Summary; Chapter 5: Spark Architecture and Application Execution Flow
A sample application; DAG constructor; Stage; Tasks; Task scheduler; FIFO; FAIR; Application execution modes; Local mode; Client mode; Cluster mode; Application monitoring; Spark UI; Application logs; External monitoring solution; Summary; Chapter 6: Spark SQL; Spark SQL; Spark metastore; Using the Hive metastore in Spark SQL; Hive configuration with Spark; SQL language manual; Database; Table and view; Load data; Creating UDFs; SQL database using JDBC; Summary; Chapter 7: Spark Streaming, Machine Learning, and Graph Analysis; Spark Streaming; Use cases; Data sources; Stream processing
Microbatch; DStreams; Streaming architecture; Streaming example; Machine learning; MLlib; ML; Graph processing; GraphX; mapVertices; mapEdges; subgraph; GraphFrames; degrees; subgraphs; Graph algorithms; PageRank; Summary; Chapter 8: Spark Optimizations; Cluster-level optimizations; Memory; Disk; CPU cores; Project Tungsten; Application optimizations; Language choice; Structured versus unstructured APIs; File format choice; RDD optimizations; Choosing the right transformations; Serializing and compressing; Broadcast variables; DataFrame and dataset optimizations; Catalyst optimizer; Storage
Summary Apache Spark is a flexible in-memory framework that allows processing of both batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to quickly get started with Apache Spark 2.0 and write efficient big data applications for a variety of use cases.
Notes Parallelism
Print version record
SUBJECT Spark (Electronic resource : Apache Software Foundation) http://id.loc.gov/authorities/names/no2015027445
Spark (Electronic resource : Apache Software Foundation) fast
Subject Machine learning.
COMPUTERS -- Programming -- Open Source.
COMPUTERS -- Software Development & Engineering -- Tools.
COMPUTERS -- Software Development & Engineering -- General.
Machine learning
Form Electronic book
Author Grade, Akash
ISBN 178934266X
9781789342666