E-book

Title Big Data Analytics with Hadoop 3 : Build highly effective analytics solutions to gain valuable insight into your big data

Published Birmingham : Packt Publishing, 2018

Online access ProQuest Ebook Central

Description 1 online resource (471 pages)

Contents Cover; Title Page; Copyright and Credits; Packt Upsell; Contributors; Table of Contents; Preface;

Chapter 1: Introduction to Hadoop; Hadoop Distributed File System; High availability; Intra-DataNode balancer; Erasure coding; Port numbers; MapReduce framework; Task-level native optimization; YARN; Opportunistic containers; Types of container execution; YARN timeline service v. 2; Enhancing scalability and reliability; Usability improvements; Architecture; Other changes; Minimum required Java version; Shell script rewrite; Shaded-client JARs; Installing Hadoop 3; Prerequisites; Downloading; Installation; Setup password-less ssh; Setting up the NameNode; Starting HDFS; Setting up the YARN service; Erasure Coding; Intra-DataNode balancer; Installing YARN timeline service v. 2; Setting up the HBase cluster; Simple deployment for HBase; Enabling the co-processor; Enabling timeline service v. 2; Running timeline service v. 2; Enabling MapReduce to write to timeline service v. 2; Summary;

Chapter 2: Overview of Big Data Analytics; Introduction to data analytics; Inside the data analytics process; Introduction to big data; Variety of data; Velocity of data; Volume of data; Veracity of data; Variability of data; Visualization; Value; Distributed computing using Apache Hadoop; The MapReduce framework; Hive; Downloading and extracting the Hive binaries; Installing Derby; Using Hive; Creating a database; Creating a table; SELECT statement syntax; WHERE clauses; INSERT statement syntax; Primitive types; Complex types; Built-in operators and functions; Built-in operators; Built-in functions; Language capabilities; A cheat sheet on retrieving information; Apache Spark; Visualization using Tableau; Summary;

Chapter 3: Big Data Processing with MapReduce; The MapReduce framework; Dataset; Record reader; Map; Combiner; Partitioner; Shuffle and sort; Reduce; Output format; MapReduce job types; Single mapper job; Single mapper reducer job; Multiple mappers reducer job; SingleMapperCombinerReducer job; Scenario; MapReduce patterns; Aggregation patterns; Average temperature by city; Record count; Min/max/count; Average/median/standard deviation; Filtering patterns; Join patterns; Inner join; Left anti join; Left outer join; Right outer join; Full outer join; Left semi join; Cross join; Summary;

Chapter 4: Scientific Computing and Big Data Analysis with Python and Hadoop; Installation; Installing standard Python; Installing Anaconda; Using Conda; Data analysis; Summary;

Chapter 5: Statistical Big Data Computing with R and Hadoop; Introduction; Install R on workstations and connect to the data in Hadoop; Install R on a shared server and connect to Hadoop; Utilize Revolution R Open; Execute R inside of MapReduce using RMR2; Summary and outlook for pure open source options; Methods of integrating R and Hadoop; RHADOOP -- install R on workstations and connect to data in Hadoop; RHIPE -- execute R inside Hadoop MapReduce; R and Hadoop Streaming; RHIVE -- install R on workstations and connect to data in Hadoop

Summary Apache Hadoop is the most popular platform for big data processing, and it can be used to build powerful analytics solutions. This book shows you how to do so with the help of practical examples. By the end of the book, you will be well versed in the analytical capabilities of the Hadoop ecosystem, together with Apache Spark and Apache Flink, for performing big data analytics.

Notes Print version record

Subject Big data.

Cluster analysis.

Electronic data processing -- Distributed processing.

Database design & theory.

Data warehousing.

Information architecture.

Data capture & analysis.

Computers -- Database Management -- Data Warehousing.

Computers -- Data Modeling & Design.

Computers -- Data Processing.

Cluster analysis -- Data processing.

Form Electronic book

ISBN 9781788624954

1788624955

1788628845

9781788628846