Author Turkington, Garry, author

Title Learning Hadoop 2 / Turkington, Garry
Edition First edition
Published Packt Publishing, 2015
Online access available from:
Safari O'Reilly books online
EBSCO eBook Academic Collection


Description 1 online resource (382 pages)
Summary Design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2.

In Detail
This book introduces you to the world of building data-processing applications with the wide variety of tools supported by Hadoop 2. Starting with the core components of the framework, HDFS and YARN, this book will guide you through how to build applications using a variety of approaches. You will learn how YARN completely changes the relationship between MapReduce and Hadoop and allows the latter to support more varied processing approaches and a broader array of applications. These include real-time processing with Apache Samza and iterative computation with Apache Spark. Next, the book discusses Apache Pig and the dataflow data model it provides. You will discover how to use Pig to analyze a Twitter dataset. With this book, you will be able to make your life easier by using tools such as Apache Hive, Apache Oozie, Hadoop Streaming, Apache Crunch, and Kite SDK. The last part of this book discusses the likely future direction of major Hadoop components and how to get involved with the Hadoop community.

What You Will Learn
Write distributed applications using the MapReduce framework
Go beyond MapReduce and process data in real time with Samza and iteratively with Spark
Familiarize yourself with data mining approaches that work with very large datasets
Prototype applications on a VM and deploy them to a local cluster or to a cloud infrastructure (Amazon Web Services)
Conduct batch and real-time data analysis using SQL-like tools
Build data processing flows using Apache Pig and see how it enables the easy incorporation of custom functionality
Define and orchestrate complex workflows and pipelines with Apache Oozie
Manage your data lifecycle and changes over time

Downloading the example code for this book
You can download the example code files for all Packt books you have purchased from your account. If you purchased this book elsewhere, you can register to have the files e-mailed directly to you.
Notes Mode of access: World Wide Web
Copyright © 2015 Packt Publishing
Issuing Body Made available through: Safari, an O'Reilly Media Company
Subject Apache Hadoop.
Electronic data processing -- Distributed processing.
Cloud computing.
Big data.
Form Electronic book
Author Modena, Gabriele, author
Safari, an O'Reilly Media Company