E-book
Author Banerjee, Sinchan

Title Scalable Data Architecture with Java : Build Efficient Enterprise-Grade Data Architecting Solutions Using Java / Sinchan Banerjee
Published Birmingham : Packt Publishing, Limited, 2022

Description 1 online resource (382 p.)
Contents Cover -- Title Page -- Copyright and Credits -- Contributors -- About the reviewers -- Table of Contents -- Preface -- Section 1 -- Foundation of Data Systems -- Chapter 1: Basics of Modern Data Architecture -- Exploring the landscape of data engineering -- What is data engineering? -- Dimensions of data -- Types of data engineering problems -- Responsibilities and challenges of a Java data architect -- Data architect versus data engineer -- Challenges of a data architect -- Techniques to mitigate those challenges -- Summary -- Chapter 2: Data Storage and Databases -- Understanding data types, formats, and encodings -- Data types -- Data formats -- Understanding file, block, and object storage -- File storage -- Block storage -- Object storage -- The data lake, data warehouse, and data mart -- Data lake -- Data warehouse -- Data marts -- Databases and their types -- Relational database -- NoSQL database -- Data model design considerations -- Summary -- Chapter 3: Identifying the Right Data Platform -- Technical requirements -- Virtualization and containerization platforms -- Benefits of virtualization -- Containerization -- Benefits of containerization -- Kubernetes -- Hadoop platforms -- Hadoop architecture -- Cloud platforms -- Benefits of cloud computing -- Choosing the correct platform -- When to choose virtualization versus containerization -- When to use big data -- Choosing between on-premise versus cloud-based solutions -- Choosing between various cloud vendors -- Summary -- Section 2 -- Building Data Processing Pipelines -- Chapter 4: ETL Data Load -- A Batch-Based Solution to Ingesting Data in a Data Warehouse -- Technical requirements -- Understanding the problem and source data -- Problem statement -- Understanding the source data -- Building an effective data model -- Relational data warehouse schemas -- Evaluation of the schema design -- Designing the solution -- Implementing and unit testing the solution -- Summary -- Chapter 5: Architecting a Batch Processing Pipeline -- Technical requirements -- Developing the architecture and choosing the right tools -- Problem statement -- Analyzing the problem -- Architecting the solution -- Factors that affect your choice of storage -- Determining storage based on cost -- The cost factor in the processing layer -- Implementing the solution -- Profiling the source data -- Writing the Spark application -- Deploying and running the Spark application -- Developing and testing a Lambda trigger -- Performance tuning a Spark job -- Querying the ODL using AWS Athena -- Summary -- Chapter 6: Architecting a Real-Time Processing Pipeline -- Technical requirements -- Understanding and analyzing the streaming problem -- Problem statement -- Analyzing the problem -- Architecting the solution -- Implementing and verifying the design -- Setting up Apache Kafka on your local machine -- Developing the Kafka streaming application -- Unit testing a Kafka Streams application
Summary Orchestrate data architecting solutions using Java and related technologies to evaluate, recommend, and present the most suitable solution to leadership and clients.
Key Features
- Learn how to adapt to the ever-evolving data architecture technology landscape
- Understand how to choose the best-suited technology, platform, and architecture to realize effective business value
- Implement effective data security and governance principles
Book Description
Java architectural patterns and tools help architects build reliable, scalable, and secure data engineering solutions that collect, manipulate, and publish data. This book will help you make the most of the available data architecting solutions, with clear and actionable advice from an expert. You'll start with an overview of data architecture, exploring the responsibilities of a Java data architect and learning about various data formats, data storage, databases, and data application platforms, as well as how to choose them. Next, you'll learn how to architect batch and real-time data processing pipelines. You'll also get to grips with various Java data processing patterns before progressing to data security and governance. The later chapters show you how to publish Data as a Service and how to architect it. Finally, you'll focus on how to evaluate and recommend an architecture by developing performance benchmarks, estimations, and various decision metrics. By the end of this book, you'll be able to orchestrate data architecture solutions using Java and related technologies, as well as evaluate and present the most suitable solution to your clients.
What you will learn
- Analyze and use the best data architecture patterns for problems
- Understand when and how to choose Java tools for a data architecture
- Build batch and real-time data engineering solutions using Java
- Discover how to apply security and governance to a solution
- Measure performance, publish benchmarks, and optimize solutions
- Evaluate, choose, and present the best architectural alternatives
- Understand how to publish Data as a Service using GraphQL and a REST API
Who this book is for
Data architects, aspiring data architects, Java developers, and anyone who wants to develop or optimize scalable data architecture solutions using Java will find this book useful. A basic understanding of data architecture and Java programming is required to get the best from this book.
Notes Configuring and running the application
Subject Software architecture.
Java (Computer program language)
Form Electronic book
ISBN 9781801072083
1801072086