Limit search to available items
Book Cover
E-book
Author Kimball, Ralph, author

Title The data warehouse toolkit : the definitive guide to dimensional modeling / Ralph Kimball, Margy Ross
Edition Third edition
Published Indianapolis, IN : John Wiley & Sons, Inc., [2013]
©2013
Online access available from:
Safari O'Reilly books online    View Resource Record  

Copies

Description 1 online resource (xxxiv, 564 pages) : illustrations
Contents 1. Data warehouse, business intelligence and dimensional modeling primer -- 2. Kimball dimensional modeling techniques overview -- 3. Retail sales -- 4. Inventory -- 5. Procurement -- 6. Order management -- 7. Accounting -- 8. Customer relationship management -- 9. Human resources management -- 10. Financial services -- 11. Telecommunications -- 12. Transportation -- 13. Education -- 14. Healthcare -- 15. Electronic commerce -- 16. Insurance -- 17. Kimball DW/BI lifecycle overview -- 18. Dimensional modeling process and tasks -- 19. ETL subsystem and techniques -- 20. ETL system design and development process and tasks -- 21. Big data analytics
15. Electronic Commerce -- Clickstream Source Data -- Clickstream Data Challenges -- Clickstream Dimensional Models -- Page Dimension -- Event Dimension -- Session Dimension -- Referral Dimension -- Clickstream Session Fact Table -- Clickstream Page Event Fact Table -- Step Dimension -- Aggregate Clickstream Fact Tables -- Google Analytics -- Integrating Clickstream into Web Retailer's Bus Matrix -- Profitability Across Channels Including Web -- Summary -- 16. Insurance -- Insurance Case Study -- Insurance Value Chain -- Draft Bus Matrix -- Policy Transactions -- Dimension Role Playing -- Slowly Changing Dimensions -- Mini-Dimensions for Large or Rapidly Changing Dimensions -- Multivalued Dimension Attributes -- Numeric Attributes as Facts or Dimensions -- Degenerate Dimension -- Low Cardinality Dimension Tables -- Audit Dimension -- Policy Transaction Fact Table -- Heterogeneous Supertype and Subtype Products -- Complementary Policy Accumulating Snapshot -- Premium Periodic Snapshot -- Conformed Dimensions -- Conformed Facts -- Pay-in-Advance Facts -- Heterogeneous Supertypes and Subtypes Revisited -- Multivalued Dimensions Revisited -- More Insurance Case Study Background -- Updated Insurance Bus Matrix -- Detailed Implementation Bus Matrix -- Claim Transactions -- Transaction Versus Profile Junk Dimensions -- Claim Accumulating Snapshot -- Accumulating Snapshot for Complex Workflows -- Timespan Accumulating Snapshot -- Periodic Instead of Accumulating Snapshot -- Policy/Claim Consolidated Periodic Snapshot -- Factless Accident Events -- Common Dimensional Modeling Mistakes to Avoid -- Mistake 10: Place Text Attributes in a Fact Table -- Mistake 9: Limit Verbose Descriptors to Save Space -- Mistake 8: Split Hierarchies into Multiple Dimensions -- Mistake 7: Ignore the Need to Track Dimension Changes -- Mistake 6: Solve All Performance Problems with More Hardware -- Mistake 5: Use Operational Keys to Join Dimensions and Facts -- Mistake 4: Neglect to Declare and Comply with the Fact Grain -- Mistake 3: Use a Report to Design the Dimensional Model -- Mistake 2: Expect Users to Query Normalized Atomic Data -- Mistake 1: Fail to Conform Facts and Dimensions -- Summary -- 17. Kimball DW/BI Lifecycle Overview -- Lifecycle Roadmap -- Roadmap Mile Markers -- Lifecycle Launch Activities -- Program/Project Planning and Management -- Business Requirements Definition -- Lifecycle Technology Track -- Technical Architecture Design -- Product Selection and Installation -- Lifecycle Data Track -- Dimensional Modeling -- Physical Design -- ETL Design and Development -- Lifecycle BI Applications Track -- BI Application Specification -- BI Application Development -- Lifecycle Wrap-up Activities -- Deployment -- Maintenance and Growth -- Common Pitfalls to Avoid -- Summary -- 18. Dimensional Modeling Process and Tasks -- Modeling Process Overview -- Get Organized -- Identify Participants, Especially Business Representatives -- Review the Business Requirements -- Leverage a Modeling Tool -- Leverage a Data Profiling Tool -- Leverage or Establish Naming Conventions -- Coordinate Calendars and Facilities -- Design the Dimensional Model -- Reach Consensus on High-Level Bubble Chart -- Develop the Detailed Dimensional Model -- Review and Validate the Model -- Finalize the Design Documentation -- Summary -- 19. ETL Subsystems and Techniques -- Round Up the Requirements -- Business Needs -- Compliance -- Data Quality -- Security -- Data Integration -- Data Latency -- Archiving and Lineage -- BI Delivery Interfaces -- Available Skills -- Legacy Licenses -- The 34 Subsystems of ETL -- Extracting: Getting Data into the Data Warehouse -- Subsystem 1: Data Profiling -- Subsystem 2: Change Data Capture System -- Subsystem 3: Extract System -- Cleaning and Conforming Data -- Improving Data Quality Culture and Processes -- Subsystem 4: Data Cleansing System -- Subsystem 5: Error Event Schema -- Subsystem 6: Audit Dimension Assembler -- Subsystem 7: Deduplication System -- Subsystem 8: Conforming System -- Delivering: Prepare for Presentation -- Subsystem 9: Slowly Changing Dimension Manager -- Subsystem 10: Surrogate Key Generator -- Subsystem 11: Hierarchy Manager -- Subsystem 12: Special Dimensions Manager -- Subsystem 13: Fact Table Builders -- Subsystem 14: Surrogate Key Pipeline -- Subsystem 15: Multivalued Dimension Bridge Table Builder -- Subsystem 16: Late Arriving Data Handler -- Subsystem 17: Dimension Manager System -- Subsystem 18: Fact Provider System -- Subsystem 19: Aggregate Builder -- Subsystem 20: OLAP Cube Builder -- Subsystem 21: Data Propagation Manager -- Managing the ETL Environment -- Subsystem 22: Job Scheduler -- Subsystem 23: Backup System -- Subsystem 24: Recovery and Restart System -- Subsystem 25: Version Control System -- Subsystem 26: Version Migration System -- Subsystem 27: Workflow Monitor -- Subsystem 28: Sorting System -- Subsystem 29: Lineage and Dependency Analyzer -- Subsystem 30: Problem Escalation System -- Subsystem 31: Parallelizing/Pipelining System -- Subsystem 32: Security System -- Subsystem 33: Compliance Manager -- Subsystem 34: Metadata Repository Manager -- Summary -- 20. ETL System Design and Development Process and Tasks -- ETL Process Overview -- Develop the ETL Plan -- Step 1: Draw the High-Level Plan -- Step 2: Choose an ETL Tool -- Step 3: Develop Default Strategies -- Step 4: Drill Down by Target Table -- Develop the ETL Specification Document -- Develop One-Time Historic Load Processing -- Step 5: Populate Dimension Tables with Historic Data -- Step 6: Perform the Fact Table Historic Load -- Develop Incremental ETL Processing -- Step 7: Dimension Table Incremental Processing -- Step 8: Fact Table Incremental Processing -- Step 9: Aggregate Table and OLAP Loads -- Step 10: ETL System Operation and Automation -- Real-Time Implications -- Real-Time Triage -- Real-Time Architecture Trade-Offs -- Real-Time Partitions in the Presentation Server -- Summary -- 21. Big Data Analytics -- Big Data Overview -- Extended RDBMS Architecture -- MapReduce/Hadoop Architecture -- Comparison of Big Data Architectures -- Recommended Best Practices for Big Data -- Management Best Practices for Big Data -- Architecture Best Practices for Big Data -- Data Modeling Best Practices for Big Data -- Data Governance Best Practices for Big Data -- Summary
5. Procurement -- Procurement Case Study -- Procurement Transactions and Bus Matrix -- Single Versus Multiple Transaction Fact Tables -- Complementary Procurement Snapshot -- Slowly Changing Dimension Basics -- Type 0: Retain Original -- Type 1: Overwrite -- Type 2: Add New Row -- Type 3: Add New Attribute -- Type 4: Add Mini-Dimension -- Hybrid Slowly Changing Dimension Techniques -- Type 5: Mini-Dimension and Type 1 Outrigger -- Type 6: Add Type 1 Attributes to Type 2 Dimension -- Type 7: Dual Type 1 and Type 2 Dimensions -- Slowly Changing Dimension Recap -- Summary -- 6. Order Management -- Order Management Bus Matrix -- Order Transactions -- Fact Normalization -- Dimension Role Playing -- Product Dimension Revisited -- Customer Dimension -- Deal Dimension -- Degenerate Dimension for Order Number -- Junk Dimensions -- Header/Line Pattern to Avoid -- Multiple Currencies -- Transaction Facts at Different Granularity -- Another Header/Line Pattern to Avoid -- Invoice Transactions -- Service Level Performance as Facts, Dimensions, or Both -- Profit and Loss Facts -- Audit Dimension -- Accumulating Snapshot for Order Fulfillment Pipeline -- Lag Calculations -- Multiple Units of Measure -- Beyond the Rearview Mirror -- Summary -- 7. Accounting -- Accounting Case Study and Bus Matrix -- General Ledger Data -- General Ledger Periodic Snapshot -- Chart of Accounts -- Period Close -- Year-to-Date Facts -- Multiple Currencies Revisited -- General Ledger Journal Transactions -- Multiple Fiscal Accounting Calendars -- Drilling Down Through a Multilevel Hierarchy -- Financial Statements -- Budgeting Process -- Dimension Attribute Hierarchies -- Fixed Depth Positional Hierarchies -- Slightly Ragged Variable Depth Hierarchies -- Ragged Variable Depth Hierarchies -- Shared Ownership in a Ragged Hierarchy -- Time Varying Ragged Hierarchies -- Modifying Ragged Hierarchies -- Alternative Ragged Hierarchy Modeling Approaches -- Advantages of the Bridge Table Approach for Ragged Hierarchies -- Consolidated Fact Tables -- Role of OLAP and Packaged Analytic Solutions -- Summary -- 8. Customer Relationship Management -- CRM Overview -- Operational and Analytic CRM -- Customer Dimension Attributes -- Name and Address Parsing -- International Name and Address Considerations -- Customer-Centric Dates -- Aggregated Facts as Dimension Attributes -- Segmentation Attributes and Scores -- Counts with Type 2 Dimension Changes -- Outrigger for Low Cardinality Attribute Set -- Customer Hierarchy Considerations -- Bridge Tables for Multivalued Dimensions -- Bridge Table for Sparse Attributes -- Bridge Table for Multiple Customer Contacts -- Complex Customer Behavior -- Behavior Study Groups for Cohorts -- Step Dimension for Sequential Behavior -- Timespan Fact Tables -- Tagging Fact Tables with Satisfaction Indicators -- Tagging Fact Tables with Abnormal Scenario Indicators -- Customer Data Integration Approaches -- Master Data Management Creating a Single Customer Dimension -- Partial Conformity of Multiple Customer Dimensions -- Avoiding Fact-to-Fact Table Joins -- Low Latency Reality Check -- Summary -- 9. Human Resources Management -- Employee Profi le Tracking -- Precise Effective and Expiration Timespans -- Dimension Change Reason Tracking -- Profi le Changes as Type 2 Attributes or Fact Events -- Headcount Periodic Snapshot -- Bus Matrix for HR Processes -- Packaged Analytic Solutions and Data Models -- Recursive Employee Hierarchies -- Change Tracking on Embedded Manager Key -- Drilling Up and Down Management Hierarchies -- Multivalued Skill Keyword Attributes -- Skill Keyword Bridge -- Skill Keyword Text String -- Survey Questionnaire Data -- Text Comments -- Summary -- 10. Financial Service -- Banking Case Study and Bus Matrix -- Dimension Triage to Avoid Too Few Dimensions -- Household Dimension -- Multivalued Dimensions and Weighting Factors -- Mini-Dimensions Revisited -- Adding a Mini-Dimension to a Bridge Table -- Dynamic Value Banding of Facts -- Supertype and Subtype Schemas for Heterogeneous Products -- Supertype and Subtype Products with Common Facts -- Hot Swappable Dimensions -- Summary -- 11. Telecommunications -- Telecommunications Case Study and Bus Matrix -- General Design Review Considerations -- Balance Business Requirements and Source Realities -- Focus on Business Processes -- Granularity -- Single Granularity for Facts -- Dimension Granularity and Hierarchies -- Date Dimension -- Degenerate Dimensions -- Surrogate Keys -- Dimension Decodes and Descriptions -- Conformity Commitment -- Design Review Guidelines -- Draft Design Exercise Discussion -- Remodeling Existing Data Structures -- Geographic Location Dimension -- Summary -- 12. Transportation -- Airline Case Study and Bus Matrix -- Multiple Fact Table Granularities -- Linking Segments into Trips -- Related Fact Tables -- Extensions to Other Industries -- Cargo Shipper -- Travel Services -- Combining Correlated Dimensions -- Class of Service -- Origin and Destination -- More Date and Time Considerations -- Country-Specific Calendars as Outriggers -- Date and Time in Multiple Time Zones -- Localization Recap -- Summary -- 13. Education -- University Case Study and Bus Matrix -- Accumulating Snapshot Fact Tables -- Applicant Pipeline -- Research Grant Proposal Pipeline -- Factless Fact Tables -- Admissions Events -- Course Registrations -- Facility Utilization -- Student Attendance -- More Educational Analytic Opportunities -- Summary -- 14. Healthcare -- Healthcare Case Study and Bus Matrix -- Claims Billing and Payments -- Date Dimension Role Playing -- Multivalued Diagnoses -- Supertypes and Subtypes for Charges -- Electronic Medical Records -- Measure Type Dimension for Sparse Facts -- Freeform Text Comments -- Images -- Facility/Equipment Inventory Utilization -- Dealing with Retroactive Changes -- Summary
Introduction -- 1. Data Warehousing, Business Intelligence, and Dimensional Modeling Primer -- Different Worlds of Data Capture and Data Analysis -- Goals of Data Warehousing and Business Intelligence -- Publishing Metaphor for DW/BI Managers -- Dimensional Modeling Introduction -- Star Schemas Versus OLAP Cubes -- Fact Tables for Measurements -- Dimension Tables for Descriptive Context -- Facts and Dimensions Joined in a Star Schema -- Kimball's DW/BI Architecture -- Operational Source Systems -- Extract, Transformation, and Load System -- Presentation Area to Support Business Intelligence -- Business Intelligence Applications -- Restaurant Metaphor for the Kimball Architecture -- Alternative DW/BI Architectures -- Independent Data Mart Architecture -- Hub-and-Spoke Corporate Information Factory Inmon Architecture -- Hybrid Hub-and-Spoke and Kimball Architecture -- Dimensional Modeling Myths -- Myth 1: Dimensional Models are Only for Summary Data -- Myth 2: Dimensional Models are Departmental, Not Enterprise -- Myth 3: Dimensional Models are Not Scalable -- Myth 4: Dimensional Models are Only for Predictable Usage -- Myth 5: Dimensional Models Can't Be Integrated -- More Reasons to Think Dimensionally -- Agile Considerations -- Summary -- 2. Kimball Dimensional Modeling Techniques Overview -- Fundamental Concepts -- Gather Business Requirements and Data Realities -- Collaborative Dimensional Modeling Workshops -- Four-Step Dimensional Design Process -- Business Processes -- Grain -- Dimensions for Descriptive Context -- Facts for Measurements -- Star Schemas and OLAP Cubes -- Graceful Extensions to Dimensional Models -- Basic Fact Table Techniques -- Fact Table Structure -- Additive, Semi-Additive, Non-Additive Facts -- Nulls in Fact Tables -- Conformed Facts -- Transaction Fact Tables -- Periodic Snapshot Fact Tables -- Accumulating Snapshot Fact Tables -- Factless Fact Tables -- Aggregate Fact Tables or OLAP Cubes -- Consolidated Fact Tables -- Basic Dimension Table Techniques -- Dimension Table Structure -- Dimension Surrogate Keys -- Natural, Durable, and Supernatural Keys -- Drilling Down -- Degenerate Dimensions -- Denormalized Flattened Dimensions -- Multiple Hierarchies in Dimensions -- Flags and Indicators as Textual Attributes -- Null Attributes in Dimensions -- Calendar Date Dimensions -- Role-Playing Dimensions -- Junk Dimensions -- Snowflaked Dimensions -- Outrigger Dimensions -- Integration via Conformed Dimensions -- Conformed Dimensions -- Shrunken Dimensions -- Drilling Across -- Value Chain -- Enterprise Data Warehouse Bus Architecture -- Enterprise Data Warehouse Bus Matrix -- Detailed Implementation Bus Matrix -- Opportunity/Stakeholder Matrix -- Dealing with Slowly Changing Dimension Attributes -- Type 0: Retain Original -- Type 1: Overwrite -- Type 2: Add New Row -- Type 3: Add New Attribute -- Type 4: Add Mini-Dimension -- Type 5: Add Mini-Dimension and Type 1 Outrigger -- Type 6: Add Type 1 Attributes to Type 2 Dimension -- Type 7: Dual Type 1 and Type 2 Dimensions -- Dealing with Dimension Hierarchies -- Fixed Depth Positional Hierarchies -- Slightly Ragged/Variable Depth Hierarchies -- Ragged/Variable Depth Hierarchies with Hierarchy Bridge Tables -- Ragged/Variable Depth Hierarchies with Pathstring Attributes -- Advanced Fact Table Techniques -- Fact Table Surrogate Keys -- Centipede Fact Tables -- Numeric Values as Attributes or Facts -- Lag/Duration Facts -- Header/Line Fact Tables -- Allocated Facts -- Profit and Loss Fact Tables Using Allocations -- Multiple Currency Facts -- Multiple Units of Measure Facts -- Year-to-Date Facts -- Multipass SQL to Avoid Fact-to-Fact Table Joins -- Timespan Tracking in Fact Tables -- Late Arriving Facts -- Advanced Dimension Techniques -- Dimension-to-Dimension Table Joins -- Multivalued Dimensions and Bridge Tables -- Time Varying Multivalued Bridge Tables -- Behavior Tag Time Series -- Behavior Study Groups -- Aggregated Facts as Dimension Attributes -- Dynamic Value Bands -- Text Comments Dimension -- Multiple Time Zones -- Measure Type Dimensions -- Step Dimensions -- Hot Swappable Dimensions -- Abstract Generic Dimensions -- Audit Dimensions -- Late Arriving Dimensions -- Special Purpose Schemas -- Supertype and Subtype Schemas for Heterogeneous Products -- Real-Time Fact Tables -- Error Event Schemas -- 3. Retail Sales -- Four-Step Dimensional Design Process -- Step 1: Select the Business Process -- Step 2: Declare the Grain -- Step 3: Identify the Dimensions -- Step 4: Identify the Facts -- Retail Case Study -- Step 1: Select the Business Process -- Step 2: Declare the Grain -- Step 3: Identify the Dimensions -- Step 4: Identify the Facts -- Dimension Table Details -- Date Dimension -- Product Dimension -- Store Dimension -- Promotion Dimension -- Other Retail Sales Dimensions -- Degenerate Dimensions for Transaction Numbers -- Retail Schema in Action -- Retail Schema Extensibility -- Factless Fact Tables -- Dimension and Fact Table Keys -- Dimension Table Surrogate Keys -- Dimension Natural and Durable Supernatural Keys -- Degenerate Dimension Surrogate Keys -- Date Dimension Smart Keys -- Fact Table Surrogate Keys -- Resisting Normalization Urges -- Snowflake Schemas with Normalized Dimensions -- Outriggers -- Centipede Fact Tables with Too Many Dimensions -- Summary -- 4. Inventory -- Value Chain Introduction -- Inventory Models -- Inventory Periodic Snapshot -- Inventory Transactions -- Inventory Accumulating Snapshot -- Fact Table Types -- Transaction Fact Tables -- Periodic Snapshot Fact Tables -- Accumulating Snapshot Fact Tables -- Complementary Fact Table Types -- Value Chain Integration -- Enterprise Data Warehouse Bus Architecture -- Understanding the Bus Architecture -- Enterprise Data Warehouse Bus Matrix -- Conformed Dimensions -- Drilling Across Fact Tables -- Identical Conformed Dimensions -- Shrunken Rollup Conformed Dimension with Attribute Subset -- Shrunken Conformed Dimension with Row Subset -- Shrunken Conformed Dimensions on the Bus Matrix -- Limited Conformity -- Importance of Data Governance and Stewardship -- Conformed Dimensions and the Agile Movement -- Conformed Facts -- Summary
1. Data Warehousing, Business Intelligence, and Dimensional Modeling Primer -- 2. Kimball Dimensional Modeling Techniques Overview -- 3. Retail Sales -- 4. Inventory -- 5. Procurement -- 6. Order Management -- 7. Accounting -- 8. Customer Relationship Management -- 9. Human Resources Management -- 10. Financial Service -- 11. Telecommunications -- 12. Transportation -- 13. Education -- 14. Healthcare -- 15. Electronic Commerce -- 16. Insurance -- 17. Kimball DW/BI Lifecycle Overview -- 18. Dimensional Modeling Process and Tasks -- 19. ETL Subsystems and Techniques -- 20. ETL System Design and Development Process and Tasks -- 21. Big Data Analytics
Summary Annotation: Updated new edition of Ralph Kimballs groundbreaking book on dimensional modeling for data warehousing and business intelligence! The first edition of Ralph Kimballs The Data Warehouse Toolkit introduced the industry to dimensional modeling, and now his books are considered the most authoritative guides in this space. This new third edition is a complete library of updated dimensional modeling techniques, the most comprehensive collection ever. It covers new and enhanced star schema dimensional modeling patterns, adds two new chapters on ETL techniques, includes new and expanded business matrices for 12 case studies, and more. Authored by Ralph Kimball and Margy Ross, known worldwide as educators, consultants, and influential thought leaders in data warehousing and business intelligence Begins with fundamental design recommendations and progresses through increasingly complex scenarios Presents unique modeling techniques for business applications such as inventory management, procurement, invoicing, accounting, customer relationship management, big data analytics, and more Draws realworld case studies from a variety of industries, including retail sales, financial services, telecommunications, education, health care, insurance, ecommerce, and more Design dimensional databases that are easy to understand and provide fast query response with The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition
Notes Online resource; title from title page (ebrary, viewed July 19, 2013)
Subject Business enterprises -- Data processing.
Business intelligence.
Data warehousing.
Database design.
Form Electronic book
Author Ross, Margy, 1959- author
ISBN 1118530772 (e-book)
1118530802
1118732197 (e-book)
1118732286 (e-book)
9781118530771 (e-book)
9781118530801
9781118732199 (e-book)
9781118732281 (e-book)