Databricks Certified Data Engineer Professional – Comprehensive Resource Guide

A curated collection of demos, blog posts, official documentation, and training resources mapped to each exam objective for the Databricks Certified Data Engineer Professional certification (November 2025 version). This supersedes my previous blog post series, which covered the previous exam syllabus.

How to Use This Guide

For each exam section and objective, this guide provides:

  • πŸ“š Official Documentation: Direct links to official Databricks docs (docs.databricks.com)
  • 🎯 Demos: Interactive demonstrations and tutorials
  • ✍️ Blog Posts: Technical articles and best practices
  • πŸŽ“ Training Resources: Courses, certifications, and learning materials

Resources are ranked by relevance score based on keyword matching. Review multiple resources for each objective to get comprehensive coverage.

About the Author

I’m a Databricks Solutions Architect Champion with extensive experience in data engineering and lakehouse architecture. This guide is designed to help you navigate the Data Engineer Professional certification, which is one of the most challenging Databricks certifications.

The Data Engineer Professional exam tests your ability to build production-grade, enterprise-scale data engineering solutions. This goes beyond basic pipeline development – you need to demonstrate expertise in optimization, security, governance, monitoring, and deployment automation.

I created this guide by analyzing the exam objectives and mapping them to the best available resources. My advice: this exam requires hands-on experience. You should have built real production pipelines, optimized queries, implemented security controls, and deployed with DABs before attempting this exam.

Find out what works best for you. Good luck on your Databricks certification journey!


About the Exam

  • Exam Name: Databricks Certified Data Engineer Professional
  • Exam Date: November 30, 2025
  • Questions: 59 scored multiple-choice
  • Time Limit: 120 minutes
  • Registration Fee: USD $200
  • Validity: 2 years
  • Prerequisite: Data Engineer Associate recommended; 2+ years hands-on experience strongly recommended

Recommended Preparation

  • Instructor-led: Advanced Data Engineering with Databricks
  • Self-paced: Available in Databricks Academy
  • Deep working knowledge of Python, SQL, Spark, and Delta Lake
  • Production experience with Lakeflow/DLT, Unity Catalog, and DABs

πŸ†• What’s Changed from the April 2024 Syllabus

The November 2025 exam brings significant updates reflecting Databricks’ evolving data engineering platform:

Structural Changes:

  • Expanded from 6 sections to 10 sections for more granular coverage
  • Old β€œDatabricks Tooling” section content now distributed across multiple new sections
  • Old β€œData Processing” section split into Ingestion, Transformation, and Data Modelling sections

New Topics Added:

  • Databricks Asset Bundles (DABs): Major new focus on infrastructure-as-code deployment (Section 1 & 9)
  • Lakeflow Declarative Pipelines: Replaces Delta Live Tables terminology throughout
  • APPLY CHANGES API: Explicit coverage for CDC processing in DLT (Section 1)
  • Liquid Clustering: Replaces Z-Order and partitioning for data layout optimization (Section 10)
  • System Tables: New emphasis on observability and cost monitoring (Section 5)
  • Row Filters & Column Masks: Data security now has dedicated coverage (Section 7)

Removed/Reduced Topics:

  • Partition hints (coalesce, repartition, repartition_by_range, rebalance): Less emphasis in new syllabus
  • Z-Order indexing: Replaced by Liquid Clustering focus
  • Bloom filters: Reduced coverage
  • Manual file size control: Auto-optimization now preferred

Renamed/Evolved Topics:

  • β€œDelta Live Tables” β†’ β€œLakeflow Declarative Pipelines” or β€œLakeflow Spark Declarative Pipelines”
  • β€œStreaming with Delta Lake” β†’ Focus now on Streaming Tables vs Materialized Views
  • Traditional partitioning strategies β†’ Liquid Clustering

Study Tip: If you studied for the April 2024 exam, focus extra time on:

  • DABs project structure and deployment
  • Lakeflow/DLT new syntax and APPLY CHANGES API
  • Liquid Clustering (replaces partitioning/Z-Order decisions)
  • System Tables for monitoring and observability
  • Row filters and column masks for security

πŸ“– Background Reading

Before diving into the objectives, these resources provide essential foundational context:

Delta Lake Fundamentals

Understanding the Delta Lake Transaction Log (Read the Blog Post). This excellent post provides deep insight into how Delta Lake guarantees ACID transactions. Understanding the transaction log is fundamental to troubleshooting and optimization questions on the exam.

Delta Lake 3.0 and Liquid Clustering (Read the Blog Post). Liquid Clustering is the new approach to data layout optimization, replacing partitioning and Z-Order strategies. This is a must-read for the new exam.

How to Clone Delta Lake Tables (Read the Blog Post). Covers both shallow and deep clones with use cases, useful for understanding testing and data sharing scenarios.

Streaming & CDC

Simplifying CDC with Change Data Feed (Read the Blog Post). Essential reading for understanding how CDF enables incremental processing and delete propagation.

Simplifying Streaming Data Ingestion into Delta Lake (Read the Blog Post). Great overview of Auto Loader and streaming patterns.

Stream-Stream Joins in Apache Spark (Read the Blog Post). Although it focuses on Spark 2.3, the stream-stream join concepts are still relevant for the exam.

Performance Optimization

Processing Petabytes with Databricks Delta (Read the Blog Post). Contains excellent visualizations for understanding Z-Ordering concepts; although Z-Ordering is now superseded by Liquid Clustering, the principles still apply.

How Databricks Improved Query Performance with Auto-Optimized File Sizes (Read the Blog Post). Covers the "small files" problem and how Databricks addresses it automatically.

Data Governance & Compliance

Handling "Right to be Forgotten" with Delta Live Tables (Read the Blog Post). Critical for understanding data purging and GDPR/CCPA compliance, directly relevant to Section 7.

Official Documentation Quick Links

Free Ebook

Delta Lake: Up & Running by O'Reilly (Get the PDF). Comprehensive book covering Delta Lake internals, operations, and best practices. Free in exchange for your email.


πŸ“Š Exam Breakdown & Study Strategy

Exam Weight by Section

Understanding how the exam is weighted helps you prioritize your study time. The Professional exam covers 10 sections with 42 total objectives:

Section | Topics | Objectives | Study Priority
Section 1: Developing Code | DABs, UDFs, DLT, Jobs, Testing | 11 | 🔴 Critical
Section 5: Monitoring and Alerting | System Tables, Spark UI, APIs | 6 | 🔴 Critical
Section 6: Cost & Performance | Optimization, CDF, Query Profile | 5 | 🔴 Critical
Section 7: Security and Compliance | ACLs, Masking, PII, Purging | 5 | 🔴 Critical
Section 9: Debugging and Deploying | Spark UI, DABs, Git CI/CD | 5 | 🟡 High
Section 10: Data Modelling | Delta, Liquid Clustering, Dimensional | 4 | 🟡 High
Section 4: Data Sharing | Delta Sharing, Federation | 3 | 🟡 High
Section 2: Data Ingestion | Multi-format, Streaming/Batch | 2 | 🟢 Medium
Section 3: Transformation | Window Functions, Quarantining | 2 | 🟢 Medium
Section 8: Data Governance | Metadata, Permissions | 2 | 🟢 Medium

🎯 How to Use This Guide Effectively

I’ve organized resources into four categories for each exam objective. Here’s how I recommend using them:

πŸ“š Official Documentation (docs.databricks.com)

This is your primary reference for the Professional exam. You need to know the details – syntax, configuration options, and edge cases.

My approach:

  • Read the conceptual overview AND the API/syntax reference
  • Understand the β€œLimitations” and β€œBest Practices” sections – exam questions often test edge cases
  • Pay special attention to Unity Catalog, DABs, and DLT documentation

Best for: Understanding exact syntax, parameters, and technical specifications


🎯 Interactive Demos (databricks.com/resources/demos)

For the Professional exam, demos help you understand complex workflows and enterprise patterns.

How I use demos:

  1. Watch the end-to-end flow first – understand how components connect
  2. Focus on configuration details – the exam tests specific settings
  3. Recreate in your workspace – you need hands-on experience

Best for: Understanding complex workflows and enterprise-scale patterns


πŸŽ“ Training Resources (Databricks Academy)

The Advanced Data Engineering course is highly recommended for this exam.

Training Courses:

  • Advanced Data Engineering with Databricks – Critical for Sections 1, 5, 6, 9
  • Data Management and Governance with Unity Catalog – Critical for Sections 7, 8

Best for: Deep dives into complex topics and hands-on labs


My Recommended Study Path

Phase 1: Core Development (Sections 1, 2, 3)

  1. Master Lakeflow Declarative Pipelines – streaming tables, materialized views, expectations
  2. Understand APPLY CHANGES API for CDC
  3. Practice writing and testing UDFs
  4. Learn DABs project structure and deployment

Phase 2: Operations & Monitoring (Sections 5, 9)

  1. Deep dive into System Tables for observability
  2. Master Query Profiler and Spark UI analysis
  3. Understand event logs for DLT debugging
  4. Practice job repair and troubleshooting

Phase 3: Optimization (Section 6)

  1. Learn Delta optimization – deletion vectors, liquid clustering
  2. Understand data skipping and file pruning
  3. Master Change Data Feed (CDF) for incremental processing

Phase 4: Security & Governance (Sections 7, 8)

  1. Implement row filters and column masks
  2. Understand PII detection and masking strategies
  3. Learn data purging for compliance

Phase 5: Sharing & Modeling (Sections 4, 10)

  1. Configure Delta Sharing (D2D and D2O)
  2. Set up Lakehouse Federation
  3. Design dimensional models with liquid clustering

Practice & Validation

Hands-On Practice (This is critical for Professional!):

  • Sign up for Databricks Free Edition
  • Build production-style pipelines with DLT expectations and CDC
  • Deploy pipelines using DABs with multiple environments
  • Implement row filters and column masks on sensitive data
  • Practice debugging with Query Profiler and event logs
  • Set up Delta Sharing between workspaces

Key Differentiators from Associate:

  • Professional tests optimization and debugging skills heavily
  • You need to know β€œwhen to use what” not just β€œhow to use”
  • Security and governance questions are more scenario-based
  • Expect questions about troubleshooting production issues

Section 1: Developing Code for Data Processing using Python and SQL

Section Overview: 11 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


1.1 Scalable Python Project Structure with DABs

Objective: Design and implement a scalable Python project structure optimized for Databricks Asset Bundles (DABs), enabling modular development, deployment automation, and CI/CD integration.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


1.2 Managing External Libraries and Dependencies

Objective: Manage and troubleshoot external third-party library installations and dependencies in Databricks, including PyPI packages, local wheels, and source archives.

πŸ“š Official Documentation:


1.3 Pandas/Python User-Defined Functions (UDFs)

Objective: Develop User-Defined Functions (UDFs) using Pandas/Python UDF.
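
For illustration, here is a minimal sketch of a vectorized pandas UDF; the function, column names, and sample values are hypothetical.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Vectorized (pandas) UDF: operates on batches as pandas Series rather than row by row
@pandas_udf(DoubleType())
def fahrenheit_to_celsius(temp_f: pd.Series) -> pd.Series:
    return (temp_f - 32) * 5.0 / 9.0

df = spark.createDataFrame([(32.0,), (98.6,), (212.0,)], ["temp_f"])
df.select(fahrenheit_to_celsius("temp_f").alias("temp_c")).show()

# Register the same function for use from SQL
spark.udf.register("fahrenheit_to_celsius", fahrenheit_to_celsius)
```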

πŸ“š Official Documentation:


1.4 Production Data Pipelines with Lakeflow & Auto Loader

Objective: Build and manage reliable, production-ready data pipelines for batch and streaming data using Lakeflow Spark Declarative Pipelines and Auto Loader.
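
As a hands-on reference, the sketch below shows what a small Lakeflow (DLT) pipeline with Auto Loader ingestion and a data quality expectation might look like. It assumes it runs as pipeline source code; the table names and volume paths are hypothetical.

```python
import dlt
from pyspark.sql.functions import col, current_timestamp

# Bronze: incremental file ingestion with Auto Loader (cloudFiles)
@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/orders")  # hypothetical path
        .load("/Volumes/main/raw/orders")                                          # hypothetical path
        .withColumn("ingested_at", current_timestamp())
    )

# Silver: data quality expectation that drops violating rows
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_silver():
    return dlt.read_stream("orders_bronze").where(col("order_id").isNotNull())
```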

πŸ“š Official Documentation:

Top Demos:


1.5 ETL Workflow Automation with Jobs

Objective: Create and automate ETL workloads using Jobs via the UI, APIs, or CLI.

πŸ“š Official Documentation:

Top Demos:


1.6 Streaming Tables vs Materialized Views

Objective: Explain the advantages and disadvantages of streaming tables compared to materialized views.
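
The sketch below contrasts the two object types via their SQL DDL, wrapped in spark.sql only for consistency with the other examples; in practice these statements are usually run from a Databricks SQL warehouse or inside a pipeline. All names and the source path are hypothetical.

```python
# `spark` is the Databricks-provided session; names and paths are hypothetical.

# Streaming table: processes each new source record exactly once (append-style),
# well suited to continuous or incremental ingestion.
spark.sql("""
    CREATE OR REFRESH STREAMING TABLE main.sales.orders_st
    AS SELECT * FROM STREAM read_files('/Volumes/main/raw/orders', format => 'json')
""")

# Materialized view: precomputed query result, refreshed (incrementally where possible),
# well suited to transformations and aggregations over changing sources.
spark.sql("""
    CREATE OR REPLACE MATERIALIZED VIEW main.sales.daily_revenue_mv
    AS SELECT order_date, SUM(amount) AS revenue
       FROM main.sales.orders_st
       GROUP BY order_date
""")
```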

πŸ“š Official Documentation:


1.7 CDC with APPLY CHANGES API

Objective: Use APPLY CHANGES APIs to simplify CDC in Lakeflow Spark Declarative Pipelines.
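
A minimal sketch of the Python apply_changes API, assuming it runs inside a Lakeflow pipeline; the source table, key, and sequencing columns are hypothetical.

```python
import dlt
from pyspark.sql.functions import col, expr

# Hypothetical CDC feed already landed as a table in Unity Catalog
@dlt.view
def customers_cdc():
    return spark.readStream.table("main.raw.customers_cdc")

# Target streaming table that APPLY CHANGES keeps in sync
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc",
    keys=["customer_id"],                     # primary key used to match rows
    sequence_by=col("event_ts"),              # ordering column for out-of-order events
    apply_as_deletes=expr("op = 'DELETE'"),   # rows flagged as deletes in the feed
    except_column_list=["op", "event_ts"],    # bookkeeping columns not stored in the target
    stored_as_scd_type=1,                     # SCD Type 1: update in place
)
```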

πŸ“š Official Documentation:

Top Demos:


1.8 Structured Streaming vs Lakeflow Pipelines

Objective: Compare Spark Structured Streaming and Lakeflow Spark Declarative Pipelines to determine the optimal approach for building scalable ETL pipelines.

πŸ“š Official Documentation:

Top Demos:


1.9 Control Flow Operators in Pipelines

Objective: Create a pipeline component that uses control flow operators (e.g., if/else, for/each, etc.).


1.10 Environment and Task Configuration

Objective: Choose the appropriate configs for environments and dependencies, high memory for notebook tasks, and auto-optimization to disallow retries.

πŸ“š Official Documentation:


1.11 Unit and Integration Testing

Objective: Develop unit and integration tests using assertDataFrameEqual, assertSchemaEqual, DataFrame.transform, and testing frameworks to ensure code correctness, including use of the built-in debugger.
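
A small pytest-style sketch using the PySpark testing utilities; the transformation under test and its column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual

spark = SparkSession.builder.getOrCreate()

# Transformation under test (hypothetical): rename a column and drop null amounts
def clean_orders(df):
    return df.withColumnRenamed("AMT", "amount").dropna(subset=["amount"])

def test_clean_orders():
    input_df = spark.createDataFrame([(1, 10.0), (2, None)], ["id", "AMT"])
    expected_df = spark.createDataFrame([(1, 10.0)], ["id", "amount"])

    # DataFrame.transform keeps transformations composable and easy to test in isolation
    actual_df = input_df.transform(clean_orders)

    assertSchemaEqual(actual_df.schema, expected_df.schema)
    assertDataFrameEqual(actual_df, expected_df)
```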

πŸ“š Official Documentation:


Section 2: Data Ingestion & Acquisition

Section Overview: 2 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸ“Ή Video Demos (In-depth demonstrations):


2.1 Multi-Format Data Ingestion Pipelines

Objective: Design and implement data ingestion pipelines to efficiently ingest a variety of data formats including Delta Lake, Parquet, ORC, Avro, JSON, CSV, XML, text, and binary from diverse sources such as message buses and cloud storage.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


2.2 Append-Only Batch and Streaming Pipelines

Objective: Create an append-only data pipeline capable of handling both batch and streaming data using Delta.
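
A minimal sketch of an append-only Delta pipeline that serves both streaming and incremental-batch runs via an availableNow trigger; the table names and checkpoint path are hypothetical.

```python
from pyspark.sql.functions import current_timestamp

# `spark` is the Databricks-provided session; table names and paths are hypothetical.
(
    spark.readStream
    .table("main.bronze.events")                                       # streaming read from Delta
    .withColumn("processed_at", current_timestamp())
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/events_silver")   # hypothetical checkpoint path
    .outputMode("append")                                              # append-only semantics
    .trigger(availableNow=True)                                        # incremental batch: drain pending data, then stop
    .toTable("main.silver.events")
)
```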

πŸ“š Official Documentation:

Top Demos:


Section 3: Data Transformation, Cleansing, and Quality

Section Overview: 2 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):


3.1 Advanced Spark SQL and PySpark Transformations

Objective: Write efficient Spark SQL and PySpark code to apply advanced data transformations, including window functions, joins, and aggregations, to manipulate and analyze large datasets.
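
A short sketch of window functions in PySpark, combining ranking with a running aggregate; the data and column names are hypothetical.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [("c1", "2025-01-01", 100.0), ("c1", "2025-01-05", 40.0), ("c2", "2025-01-02", 75.0)],
    ["customer_id", "order_date", "amount"],
)

# Rank each customer's orders by recency, and keep a per-customer running total
w_rank = Window.partitionBy("customer_id").orderBy(F.col("order_date").desc())
w_running = (
    Window.partitionBy("customer_id")
    .orderBy("order_date")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

result = (
    orders
    .withColumn("order_rank", F.row_number().over(w_rank))
    .withColumn("running_total", F.sum("amount").over(w_running))
)
result.show()
```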

πŸ“š Official Documentation:

πŸŽ“ Training Resources:


3.2 Bad Data Quarantining Process

Objective: Develop a quarantining process for bad data with Lakeflow Spark Declarative Pipelines, or Auto Loader in classic jobs.
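
One common quarantine pattern is sketched below, assuming a Lakeflow pipeline: expectations drop bad rows from the clean table, while the inverse of the same rules routes them to a quarantine table. Table names and rules are hypothetical.

```python
import dlt

# Quality rules; rows violating any of them are treated as "bad"
rules = {
    "valid_id": "order_id IS NOT NULL",
    "valid_amount": "amount > 0",
}
quarantine_rule = "NOT ({})".format(" AND ".join(f"({r})" for r in rules.values()))

# Clean table: rows violating a rule are dropped
@dlt.table(comment="Orders passing all quality rules")
@dlt.expect_all_or_drop(rules)
def orders_clean():
    return dlt.read_stream("orders_raw")   # hypothetical upstream table

# Quarantine table: keeps only the rows that violate at least one rule
@dlt.table(comment="Orders failing at least one quality rule")
def orders_quarantine():
    return dlt.read_stream("orders_raw").where(quarantine_rule)
```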

πŸ“š Official Documentation:

Top Demos:


Section 4: Data Sharing and Federation

Section Overview: 3 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


4.1 Delta Sharing (D2D and D2O)

Objective: Demonstrate Delta Sharing securely between Databricks deployments using Databricks-to-Databricks sharing (D2D) or to external platforms using the open sharing protocol (D2O).
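
A provider-side sketch of the Delta Sharing SQL for both D2D and open (D2O) recipients; the share, recipient, table names, and sharing identifier are hypothetical placeholders.

```python
# `spark` is the Databricks-provided session in a Unity Catalog-enabled workspace.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share COMMENT 'Curated sales tables'")
spark.sql("ALTER SHARE sales_share ADD TABLE main.sales.orders")

# D2D recipient: identified by the receiving metastore's sharing identifier (placeholder below)
spark.sql("""
    CREATE RECIPIENT IF NOT EXISTS partner_d2d
    USING ID 'aws:us-west-2:00000000-0000-0000-0000-000000000000'
""")

# Open-protocol (D2O) recipient: no metastore ID; Databricks issues an activation
# link / credential file usable by any Delta Sharing-capable client
spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_open")

spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_d2d")
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_open")
```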

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


4.2 Lakehouse Federation Configuration

Objective: Configure Lakehouse Federation with proper governance across the supported source systems.
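
A sketch of federating a hypothetical PostgreSQL database through a connection and a foreign catalog; the host, secret scope and keys, and all names are placeholders.

```python
# Connection: stores the host and credentials (here pulled from a hypothetical secret scope)
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS pg_conn TYPE postgresql
    OPTIONS (
        host 'pg.example.com',
        port '5432',
        user secret('federation', 'pg_user'),
        password secret('federation', 'pg_password')
    )
""")

# Foreign catalog: mirrors the external database so it is governed and queried
# through Unity Catalog without copying the data
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS pg_sales
    USING CONNECTION pg_conn
    OPTIONS (database 'sales')
""")

# Queried like any other catalog; access is controlled by Unity Catalog grants
spark.sql("SELECT * FROM pg_sales.public.orders LIMIT 10").show()
```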

πŸ“š Official Documentation:

Top Demos:


4.3 Sharing Live Data with Delta Share

Objective: Use Delta Sharing to share live data from the lakehouse to any computing platform.

πŸ“š Official Documentation:


Section 5: Monitoring and Alerting

Section Overview: 6 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


5.1 System Tables for Observability

Objective: Use system tables for observability over resource utilization, cost, auditing and workload monitoring.
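
A sketch of a cost-monitoring query over the billing system tables, assuming the system schemas are enabled and you have the necessary grants; the join to list prices yields only an approximate spend per SKU.

```python
spark.sql("""
    SELECT
        u.usage_date,
        u.sku_name,
        SUM(u.usage_quantity)                       AS dbus,
        SUM(u.usage_quantity * lp.pricing.default)  AS approx_cost_usd
    FROM system.billing.usage AS u
    JOIN system.billing.list_prices AS lp
      ON u.sku_name = lp.sku_name
     AND u.usage_start_time >= lp.price_start_time
     AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
    WHERE u.usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY u.usage_date, u.sku_name
    ORDER BY approx_cost_usd DESC
""").show()
```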

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


5.2 Query Profiler and Spark UI Monitoring

Objective: Use Query Profiler UI and Spark UI to monitor workloads.

πŸ“š Official Documentation:

Top Demos:


5.3 REST API and CLI for Job Monitoring

Objective: Use the Databricks REST APIs/Databricks CLI for monitoring jobs and pipelines.
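
A minimal sketch of polling recent runs through the Jobs REST API; the workspace host, token handling, and job ID are hypothetical, and the Databricks CLI offers equivalent commands (for example, databricks jobs list-runs).

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]     # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]   # personal access token or OAuth token
headers = {"Authorization": f"Bearer {token}"}

# List the most recent runs for one (hypothetical) job ID
resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers=headers,
    params={"job_id": 123456789, "limit": 25},
    timeout=30,
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    state = run.get("state", {})
    print(run["run_id"], state.get("life_cycle_state"), state.get("result_state"))
```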

πŸ“š Official Documentation:

Top Demos:


5.4 Lakeflow Pipeline Event Logs

Objective: Use Lakeflow Spark Declarative Pipelines Event Logs to monitor pipelines.
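
A sketch of querying a pipeline's event log with the event_log() table-valued function; the pipeline ID is a placeholder.

```python
# `spark` is the Databricks-provided session; '<pipeline-id>' is a placeholder.
spark.sql("""
    SELECT timestamp, event_type, level, message
    FROM event_log('<pipeline-id>')
    WHERE level IN ('WARN', 'ERROR')
    ORDER BY timestamp DESC
    LIMIT 50
""").show(truncate=False)
```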

πŸ“š Official Documentation:

Top Demos:


5.5 SQL Alerts for Data Quality

Objective: Use SQL Alerts to monitor data quality.

πŸ“š Official Documentation:

Top Demos:


5.6 Job Notifications and Alerting

Objective: Use the Lakeflow Jobs UI and Jobs API to set up notifications for job status and performance issues.

πŸ“š Official Documentation:


Section 6: Cost & Performance Optimisation

Section Overview: 5 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸ“Ή Video Demos (In-depth demonstrations):


6.1 Unity Catalog Managed Tables Benefits

Objective: Understand how and why using Unity Catalog managed tables reduces operational overhead and maintenance burden.

Top Demos:

πŸŽ“ Training Resources:


6.2 Delta Optimization Techniques

Objective: Understand delta optimization techniques, such as deletion vectors and liquid clustering.
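
A sketch of enabling both techniques on an existing table; the table name and clustering keys are hypothetical.

```python
# Deletion vectors: DELETE/UPDATE/MERGE mark rows as removed instead of rewriting whole files
spark.sql("""
    ALTER TABLE main.sales.orders
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

# Liquid clustering: declare clustering keys instead of partitioning or Z-Order
spark.sql("ALTER TABLE main.sales.orders CLUSTER BY (order_date, customer_id)")

# OPTIMIZE incrementally clusters data according to the keys above
spark.sql("OPTIMIZE main.sales.orders")
```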

πŸ“š Official Documentation:


6.3 Query Optimization Techniques

Objective: Understand the optimization techniques used by Databricks to ensure the performance of queries on large datasets (data skipping, file pruning, etc.).

πŸ“š Official Documentation:


6.4 Change Data Feed (CDF) for Streaming

Objective: Apply Change Data Feed (CDF) to address specific limitations of streaming tables and improve latency.
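
A sketch of enabling CDF on a table and consuming only the row-level changes downstream; table names and the checkpoint path are hypothetical.

```python
# Enable CDF on the source table (hypothetical names throughout)
spark.sql("""
    ALTER TABLE main.silver.customers
    SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
""")

# Read only the row-level changes instead of reprocessing the full table,
# which keeps downstream latency and compute low
changes = (
    spark.readStream
    .option("readChangeFeed", "true")
    .table("main.silver.customers")
)

(
    changes
    .filter("_change_type IN ('insert', 'update_postimage', 'delete')")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/customers_cdf")   # hypothetical path
    .trigger(availableNow=True)
    .toTable("main.gold.customer_changes")
)
```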

πŸ“š Official Documentation:

Top Demos:


6.5 Query Profile Analysis and Bottlenecks

Objective: Use the query profile to analyze the query and identify bottlenecks, such as bad data skipping, inefficient types of joins, and data shuffling.

πŸ“š Official Documentation:


Section 7: Ensuring Data Security and Compliance

Section Overview: 5 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


7.1 Workspace ACLs and Least Privilege

Objective: Use ACLs to secure workspace objects, enforcing principles such as least privilege and consistent policy enforcement.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


7.2 Row Filters and Column Masks

Objective: Use row filters and column masks to filter and mask sensitive table data.
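
A sketch of a row filter and a column mask built from SQL UDFs; the group names, tables, and columns are hypothetical.

```python
# Row filter: admins see every row, everyone else only the 'US' region
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.us_only_filter(region STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member('admins') OR region = 'US'
""")
spark.sql("ALTER TABLE main.sales.orders SET ROW FILTER main.security.us_only_filter ON (region)")

# Column mask: redact SSNs for anyone outside the hr_admins group
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.ssn_mask(ssn STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('hr_admins') THEN ssn ELSE '***-**-****' END
""")
spark.sql("ALTER TABLE main.hr.employees ALTER COLUMN ssn SET MASK main.security.ssn_mask")
```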

πŸ“š Official Documentation:


7.3 Data Anonymization and Pseudonymization

Objective: Apply anonymization and pseudonymization methods, such as hashing, tokenization, suppression, and generalization, to confidential data.

πŸ“š Official Documentation:


7.4 PII Detection and Masking Pipelines

Objective: Implement a compliant batch & streaming pipeline that detects and applies masking of PII to ensure data privacy.

πŸ“š Official Documentation:


7.5 Data Purging and Retention Compliance

Objective: Develop a data purging solution ensuring compliance with data retention policies.
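
A sketch of a retention-driven purge on a Delta table; the table, column, and 7-year retention window are hypothetical examples.

```python
# 1. Remove rows that have aged out of the (hypothetical) 7-year retention policy
spark.sql("""
    DELETE FROM main.sales.transactions
    WHERE transaction_date < current_date() - INTERVAL 7 YEARS
""")

# 2. If deletion vectors are enabled, REORG rewrites files so the soft-deleted
#    rows are physically removed from the data files
spark.sql("REORG TABLE main.sales.transactions APPLY (PURGE)")

# 3. VACUUM removes the unreferenced data files after the file retention window,
#    which is what actually makes purged records unrecoverable
spark.sql("VACUUM main.sales.transactions RETAIN 168 HOURS")
```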

πŸ“š Official Documentation:


Section 8: Data Governance

Section Overview: 2 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


8.1 Metadata and Data Discoverability

Objective: Create and add descriptions/metadata about enterprise data to make it more discoverable.
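
A sketch of adding comments and tags so data surfaces in search and Catalog Explorer; names and tag values are hypothetical.

```python
spark.sql("COMMENT ON TABLE main.sales.orders IS 'One row per customer order, refreshed hourly'")
spark.sql("ALTER TABLE main.sales.orders ALTER COLUMN amount COMMENT 'Order total in USD, tax included'")
spark.sql("ALTER TABLE main.sales.orders SET TAGS ('domain' = 'sales', 'layer' = 'gold')")

# Comments and tags surface in Catalog Explorer search and are queryable via information_schema
spark.sql("""
    SELECT table_name, comment
    FROM main.information_schema.tables
    WHERE table_schema = 'sales'
""").show(truncate=False)
```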

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


8.2 Unity Catalog Permission Inheritance

Objective: Demonstrate understanding of Unity Catalog permission inheritance model.

πŸ“š Official Documentation:

Top Demos:


Section 9: Debugging and Deploying

Section Overview: 5 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


9.1 Diagnostic Information and Troubleshooting

Objective: Identify pertinent diagnostic information using Spark UI, cluster logs, system tables, and query profiles to troubleshoot errors.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


9.2 Job Repair and Parameter Overrides

Objective: Analyze the errors and remediate the failed job runs with job repairs and parameter overrides.

πŸ“š Official Documentation:


9.3 Debugging Lakeflow and Spark Pipelines

Objective: Use Lakeflow Spark Declarative Pipelines event logs and the Spark UI to debug Lakeflow Spark Declarative Pipelines and Spark pipelines.

πŸ“š Official Documentation:

Top Demos:


9.4 Deploying with Databricks Asset Bundles

Objective: Build and deploy Databricks resources using Databricks Asset Bundles.

πŸ“š Official Documentation:

Top Demos:


9.5 Git-based CI/CD Workflows

Objective: Configure and integrate with Git-based CI/CD workflows using Databricks Git Folders for notebook and code deployment.

πŸ“š Official Documentation:

Top Demos:


Section 10: Data Modelling

Section Overview: 4 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸ“Ή Video Demos (In-depth demonstrations):


10.1 Scalable Data Models with Delta Lake

Objective: Design and implement scalable data models using Delta Lake to manage large datasets.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


10.2 Liquid Clustering for Query Performance

Objective: Simplify data layout decisions and optimize query performance using Liquid Clustering.

πŸ“š Official Documentation:


10.3 Liquid Clustering vs Partitioning/Z-Order

Objective: Identify the benefits of using Liquid Clustering over partitioning and Z-Order.
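
A sketch contrasting the older layout choices with Liquid Clustering; the table and column names are hypothetical.

```python
# Older approach (for contrast): layout fixed at creation time
#   CREATE TABLE ... PARTITIONED BY (order_date)
#   OPTIMIZE ... ZORDER BY (customer_id)

# Liquid clustering: declare clustering keys instead; no fixed directory layout,
# no small-partition explosion, and the keys can be changed later
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders_lc (
        order_id BIGINT,
        customer_id BIGINT,
        order_date DATE,
        amount DOUBLE
    )
    CLUSTER BY (order_date, customer_id)
""")

# Change clustering keys as query patterns evolve
spark.sql("ALTER TABLE main.sales.orders_lc CLUSTER BY (customer_id)")

# Incremental clustering is applied by OPTIMIZE (and automatically for some writes)
spark.sql("OPTIMIZE main.sales.orders_lc")
```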

πŸ“š Official Documentation:


10.4 Dimensional Modeling for Analytics

Objective: Design Dimensional Models for analytical workloads, ensuring efficient querying and aggregation.

πŸ“š Official Documentation:


Study Resources

Official Training

Certification Information

Key Documentation


Last Updated: February 2026 | Exam Version: November 2025
