Databricks Certified Data Engineer Professional – Comprehensive Resource Guide

A curated collection of demos, blog posts, official documentation, and training resources mapped to each exam objective for the Databricks Certified Data Engineer Professional certification (November 2025 version). This supersedes my previous blog post series, which covered the previous exam syllabus.

How to Use This Guide

For each exam section and objective, this guide provides:

  • πŸ“š Official Documentation: Direct links to official Databricks docs (docs.databricks.com)
  • 🎯 Demos: Interactive demonstrations and tutorials
  • ✍️ Blog Posts: Technical articles and best practices
  • πŸŽ“ Training Resources: Courses, certifications, and learning materials

Resources are ranked by relevance score based on keyword matching. Review multiple resources for each objective to get comprehensive coverage.

About the Author

I’m a Databricks Solutions Architect Champion with extensive experience in data engineering and lakehouse architecture. This guide is designed to help you navigate the Data Engineer Professional certification, which is one of the most challenging Databricks certifications.

The Data Engineer Professional exam tests your ability to build production-grade, enterprise-scale data engineering solutions. This goes beyond basic pipeline development – you need to demonstrate expertise in optimization, security, governance, monitoring, and deployment automation.

I created this guide by analyzing the exam objectives and mapping them to the best available resources. My advice: this exam requires hands-on experience. You should have built real production pipelines, optimized queries, implemented security controls, and deployed with DABs before attempting this exam.

Find out what works best for you. Good luck on your Databricks certification journey!


About the Exam

  • Exam Name: Databricks Certified Data Engineer Professional
  • Exam Date: November 30, 2025
  • Questions: 59 scored multiple-choice
  • Time Limit: 120 minutes
  • Registration Fee: USD $200
  • Validity: 2 years
  • Prerequisite: Data Engineer Associate recommended; 2+ years hands-on experience strongly recommended

Recommended Preparation

  • Instructor-led: Advanced Data Engineering with Databricks
  • Self-paced: Available in Databricks Academy
  • Deep working knowledge of Python, SQL, Spark, and Delta Lake
  • Production experience with Lakeflow/DLT, Unity Catalog, and DABs

πŸ†• What’s Changed from the April 2024 Syllabus

The November 2025 exam brings significant updates reflecting Databricks’ evolving data engineering platform:

Structural Changes:

  • Expanded from 6 sections to 10 sections for more granular coverage
  • Old β€œDatabricks Tooling” section content now distributed across multiple new sections
  • Old β€œData Processing” section split into Ingestion, Transformation, and Data Modelling sections

New Topics Added:

  • Databricks Asset Bundles (DABs): Major new focus on infrastructure-as-code deployment (Section 1 & 9)
  • Lakeflow Declarative Pipelines: Replaces Delta Live Tables terminology throughout
  • APPLY CHANGES API: Explicit coverage for CDC processing in DLT (Section 1)
  • Liquid Clustering: Replaces Z-Order and partitioning for data layout optimization (Section 10)
  • System Tables: New emphasis on observability and cost monitoring (Section 5)
  • Row Filters & Column Masks: Data security now has dedicated coverage (Section 7)

Removed/Reduced Topics:

  • Partition hints (coalesce, repartition, repartition_by_range, rebalance): Less emphasis in new syllabus
  • Z-Order indexing: Replaced by Liquid Clustering focus
  • Bloom filters: Reduced coverage
  • Manual file size control: Auto-optimization now preferred

Renamed/Evolved Topics:

  • β€œDelta Live Tables” β†’ β€œLakeflow Declarative Pipelines” or β€œLakeflow Spark Declarative Pipelines”
  • β€œStreaming with Delta Lake” β†’ Focus now on Streaming Tables vs Materialized Views
  • Traditional partitioning strategies β†’ Liquid Clustering

Study Tip: If you studied for the April 2024 exam, focus extra time on:

  • DABs project structure and deployment
  • Lakeflow/DLT new syntax and APPLY CHANGES API
  • Liquid Clustering (replaces partitioning/Z-Order decisions)
  • System Tables for monitoring and observability
  • Row filters and column masks for security

πŸ“– Background Reading

Before diving into the objectives, these resources provide essential foundational context:

Delta Lake Fundamentals

Understanding the Delta Lake Transaction Log (Read the Blog Post). This excellent post provides deep insight into how Delta Lake guarantees ACID transactions. Understanding the transaction log is fundamental to troubleshooting and optimization questions on the exam.

Delta Lake 3.0 and Liquid Clustering (Read the Blog Post). Liquid Clustering is the new approach to data layout optimization, replacing partitioning and Z-Order strategies. This is a must-read for the new exam.

How to Clone Delta Lake Tables (Read the Blog Post). Covers both shallow and deep clones with use cases, useful for understanding testing and data sharing scenarios.

Streaming & CDC

Simplifying CDC with Change Data Feed (Read the Blog Post). Essential reading for understanding how CDF enables incremental processing and delete propagation.

Simplifying Streaming Data Ingestion into Delta Lake (Read the Blog Post). Great overview of Auto Loader and streaming patterns.

Stream-Stream Joins in Apache Spark (Read the Blog Post). Although it focuses on Spark 2.3, the stream-stream join concepts are still relevant for the exam.

Performance Optimization

Processing Petabytes with Databricks Delta (Read the Blog Post). Contains excellent visualizations for understanding Z-Ordering concepts; although Z-Ordering is now superseded by Liquid Clustering, the principles still apply.

How Databricks Improved Query Performance with Auto-Optimized File Sizes (Read the Blog Post). Covers the "small files" problem and how Databricks addresses it automatically.

Data Governance & Compliance

Handling "Right to be Forgotten" with Delta Live Tables (Read the Blog Post). Critical for understanding data purging and GDPR/CCPA compliance, directly relevant to Section 7.

Official Documentation Quick Links

Free Ebook

Delta Lake: Up & Running by O'Reilly (Get the PDF). Comprehensive book covering Delta Lake internals, operations, and best practices. Free in exchange for your email.


πŸ“Š Exam Breakdown & Study Strategy

Exam Weight by Section

Understanding how the exam is weighted helps you prioritize your study time. The Professional exam covers 10 sections with 42 total objectives:

Section | Topics | Objectives | Study Priority
Section 1: Developing Code | DABs, UDFs, DLT, Jobs, Testing | 11 | 🔴 Critical
Section 5: Monitoring and Alerting | System Tables, Spark UI, APIs | 6 | 🔴 Critical
Section 6: Cost & Performance | Optimization, CDF, Query Profile | 5 | 🔴 Critical
Section 7: Security and Compliance | ACLs, Masking, PII, Purging | 5 | 🔴 Critical
Section 9: Debugging and Deploying | Spark UI, DABs, Git CI/CD | 5 | 🟡 High
Section 10: Data Modelling | Delta, Liquid Clustering, Dimensional | 4 | 🟡 High
Section 4: Data Sharing | Delta Sharing, Federation | 3 | 🟡 High
Section 2: Data Ingestion | Multi-format, Streaming/Batch | 2 | 🟢 Medium
Section 3: Transformation | Window Functions, Quarantining | 2 | 🟢 Medium
Section 8: Data Governance | Metadata, Permissions | 2 | 🟢 Medium

🎯 How to Use This Guide Effectively

I’ve organized resources into four categories for each exam objective. Here’s how I recommend using them:

πŸ“š Official Documentation (docs.databricks.com)

This is your primary reference for the Professional exam. You need to know the details – syntax, configuration options, and edge cases.

My approach:

  • Read the conceptual overview AND the API/syntax reference
  • Understand the β€œLimitations” and β€œBest Practices” sections – exam questions often test edge cases
  • Pay special attention to Unity Catalog, DABs, and DLT documentation

Best for: Understanding exact syntax, parameters, and technical specifications


🎯 Interactive Demos (databricks.com/resources/demos)

For the Professional exam, demos help you understand complex workflows and enterprise patterns.

How I use demos:

  1. Watch the end-to-end flow first – understand how components connect
  2. Focus on configuration details – the exam tests specific settings
  3. Recreate in your workspace – you need hands-on experience

Best for: Understanding complex workflows and enterprise-scale patterns


πŸŽ“ Training Resources (Databricks Academy)

The Advanced Data Engineering course is highly recommended for this exam.

Training Courses:

  • Advanced Data Engineering with Databricks – Critical for Sections 1, 5, 6, 9
  • Data Management and Governance with Unity Catalog – Critical for Sections 7, 8

Best for: Deep dives into complex topics and hands-on labs


My Recommended Study Path

Phase 1: Core Development (Sections 1, 2, 3)

  1. Master Lakeflow Declarative Pipelines – streaming tables, materialized views, expectations
  2. Understand APPLY CHANGES API for CDC
  3. Practice writing and testing UDFs
  4. Learn DABs project structure and deployment

Phase 2: Operations & Monitoring (Sections 5, 9)

  1. Deep dive into System Tables for observability
  2. Master Query Profiler and Spark UI analysis
  3. Understand event logs for DLT debugging
  4. Practice job repair and troubleshooting

Phase 3: Optimization (Section 6)

  1. Learn Delta optimization – deletion vectors, liquid clustering
  2. Understand data skipping and file pruning
  3. Master Change Data Feed (CDF) for incremental processing

Phase 4: Security & Governance (Sections 7, 8)

  1. Implement row filters and column masks
  2. Understand PII detection and masking strategies
  3. Learn data purging for compliance

Phase 5: Sharing & Modeling (Sections 4, 10)

  1. Configure Delta Sharing (D2D and D2O)
  2. Set up Lakehouse Federation
  3. Design dimensional models with liquid clustering

Practice & Validation

Hands-On Practice (This is critical for Professional!):

  • Sign up for Databricks Free Edition
  • Build production-style pipelines with DLT expectations and CDC
  • Deploy pipelines using DABs with multiple environments
  • Implement row filters and column masks on sensitive data
  • Practice debugging with Query Profiler and event logs
  • Set up Delta Sharing between workspaces

Key Differentiators from Associate:

  • Professional tests optimization and debugging skills heavily
  • You need to know β€œwhen to use what” not just β€œhow to use”
  • Security and governance questions are more scenario-based
  • Expect questions about troubleshooting production issues

Section 1: Developing Code for Data Processing using Python and SQL

Section Overview: 11 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


1.1 Scalable Python Project Structure with DABs

Objective: Design and implement a scalable Python project structure optimized for Databricks Asset Bundles (DABs), enabling modular development, deployment automation, and CI/CD integration.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


1.2 Managing External Libraries and Dependencies

Objective: Manage and troubleshoot external third-party library installations and dependencies in Databricks, including PyPI packages, local wheels, and source archives.

πŸ“š Official Documentation:


1.3 Pandas/Python User-Defined Functions (UDFs)

Objective: Develop User-Defined Functions (UDFs) using Pandas/Python UDF.
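
For illustration, here is a minimal sketch of a vectorized pandas UDF; the function, column names, and sample values are hypothetical.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Vectorized (pandas) UDF: operates on batches as pandas Series rather than row by row
@pandas_udf(DoubleType())
def fahrenheit_to_celsius(temp_f: pd.Series) -> pd.Series:
    return (temp_f - 32) * 5.0 / 9.0

df = spark.createDataFrame([(32.0,), (98.6,), (212.0,)], ["temp_f"])
df.select(fahrenheit_to_celsius("temp_f").alias("temp_c")).show()

# Register the same function for use from SQL
spark.udf.register("fahrenheit_to_celsius", fahrenheit_to_celsius)
```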

πŸ“š Official Documentation:


1.4 Production Data Pipelines with Lakeflow & Auto Loader

Objective: Build and manage reliable, production-ready data pipelines for batch and streaming data using Lakeflow Spark Declarative Pipelines and Auto Loader.
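
As a hands-on reference, the sketch below shows what a small Lakeflow (DLT) pipeline with Auto Loader ingestion and a data quality expectation might look like. It assumes it runs as pipeline source code; the table names and volume paths are hypothetical.

```python
import dlt
from pyspark.sql.functions import col, current_timestamp

# Bronze: incremental file ingestion with Auto Loader (cloudFiles)
@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/orders")  # hypothetical path
        .load("/Volumes/main/raw/orders")                                          # hypothetical path
        .withColumn("ingested_at", current_timestamp())
    )

# Silver: data quality expectation that drops violating rows
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_silver():
    return dlt.read_stream("orders_bronze").where(col("order_id").isNotNull())
```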

πŸ“š Official Documentation:

Top Demos:


1.5 ETL Workflow Automation with Jobs

Objective: Create and automate ETL workloads using Jobs via the UI, APIs, or CLI.

πŸ“š Official Documentation:

Top Demos:


1.6 Streaming Tables vs Materialized Views

Objective: Explain the advantages and disadvantages of streaming tables compared to materialized views.
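
The sketch below contrasts the two object types via their SQL DDL, wrapped in spark.sql only for consistency with the other examples; in practice these statements are usually run from a Databricks SQL warehouse or inside a pipeline. All names and the source path are hypothetical.

```python
# `spark` is the Databricks-provided session; names and paths are hypothetical.

# Streaming table: processes each new source record exactly once (append-style),
# well suited to continuous or incremental ingestion.
spark.sql("""
    CREATE OR REFRESH STREAMING TABLE main.sales.orders_st
    AS SELECT * FROM STREAM read_files('/Volumes/main/raw/orders', format => 'json')
""")

# Materialized view: precomputed query result, refreshed (incrementally where possible),
# well suited to transformations and aggregations over changing sources.
spark.sql("""
    CREATE OR REPLACE MATERIALIZED VIEW main.sales.daily_revenue_mv
    AS SELECT order_date, SUM(amount) AS revenue
       FROM main.sales.orders_st
       GROUP BY order_date
""")
```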

πŸ“š Official Documentation:


1.7 CDC with APPLY CHANGES API

Objective: Use APPLY CHANGES APIs to simplify CDC in Lakeflow Spark Declarative Pipelines.
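
A minimal sketch of the Python apply_changes API, assuming it runs inside a Lakeflow pipeline; the source table, key, and sequencing columns are hypothetical.

```python
import dlt
from pyspark.sql.functions import col, expr

# Hypothetical CDC feed already landed as a table in Unity Catalog
@dlt.view
def customers_cdc():
    return spark.readStream.table("main.raw.customers_cdc")

# Target streaming table that APPLY CHANGES keeps in sync
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc",
    keys=["customer_id"],                     # primary key used to match rows
    sequence_by=col("event_ts"),              # ordering column for out-of-order events
    apply_as_deletes=expr("op = 'DELETE'"),   # rows flagged as deletes in the feed
    except_column_list=["op", "event_ts"],    # bookkeeping columns not stored in the target
    stored_as_scd_type=1,                     # SCD Type 1: update in place
)
```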

πŸ“š Official Documentation:

Top Demos:


1.8 Structured Streaming vs Lakeflow Pipelines

Objective: Compare Spark Structured Streaming and Lakeflow Spark Declarative Pipelines to determine the optimal approach for building scalable ETL pipelines.

πŸ“š Official Documentation:

Top Demos:


1.9 Control Flow Operators in Pipelines

Objective: Create a pipeline component that uses control flow operators (e.g., if/else, for/each, etc.).


1.10 Environment and Task Configuration

Objective: Choose the appropriate configs for environments and dependencies, high memory for notebook tasks, and auto-optimization to disallow retries.

πŸ“š Official Documentation:


1.11 Unit and Integration Testing

Objective: Develop unit and integration tests using assertDataFrameEqual, assertSchemaEqual, DataFrame.transform, and testing frameworks to ensure code correctness, including use of the built-in debugger.
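
A small pytest-style sketch using the PySpark testing utilities; the transformation under test and its column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual

spark = SparkSession.builder.getOrCreate()

# Transformation under test (hypothetical): rename a column and drop null amounts
def clean_orders(df):
    return df.withColumnRenamed("AMT", "amount").dropna(subset=["amount"])

def test_clean_orders():
    input_df = spark.createDataFrame([(1, 10.0), (2, None)], ["id", "AMT"])
    expected_df = spark.createDataFrame([(1, 10.0)], ["id", "amount"])

    # DataFrame.transform keeps transformations composable and easy to test in isolation
    actual_df = input_df.transform(clean_orders)

    assertSchemaEqual(actual_df.schema, expected_df.schema)
    assertDataFrameEqual(actual_df, expected_df)
```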

πŸ“š Official Documentation:


Section 2: Data Ingestion & Acquisition

Section Overview: 2 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸ“Ή Video Demos (In-depth demonstrations):


2.1 Multi-Format Data Ingestion Pipelines

Objective: Design and implement data ingestion pipelines to efficiently ingest a variety of data formats including Delta Lake, Parquet, ORC, Avro, JSON, CSV, XML, text, and binary from diverse sources such as message buses and cloud storage.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


2.2 Append-Only Batch and Streaming Pipelines

Objective: Create an append-only data pipeline capable of handling both batch and streaming data using Delta.
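
A minimal sketch of an append-only Delta pipeline that serves both streaming and incremental-batch runs via an availableNow trigger; the table names and checkpoint path are hypothetical.

```python
from pyspark.sql.functions import current_timestamp

# `spark` is the Databricks-provided session; table names and paths are hypothetical.
(
    spark.readStream
    .table("main.bronze.events")                                       # streaming read from Delta
    .withColumn("processed_at", current_timestamp())
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/events_silver")   # hypothetical checkpoint path
    .outputMode("append")                                              # append-only semantics
    .trigger(availableNow=True)                                        # incremental batch: drain pending data, then stop
    .toTable("main.silver.events")
)
```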

πŸ“š Official Documentation:

Top Demos:


Section 3: Data Transformation, Cleansing, and Quality

Section Overview: 2 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):


3.1 Advanced Spark SQL and PySpark Transformations

Objective: Write efficient Spark SQL and PySpark code to apply advanced data transformations, including window functions, joins, and aggregations, to manipulate and analyze large datasets.
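
A short sketch of window functions in PySpark, combining ranking with a running aggregate; the data and column names are hypothetical.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [("c1", "2025-01-01", 100.0), ("c1", "2025-01-05", 40.0), ("c2", "2025-01-02", 75.0)],
    ["customer_id", "order_date", "amount"],
)

# Rank each customer's orders by recency, and keep a per-customer running total
w_rank = Window.partitionBy("customer_id").orderBy(F.col("order_date").desc())
w_running = (
    Window.partitionBy("customer_id")
    .orderBy("order_date")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

result = (
    orders
    .withColumn("order_rank", F.row_number().over(w_rank))
    .withColumn("running_total", F.sum("amount").over(w_running))
)
result.show()
```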

πŸ“š Official Documentation:

πŸŽ“ Training Resources:


3.2 Bad Data Quarantining Process

Objective: Develop a quarantining process for bad data with Lakeflow Spark Declarative Pipelines, or Auto Loader in classic jobs.
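
One common quarantine pattern is sketched below, assuming a Lakeflow pipeline: expectations drop bad rows from the clean table, while the inverse of the same rules routes them to a quarantine table. Table names and rules are hypothetical.

```python
import dlt

# Quality rules; rows violating any of them are treated as "bad"
rules = {
    "valid_id": "order_id IS NOT NULL",
    "valid_amount": "amount > 0",
}
quarantine_rule = "NOT ({})".format(" AND ".join(f"({r})" for r in rules.values()))

# Clean table: rows violating a rule are dropped
@dlt.table(comment="Orders passing all quality rules")
@dlt.expect_all_or_drop(rules)
def orders_clean():
    return dlt.read_stream("orders_raw")   # hypothetical upstream table

# Quarantine table: keeps only the rows that violate at least one rule
@dlt.table(comment="Orders failing at least one quality rule")
def orders_quarantine():
    return dlt.read_stream("orders_raw").where(quarantine_rule)
```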

πŸ“š Official Documentation:

Top Demos:


Section 4: Data Sharing and Federation

Section Overview: 3 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


4.1 Delta Sharing (D2D and D2O)

Objective: Demonstrate Delta Sharing securely between Databricks deployments using Databricks-to-Databricks sharing (D2D) or to external platforms using the open sharing protocol (D2O).
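
A provider-side sketch of the Delta Sharing SQL for both D2D and open (D2O) recipients; the share, recipient, table names, and sharing identifier are hypothetical placeholders.

```python
# `spark` is the Databricks-provided session in a Unity Catalog-enabled workspace.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share COMMENT 'Curated sales tables'")
spark.sql("ALTER SHARE sales_share ADD TABLE main.sales.orders")

# D2D recipient: identified by the receiving metastore's sharing identifier (placeholder below)
spark.sql("""
    CREATE RECIPIENT IF NOT EXISTS partner_d2d
    USING ID 'aws:us-west-2:00000000-0000-0000-0000-000000000000'
""")

# Open-protocol (D2O) recipient: no metastore ID; Databricks issues an activation
# link / credential file usable by any Delta Sharing-capable client
spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_open")

spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_d2d")
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_open")
```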

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


4.2 Lakehouse Federation Configuration

Objective: Configure Lakehouse Federation with proper governance across the supported source systems.
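
A sketch of federating a hypothetical PostgreSQL database through a connection and a foreign catalog; the host, secret scope and keys, and all names are placeholders.

```python
# Connection: stores the host and credentials (here pulled from a hypothetical secret scope)
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS pg_conn TYPE postgresql
    OPTIONS (
        host 'pg.example.com',
        port '5432',
        user secret('federation', 'pg_user'),
        password secret('federation', 'pg_password')
    )
""")

# Foreign catalog: mirrors the external database so it is governed and queried
# through Unity Catalog without copying the data
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS pg_sales
    USING CONNECTION pg_conn
    OPTIONS (database 'sales')
""")

# Queried like any other catalog; access is controlled by Unity Catalog grants
spark.sql("SELECT * FROM pg_sales.public.orders LIMIT 10").show()
```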

πŸ“š Official Documentation:

Top Demos:


4.3 Sharing Live Data with Delta Share

Objective: Use Delta Sharing to share live data from the lakehouse to any computing platform.

πŸ“š Official Documentation:


Section 5: Monitoring and Alerting

Section Overview: 6 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


5.1 System Tables for Observability

Objective: Use system tables for observability over resource utilization, cost, auditing and workload monitoring.
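
A sketch of a cost-monitoring query over the billing system tables, assuming the system schemas are enabled and you have the necessary grants; the join to list prices yields only an approximate spend per SKU.

```python
spark.sql("""
    SELECT
        u.usage_date,
        u.sku_name,
        SUM(u.usage_quantity)                       AS dbus,
        SUM(u.usage_quantity * lp.pricing.default)  AS approx_cost_usd
    FROM system.billing.usage AS u
    JOIN system.billing.list_prices AS lp
      ON u.sku_name = lp.sku_name
     AND u.usage_start_time >= lp.price_start_time
     AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
    WHERE u.usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY u.usage_date, u.sku_name
    ORDER BY approx_cost_usd DESC
""").show()
```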

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


5.2 Query Profiler and Spark UI Monitoring

Objective: Use Query Profiler UI and Spark UI to monitor workloads.

πŸ“š Official Documentation:

Top Demos:


5.3 REST API and CLI for Job Monitoring

Objective: Use the Databricks REST APIs/Databricks CLI for monitoring jobs and pipelines.
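
A minimal sketch of polling recent runs through the Jobs REST API; the workspace host, token handling, and job ID are hypothetical, and the Databricks CLI offers equivalent commands (for example, databricks jobs list-runs).

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]     # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]   # personal access token or OAuth token
headers = {"Authorization": f"Bearer {token}"}

# List the most recent runs for one (hypothetical) job ID
resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers=headers,
    params={"job_id": 123456789, "limit": 25},
    timeout=30,
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    state = run.get("state", {})
    print(run["run_id"], state.get("life_cycle_state"), state.get("result_state"))
```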

πŸ“š Official Documentation:

Top Demos:


5.4 Lakeflow Pipeline Event Logs

Objective: Use Lakeflow Spark Declarative Pipelines Event Logs to monitor pipelines.
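
A sketch of querying a pipeline's event log with the event_log() table-valued function; the pipeline ID is a placeholder.

```python
# `spark` is the Databricks-provided session; '<pipeline-id>' is a placeholder.
spark.sql("""
    SELECT timestamp, event_type, level, message
    FROM event_log('<pipeline-id>')
    WHERE level IN ('WARN', 'ERROR')
    ORDER BY timestamp DESC
    LIMIT 50
""").show(truncate=False)
```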

πŸ“š Official Documentation:

Top Demos:


5.5 SQL Alerts for Data Quality

Objective: Use SQL Alerts to monitor data quality.

πŸ“š Official Documentation:

Top Demos:


5.6 Job Notifications and Alerting

Objective: Use the Lakeflow Jobs UI and Jobs API to set up notifications for job status and performance issues.

πŸ“š Official Documentation:


Section 6: Cost & Performance Optimisation

Section Overview: 5 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸ“Ή Video Demos (In-depth demonstrations):


6.1 Unity Catalog Managed Tables Benefits

Objective: Understand how and why using Unity Catalog managed tables reduces operational overhead and maintenance burden.

Top Demos:

πŸŽ“ Training Resources:


6.2 Delta Optimization Techniques

Objective: Understand delta optimization techniques, such as deletion vectors and liquid clustering.
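
A sketch of enabling both techniques on an existing table; the table name and clustering keys are hypothetical.

```python
# Deletion vectors: DELETE/UPDATE/MERGE mark rows as removed instead of rewriting whole files
spark.sql("""
    ALTER TABLE main.sales.orders
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

# Liquid clustering: declare clustering keys instead of partitioning or Z-Order
spark.sql("ALTER TABLE main.sales.orders CLUSTER BY (order_date, customer_id)")

# OPTIMIZE incrementally clusters data according to the keys above
spark.sql("OPTIMIZE main.sales.orders")
```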

πŸ“š Official Documentation:


6.3 Query Optimization Techniques

Objective: Understand the optimization techniques used by Databricks to ensure the performance of queries on large datasets (data skipping, file pruning, etc.).

πŸ“š Official Documentation:


6.4 Change Data Feed (CDF) for Streaming

Objective: Apply Change Data Feed (CDF) to address specific limitations of streaming tables and improve latency.
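
A sketch of enabling CDF on a table and consuming only the row-level changes downstream; table names and the checkpoint path are hypothetical.

```python
# Enable CDF on the source table (hypothetical names throughout)
spark.sql("""
    ALTER TABLE main.silver.customers
    SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
""")

# Read only the row-level changes instead of reprocessing the full table,
# which keeps downstream latency and compute low
changes = (
    spark.readStream
    .option("readChangeFeed", "true")
    .table("main.silver.customers")
)

(
    changes
    .filter("_change_type IN ('insert', 'update_postimage', 'delete')")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/customers_cdf")   # hypothetical path
    .trigger(availableNow=True)
    .toTable("main.gold.customer_changes")
)
```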

πŸ“š Official Documentation:

Top Demos:


6.5 Query Profile Analysis and Bottlenecks

Objective: Use the query profile to analyze the query and identify bottlenecks, such as bad data skipping, inefficient types of joins, and data shuffling.

πŸ“š Official Documentation:


Section 7: Ensuring Data Security and Compliance

Section Overview: 5 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


7.1 Workspace ACLs and Least Privilege

Objective: Use ACLs to secure workspace objects, enforcing principles such as least privilege and consistent policy enforcement.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


7.2 Row Filters and Column Masks

Objective: Use row filters and column masks to filter and mask sensitive table data.
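
A sketch of a row filter and a column mask built from SQL UDFs; the group names, tables, and columns are hypothetical.

```python
# Row filter: admins see every row, everyone else only the 'US' region
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.us_only_filter(region STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member('admins') OR region = 'US'
""")
spark.sql("ALTER TABLE main.sales.orders SET ROW FILTER main.security.us_only_filter ON (region)")

# Column mask: redact SSNs for anyone outside the hr_admins group
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.ssn_mask(ssn STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('hr_admins') THEN ssn ELSE '***-**-****' END
""")
spark.sql("ALTER TABLE main.hr.employees ALTER COLUMN ssn SET MASK main.security.ssn_mask")
```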

πŸ“š Official Documentation:


7.3 Data Anonymization and Pseudonymization

Objective: Apply anonymization and pseudonymization methods, such as hashing, tokenization, suppression, and generalization, to confidential data.

πŸ“š Official Documentation:


7.4 PII Detection and Masking Pipelines

Objective: Implement a compliant batch & streaming pipeline that detects and applies masking of PII to ensure data privacy.

πŸ“š Official Documentation:


7.5 Data Purging and Retention Compliance

Objective: Develop a data purging solution ensuring compliance with data retention policies.
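
A sketch of a retention-driven purge on a Delta table; the table, column, and 7-year retention window are hypothetical examples.

```python
# 1. Remove rows that have aged out of the (hypothetical) 7-year retention policy
spark.sql("""
    DELETE FROM main.sales.transactions
    WHERE transaction_date < current_date() - INTERVAL 7 YEARS
""")

# 2. If deletion vectors are enabled, REORG rewrites files so the soft-deleted
#    rows are physically removed from the data files
spark.sql("REORG TABLE main.sales.transactions APPLY (PURGE)")

# 3. VACUUM removes the unreferenced data files after the file retention window,
#    which is what actually makes purged records unrecoverable
spark.sql("VACUUM main.sales.transactions RETAIN 168 HOURS")
```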

πŸ“š Official Documentation:


Section 8: Data Governance

Section Overview: 2 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ“ Hands-On Tutorials (Follow along in your workspace):

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


8.1 Metadata and Data Discoverability

Objective: Create and add descriptions/metadata about enterprise data to make it more discoverable.
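
A sketch of adding comments and tags so data surfaces in search and Catalog Explorer; names and tag values are hypothetical.

```python
spark.sql("COMMENT ON TABLE main.sales.orders IS 'One row per customer order, refreshed hourly'")
spark.sql("ALTER TABLE main.sales.orders ALTER COLUMN amount COMMENT 'Order total in USD, tax included'")
spark.sql("ALTER TABLE main.sales.orders SET TAGS ('domain' = 'sales', 'layer' = 'gold')")

# Comments and tags surface in Catalog Explorer search and are queryable via information_schema
spark.sql("""
    SELECT table_name, comment
    FROM main.information_schema.tables
    WHERE table_schema = 'sales'
""").show(truncate=False)
```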

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


8.2 Unity Catalog Permission Inheritance

Objective: Demonstrate understanding of Unity Catalog permission inheritance model.

πŸ“š Official Documentation:

Top Demos:


Section 9: Debugging and Deploying

Section Overview: 5 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸŽ₯ Product Tours (Quick 3-5 minute overviews):

πŸ“Ή Video Demos (In-depth demonstrations):


9.1 Diagnostic Information and Troubleshooting

Objective: Identify pertinent diagnostic information using Spark UI, cluster logs, system tables, and query profiles to troubleshoot errors.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


9.2 Job Repair and Parameter Overrides

Objective: Analyze the errors and remediate the failed job runs with job repairs and parameter overrides.

πŸ“š Official Documentation:


9.3 Debugging Lakeflow and Spark Pipelines

Objective: Use Lakeflow Spark Declarative Pipelines event logs and the Spark UI to debug Lakeflow Spark Declarative Pipelines and Spark pipelines.

πŸ“š Official Documentation:

Top Demos:


9.4 Deploying with Databricks Asset Bundles

Objective: Build and deploy Databricks resources using Databricks Asset Bundles.

πŸ“š Official Documentation:

Top Demos:


9.5 Git-based CI/CD Workflows

Objective: Configure and integrate with Git-based CI/CD workflows using Databricks Git Folders for notebook and code deployment.

πŸ“š Official Documentation:

Top Demos:


Section 10: Data Modelling

Section Overview: 4 objectives

Recommended Demos for This Section

Start with these demos to get hands-on experience:

πŸ“Ή Video Demos (In-depth demonstrations):


10.1 Scalable Data Models with Delta Lake

Objective: Design and implement scalable data models using Delta Lake to manage large datasets.

πŸ“š Official Documentation:

Top Demos:

πŸŽ“ Training Resources:


10.2 Liquid Clustering for Query Performance

Objective: Simplify data layout decisions and optimize query performance using Liquid Clustering.

πŸ“š Official Documentation:


10.3 Liquid Clustering vs Partitioning/Z-Order

Objective: Identify the benefits of using Liquid Clustering over partitioning and Z-Order.
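
A sketch contrasting the older layout choices with Liquid Clustering; the table and column names are hypothetical.

```python
# Older approach (for contrast): layout fixed at creation time
#   CREATE TABLE ... PARTITIONED BY (order_date)
#   OPTIMIZE ... ZORDER BY (customer_id)

# Liquid clustering: declare clustering keys instead; no fixed directory layout,
# no small-partition explosion, and the keys can be changed later
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders_lc (
        order_id BIGINT,
        customer_id BIGINT,
        order_date DATE,
        amount DOUBLE
    )
    CLUSTER BY (order_date, customer_id)
""")

# Change clustering keys as query patterns evolve
spark.sql("ALTER TABLE main.sales.orders_lc CLUSTER BY (customer_id)")

# Incremental clustering is applied by OPTIMIZE (and automatically for some writes)
spark.sql("OPTIMIZE main.sales.orders_lc")
```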

πŸ“š Official Documentation:


10.4 Dimensional Modeling for Analytics

Objective: Design Dimensional Models for analytical workloads, ensuring efficient querying and aggregation.

πŸ“š Official Documentation:


Study Resources

Official Training

Certification Information

Key Documentation


Last Updated: February 2026 | Exam Version: November 2025
