A curated collection of demos, blog posts, official documentation, and training resources mapped to each exam objective for the Databricks Certified Data Engineer Professional certification (November 2025 version). This supersedes my previous blog post series, which covered the previous exam syllabus.
How to Use This Guide
For each exam section and objective, this guide provides:
- Official Documentation: Direct links to official Databricks docs (docs.databricks.com)
- Demos: Interactive demonstrations and tutorials
- Blog Posts: Technical articles and best practices
- Training Resources: Courses, certifications, and learning materials
Resources are ranked by relevance score based on keyword matching. Review multiple resources for each objective to get comprehensive coverage.
About the Author
I'm a Databricks Solutions Architect Champion with extensive experience in data engineering and lakehouse architecture. This guide is designed to help you navigate the Data Engineer Professional certification, one of the most challenging Databricks certifications.
The Data Engineer Professional exam tests your ability to build production-grade, enterprise-scale data engineering solutions. This goes beyond basic pipeline development – you need to demonstrate expertise in optimization, security, governance, monitoring, and deployment automation.
I created this guide by analyzing the exam objectives and mapping them to the best available resources. My advice: this exam requires hands-on experience. You should have built real production pipelines, optimized queries, implemented security controls, and deployed with DABs before attempting this exam.
Find out what works best for you. Good luck on your Databricks certification journey!
About the Exam
- Exam Name: Databricks Certified Data Engineer Professional
- Exam Date: November 30, 2025
- Questions: 59 scored multiple-choice
- Time Limit: 120 minutes
- Registration Fee: USD $200
- Validity: 2 years
- Prerequisite: Data Engineer Associate recommended; 2+ years hands-on experience strongly recommended
Recommended Preparation
- Instructor-led: Advanced Data Engineering with Databricks
- Self-paced: Available in Databricks Academy
- Deep working knowledge of Python, SQL, Spark, and Delta Lake
- Production experience with Lakeflow/DLT, Unity Catalog, and DABs

What's Changed from the April 2024 Syllabus
The November 2025 exam brings significant updates reflecting Databricks' evolving data engineering platform:
Structural Changes:
- Expanded from 6 sections to 10 sections for more granular coverage
- Old "Databricks Tooling" section content now distributed across multiple new sections
- Old "Data Processing" section split into Ingestion, Transformation, and Data Modelling sections
New Topics Added:
- Databricks Asset Bundles (DABs): Major new focus on infrastructure-as-code deployment (Section 1 & 9)
- Lakeflow Declarative Pipelines: Replaces Delta Live Tables terminology throughout
- APPLY CHANGES API: Explicit coverage for CDC processing in DLT (Section 1)
- Liquid Clustering: Replaces Z-Order and partitioning for data layout optimization (Section 10)
- System Tables: New emphasis on observability and cost monitoring (Section 5)
- Row Filters & Column Masks: Data security now has dedicated coverage (Section 7)
Removed/Reduced Topics:
- Partition hints (coalesce, repartition, repartition_by_range, rebalance): Less emphasis in new syllabus
- Z-Order indexing: Replaced by Liquid Clustering focus
- Bloom filters: Reduced coverage
- Manual file size control: Auto-optimization now preferred
Renamed/Evolved Topics:
- "Delta Live Tables" → "Lakeflow Declarative Pipelines" or "Lakeflow Spark Declarative Pipelines"
- "Streaming with Delta Lake" → focus now on Streaming Tables vs Materialized Views
- Traditional partitioning strategies → Liquid Clustering
Study Tip: If you studied for the April 2024 exam, focus extra time on:
- DABs project structure and deployment
- Lakeflow/DLT new syntax and APPLY CHANGES API
- Liquid Clustering (replaces partitioning/Z-Order decisions)
- System Tables for monitoring and observability
- Row filters and column masks for security
Background Reading
Before diving into the objectives, these resources provide essential foundational context:
Delta Lake Fundamentals
Understanding the Delta Lake Transaction Log – Read the Blog Post This excellent post provides deep insight into how Delta Lake guarantees ACID transactions. Understanding the transaction log is fundamental to troubleshooting and optimization questions on the exam.
Delta Lake 3.0 and Liquid Clustering – Read the Blog Post Liquid Clustering is the new approach to data layout optimization, replacing partitioning and Z-Order strategies. This is a must-read for the new exam.
How to Clone Delta Lake Tables – Read the Blog Post Covers both shallow and deep clones with use cases – useful for understanding testing and data sharing scenarios.
Streaming & CDC
Simplifying CDC with Change Data Feed – Read the Blog Post Essential reading for understanding how CDF enables incremental processing and delete propagation.
Simplifying Streaming Data Ingestion into Delta Lake – Read the Blog Post Great overview of Auto Loader and streaming patterns.
Stream-Stream Joins in Apache Spark – Read the Blog Post While it focuses on Spark 2.3, the concepts of stream-stream joins are still relevant for the exam.
Performance Optimization
Processing Petabytes with Databricks Delta – Read the Blog Post Contains excellent visualizations for understanding Z-Ordering concepts (though now superseded by Liquid Clustering, the principles still apply).
How Databricks Improved Query Performance with Auto-Optimized File Sizes – Read the Blog Post Understanding the "small files" problem and how Databricks addresses it automatically.
Data Governance & Compliance
Handling "Right to be Forgotten" with Delta Live Tables – Read the Blog Post Critical for understanding data purging and GDPR/CCPA compliance – directly relevant to Section 7.
Official Documentation Quick Links
- Delta Lake Documentation – Official Delta Lake site
- Structured Streaming Programming Guide – Essential Spark reference
- Delta Lake Cheatsheet – Quick reference PDF
Free Ebook
Delta Lake: Up & Running by O'Reilly – Get the PDF Comprehensive book covering Delta Lake internals, operations, and best practices. Free in exchange for your email.
Exam Breakdown & Study Strategy
Exam Weight by Section
Understanding how the exam is weighted helps you prioritize your study time. The Professional exam covers 10 sections with 45 total objectives:
| Section | Topics | Objectives | Study Priority |
|---|---|---|---|
| Section 1: Developing Code | DABs, UDFs, DLT, Jobs, Testing | 11 | Critical |
| Section 5: Monitoring and Alerting | System Tables, Spark UI, APIs | 6 | Critical |
| Section 6: Cost & Performance | Optimization, CDF, Query Profile | 5 | Critical |
| Section 7: Security and Compliance | ACLs, Masking, PII, Purging | 5 | Critical |
| Section 9: Debugging and Deploying | Spark UI, DABs, Git CI/CD | 5 | High |
| Section 10: Data Modelling | Delta, Liquid Clustering, Dimensional | 4 | High |
| Section 4: Data Sharing | Delta Sharing, Federation | 3 | High |
| Section 2: Data Ingestion | Multi-format, Streaming/Batch | 2 | Medium |
| Section 3: Transformation | Window Functions, Quarantining | 2 | Medium |
| Section 8: Data Governance | Metadata, Permissions | 2 | Medium |
How to Use This Guide Effectively
I've organized resources into four categories for each exam objective. Here's how I recommend using them:
Official Documentation (docs.databricks.com)
This is your primary reference for the Professional exam. You need to know the details – syntax, configuration options, and edge cases.
My approach:
- Read the conceptual overview AND the API/syntax reference
- Understand the "Limitations" and "Best Practices" sections – exam questions often test edge cases
- Pay special attention to Unity Catalog, DABs, and DLT documentation
Best for: Understanding exact syntax, parameters, and technical specifications
Interactive Demos (databricks.com/resources/demos)
For the Professional exam, demos help you understand complex workflows and enterprise patterns.
How I use demos:
- Watch the end-to-end flow first – understand how components connect
- Focus on configuration details – the exam tests specific settings
- Recreate in your workspace – you need hands-on experience
Best for: Understanding complex workflows and enterprise-scale patterns
Training Resources (Databricks Academy)
The Advanced Data Engineering course is highly recommended for this exam.
Training Courses:
- Advanced Data Engineering with Databricks – Critical for Sections 1, 5, 6, 9
- Data Management and Governance with Unity Catalog – Critical for Sections 7, 8
Best for: Deep dives into complex topics and hands-on labs
My Recommended Study Path
Phase 1: Core Development (Sections 1, 2, 3)
- Master Lakeflow Declarative Pipelines – streaming tables, materialized views, expectations
- Understand APPLY CHANGES API for CDC
- Practice writing and testing UDFs
- Learn DABs project structure and deployment
Phase 2: Operations & Monitoring (Sections 5, 9)
- Deep dive into System Tables for observability
- Master Query Profiler and Spark UI analysis
- Understand event logs for DLT debugging
- Practice job repair and troubleshooting
Phase 3: Optimization (Section 6)
- Learn Delta optimization – deletion vectors, liquid clustering
- Understand data skipping and file pruning
- Master Change Data Feed (CDF) for incremental processing
Phase 4: Security & Governance (Sections 7, 8)
- Implement row filters and column masks
- Understand PII detection and masking strategies
- Learn data purging for compliance
Phase 5: Sharing & Modeling (Sections 4, 10)
- Configure Delta Sharing (D2D and D2O)
- Set up Lakehouse Federation
- Design dimensional models with liquid clustering
Practice & Validation
Hands-On Practice (This is critical for Professional!):
- Sign up for Databricks Free Edition
- Build production-style pipelines with DLT expectations and CDC
- Deploy pipelines using DABs with multiple environments
- Implement row filters and column masks on sensitive data
- Practice debugging with Query Profiler and event logs
- Set up Delta Sharing between workspaces
Key Differentiators from Associate:
- Professional tests optimization and debugging skills heavily
- You need to know "when to use what", not just "how to use it"
- Security and governance questions are more scenario-based
- Expect questions about troubleshooting production issues
Section 1: Developing Code for Data Processing using Python and SQL
Section Overview: 11 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Hands-On Tutorials (Follow along in your workspace):
Product Tours (Quick 3-5 minute overviews):
Video Demos (In-depth demonstrations):
1.1 Scalable Python Project Structure with DABs
Objective: Design and implement a scalable Python project structure optimized for Databricks Asset Bundles (DABs), enabling modular development, deployment automation, and CI/CD integration.
Official Documentation:
Top Demos:
Training Resources:
1.2 Managing External Libraries and Dependencies
Objective: Manage and troubleshoot external third-party library installations and dependencies in Databricks, including PyPI packages, local wheels, and source archives.
Official Documentation:
1.3 Pandas/Python User-Defined Functions (UDFs)
Objective: Develop User-Defined Functions (UDFs) using Pandas/Python UDF.
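To make this concrete, here is a minimal sketch of a vectorized pandas UDF. It assumes a Databricks notebook (or any environment with an active SparkSession named `spark`); the column names and sample values are illustrative only.

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf, col

# Vectorized (pandas) UDF: operates on pandas Series batches rather than row by row,
# avoiding the per-row serialization overhead of classic Python UDFs.
@pandas_udf("double")
def fahrenheit_to_celsius(temp_f: pd.Series) -> pd.Series:
    return (temp_f - 32.0) * 5.0 / 9.0

# Example usage on a small in-memory DataFrame.
df = spark.createDataFrame([(32.0,), (98.6,), (212.0,)], ["temp_f"])
df.withColumn("temp_c", fahrenheit_to_celsius(col("temp_f"))).show()
```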
Official Documentation:
1.4 Production Data Pipelines with Lakeflow & Auto Loader
Objective: Build and manage reliable, production-ready data pipelines for batch and streaming data using Lakeflow Spark Declarative Pipelines and Auto Loader.
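For orientation, here is a minimal Lakeflow/DLT pipeline sketch that combines Auto Loader ingestion with a data-quality expectation. It runs as pipeline source code (not a standalone notebook), and the table names and Volume paths are placeholders I invented for illustration.

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze: incremental file ingestion with Auto Loader (cloudFiles)")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/Volumes/main/default/_schemas/orders")  # placeholder path
        .load("/Volumes/main/default/raw/orders")                                      # placeholder path
    )

@dlt.table(comment="Silver: cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")   # rows failing the expectation are dropped
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .select("order_id", col("amount").cast("double"), col("order_ts").cast("timestamp"))
    )
```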
Official Documentation:
Top Demos:
- Streaming Data With Delta Live Tables And Databricks Workflows
- Lakeflow Declarative Pipeline
- Lakeflow Declarative Pipelines
1.5 ETL Workflow Automation with Jobs
Objective: Create and Automate ETL workloads using Jobs via UI/APIs/CLI.
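Beyond the UI and CLI, jobs can be created programmatically. A hedged sketch against the Jobs REST API (`/api/2.1/jobs/create`) using `requests`; the host, token, notebook path, cluster settings, and schedule are all placeholders you would replace.

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                        # placeholder; prefer secrets/OAuth in practice

payload = {
    "name": "nightly_orders_etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest_orders"},  # placeholder
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",   # example runtime version
                "node_type_id": "i3.xlarge",           # example node type
                "num_workers": 2,
            },
        }
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print("Created job_id:", resp.json()["job_id"])
```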
Official Documentation:
Top Demos:
- Schedule A Job And Automate A Workload
- Pandas Api With Spark Backend
- Querying State Data In Spark Structured Streaming With State Reader Api
1.6 Streaming Tables vs Materialized Views
Objective: Explain the advantages and disadvantages of streaming tables compared to materialized views.
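The contrast is easiest to see in DDL: a streaming table incrementally processes an append-style source exactly once, while a materialized view keeps a precomputed query result refreshed. A rough sketch issued through `spark.sql`, with placeholder table names and paths; exact refresh behaviour depends on your pipeline/warehouse setup.

```python
# Streaming table: incremental, exactly-once processing of an append-only source.
spark.sql("""
  CREATE OR REFRESH STREAMING TABLE orders_raw
  AS SELECT * FROM STREAM read_files(
       '/Volumes/main/default/raw/orders',   -- placeholder path
       format => 'json')
""")

# Materialized view: precomputed result of a query, refreshed on a schedule or on demand;
# better suited to aggregations over sources that may receive updates or deletes.
spark.sql("""
  CREATE OR REFRESH MATERIALIZED VIEW daily_revenue
  AS SELECT order_date, SUM(amount) AS revenue
     FROM orders_raw
     GROUP BY order_date
""")
```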
Official Documentation:
1.7 CDC with APPLY CHANGES API
Objective: Use APPLY CHANGES APIs to simplify CDC in Lakeflow Spark Declarative Pipelines.
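A minimal sketch of the Python flavour of APPLY CHANGES (`dlt.apply_changes`) maintaining an SCD Type 1 target from a CDC feed; the source/target table names and CDC columns (`op`, `event_ts`) are placeholders.

```python
import dlt
from pyspark.sql.functions import col, expr

# Target streaming table that APPLY CHANGES keeps up to date.
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_cdc_bronze",          # placeholder: streaming table of raw CDC events
    keys=["customer_id"],                   # primary key used to match change records
    sequence_by=col("event_ts"),            # ordering column to resolve out-of-order events
    apply_as_deletes=expr("op = 'DELETE'"), # treat these rows as deletes
    except_column_list=["op", "event_ts"],  # drop CDC metadata columns from the target
    stored_as_scd_type=1,                   # 1 = overwrite in place; 2 = keep full history
)
```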
Official Documentation:
Top Demos:
1.8 Structured Streaming vs Lakeflow Pipelines
Objective: Compare Spark Structured Streaming and Lakeflow Spark Declarative Pipelines to determine the optimal approach for building scalable ETL pipelines.
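For contrast with the declarative sketch in 1.4, here is roughly the same incremental hop written directly in Structured Streaming, where you own the checkpoint location, trigger, and target table yourself; the table names and path are placeholders.

```python
from pyspark.sql.functions import col

(
    spark.readStream.table("main.bronze.orders")   # placeholder source table
    .filter(col("amount") > 0)
    .writeStream
    .option("checkpointLocation", "/Volumes/main/default/_checkpoints/orders_silver")  # you manage this
    .trigger(availableNow=True)   # incremental batch-style run; omit for a continuously running stream
    .toTable("main.silver.orders")                 # placeholder target table
)
```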
Official Documentation:
Top Demos:
- Spark Streaming Advanced
- Lakeflow Declarative Pipelines
- Querying State Data In Spark Structured Streaming With State Reader Api
1.9 Control Flow Operators in Pipelines
Objective: Create a pipeline component that uses control flow operators (e.g., if/else, for/each, etc.).
1.10 Environment and Task Configuration
Objective: Choose the appropriate configs for environments and dependencies, high memory for notebook tasks, and auto-optimization to disallow retries.
Official Documentation:
1.11 Unit and Integration Testing
Objective: Develop unit and integration tests using assertDataFrameEqual, assertSchemaEqual, DataFrame.transform, and testing frameworks, to ensure code correctness, including a built-in debugger.
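A minimal pytest-style sketch combining `DataFrame.transform` with the PySpark testing utilities (available in PySpark 3.5+); the transformation and sample data are invented for illustration, and the local SparkSession fixture stands in for a Databricks cluster.

```python
import pytest
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual

def add_total(df: DataFrame) -> DataFrame:
    """Transformation under test: derive a total column."""
    return df.withColumn("total", col("qty") * col("price"))

@pytest.fixture(scope="module")
def spark():
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()

def test_add_total(spark):
    input_df = spark.createDataFrame([(2, 5.0)], ["qty", "price"])
    expected = spark.createDataFrame([(2, 5.0, 10.0)], ["qty", "price", "total"])

    actual = input_df.transform(add_total)   # DataFrame.transform keeps pipelines composable and testable

    assertSchemaEqual(actual.schema, expected.schema)
    assertDataFrameEqual(actual, expected)
```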
Official Documentation:
Section 2: Data Ingestion & Acquisition
Section Overview: 2 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Hands-On Tutorials (Follow along in your workspace):
Video Demos (In-depth demonstrations):
2.1 Multi-Format Data Ingestion Pipelines
Objective: Design and implement data ingestion pipelines to efficiently ingest a variety of data formats including Delta Lake, Parquet, ORC, AVRO, JSON, CSV, XML, Text and Binary from diverse sources such as message buses and cloud storage.
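To anchor the two main source types, a hedged sketch reading a message bus (Kafka) and cloud files (Auto Loader); the broker address, topic, and Volume paths are placeholders.

```python
from pyspark.sql.functions import col

# Message bus: read a Kafka topic as a stream; key/value arrive as bytes and need casting/parsing.
kafka_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")   # placeholder
    .option("subscribe", "orders")                        # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
)

# Cloud storage: Auto Loader handles incremental file discovery; swap cloudFiles.format
# for json, csv, avro, parquet, xml, text, or binaryFile as needed.
files_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/_schemas/customers")  # placeholder
    .load("/Volumes/main/default/raw/customers")                                      # placeholder
)
```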
Official Documentation:
Top Demos:
- Delta Live Tables With Apache Kafka
- Get Data Into Databricks From Kafka
- Xml Data Ingestion Spark Databricks
Training Resources:
2.2 Append-Only Batch and Streaming Pipelines
Objective: Create an append-only data pipeline capable of handling both batch and streaming data using Delta.
Official Documentation:
Top Demos:
- Streaming Data With Delta Live Tables And Databricks Workflows
- Spark Streaming Advanced
- Delta Live Tables Overview
Section 3: Data Transformation, Cleansing, and Quality
Section Overview: 2 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Hands-On Tutorials (Follow along in your workspace):
Product Tours (Quick 3-5 minute overviews):
3.1 Advanced Spark SQL and PySpark Transformations
Objective: Write efficient Spark SQL and PySpark code to apply advanced data transformations, including window functions, joins, and aggregations, to manipulate and analyze large datasets.
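Two window-function patterns worth having at your fingertips, sketched in PySpark with a placeholder table and columns: deduplicating to the latest row per key, and a running aggregate over an explicit frame.

```python
from pyspark.sql import Window
from pyspark.sql import functions as F

orders = spark.table("main.silver.orders")   # placeholder table

# Latest order per customer via a ranking window function.
w = Window.partitionBy("customer_id").orderBy(F.col("order_ts").desc())
latest_orders = (
    orders.withColumn("rn", F.row_number().over(w))
          .filter("rn = 1")
          .drop("rn")
)

# Running revenue per customer via an aggregate window with an explicit frame.
running = (
    Window.partitionBy("customer_id")
          .orderBy("order_ts")
          .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)
with_running_total = orders.withColumn("running_revenue", F.sum("amount").over(running))
```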
Official Documentation:
Training Resources:
3.2 Bad Data Quarantining Process
Objective: Develop a quarantining process for bad data with Lakeflow Spark Declarative Pipelines, or Auto Loader in classic jobs.
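A common quarantine pattern is to apply a rule set to the clean table and route rows that violate any rule to a side table by inverting the same predicates. A hedged DLT sketch; the rules and the upstream table name are placeholders.

```python
import dlt

rules = {"valid_amount": "amount > 0", "valid_id": "order_id IS NOT NULL"}
quarantine_predicate = " OR ".join(f"NOT ({rule})" for rule in rules.values())

@dlt.table(comment="Rows passing all quality rules")
@dlt.expect_all_or_drop(rules)                    # drop anything violating a rule
def orders_clean():
    return dlt.read_stream("orders_bronze")       # placeholder upstream table

@dlt.table(comment="Rows failing at least one rule, kept for inspection and reprocessing")
def orders_quarantine():
    return dlt.read_stream("orders_bronze").where(quarantine_predicate)
```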
Official Documentation:
Top Demos:
Section 4: Data Sharing and Federation
Section Overview: 3 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Hands-On Tutorials (Follow along in your workspace):
Product Tours (Quick 3-5 minute overviews):
Video Demos (In-depth demonstrations):
4.1 Delta Sharing (D2D and D2O)
Objective: Demonstrate delta sharing securely between Databricks deployments using Databricks to Databricks Sharing (D2D) or to external platforms using the open sharing protocol (D2O).
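The core SQL flow, sketched with placeholder share, table, and recipient names: create a share, add assets, create a recipient, grant access. For D2D the recipient is created with the other metastore's sharing identifier instead of the open (token-based) protocol.

```python
# D2O (open sharing): share curated tables with an external recipient.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share COMMENT 'Curated sales data'")
spark.sql("ALTER SHARE sales_share ADD TABLE main.gold.daily_revenue")   # placeholder table

# Open-protocol recipient (token-based). For D2D you would instead use:
#   CREATE RECIPIENT partner_co USING ID '<sharing identifier of the other metastore>'
spark.sql("CREATE RECIPIENT partner_co COMMENT 'External partner'")
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_co")
```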
Official Documentation:
Top Demos:
Training Resources:
4.2 Lakehouse Federation Configuration
Objective: Configure Lakehouse Federation with proper governance across the supported source systems.
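Federation is configured in two steps: a connection (engine type plus credentials) and a foreign catalog that mirrors a database from that connection into Unity Catalog. A rough sketch with placeholder host, secret scope/keys, and database name; check the docs for the exact options your source type requires.

```python
# Connection: stores the engine type and credentials for the external system.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS postgres_conn TYPE postgresql
  OPTIONS (
    host 'pg.example.internal',                   -- placeholder
    port '5432',
    user secret('federation', 'pg_user'),         -- credentials via a secret scope (placeholder names)
    password secret('federation', 'pg_password')
  )
""")

# Foreign catalog: exposes a database from the connection so it can be governed and queried in UC.
spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS pg_sales
  USING CONNECTION postgres_conn
  OPTIONS (database 'sales')
""")
```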
Official Documentation:
Top Demos:
4.3 Sharing Live Data with Delta Share
Objective: Use Delta Share to share live data from Lakehouse to any computing platform.
Official Documentation:
Section 5: Monitoring and Alerting
Section Overview: 6 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Hands-On Tutorials (Follow along in your workspace):
Product Tours (Quick 3-5 minute overviews):
Video Demos (In-depth demonstrations):
5.1 System Tables for Observability
Objective: Use system tables for observability over resource utilization, cost, auditing and workload monitoring.
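A quick sketch of the kind of query the exam expects you to be comfortable with: DBU consumption by workspace and SKU from `system.billing.usage`. Column names follow the documented billing schema, but verify them against your own workspace; joining to `system.billing.list_prices` on `sku_name` is the usual next step to approximate cost.

```python
usage_by_sku = spark.sql("""
  SELECT workspace_id,
         sku_name,
         SUM(usage_quantity) AS dbus
  FROM system.billing.usage
  WHERE usage_date >= current_date() - INTERVAL 30 DAYS
  GROUP BY workspace_id, sku_name
  ORDER BY dbus DESC
""")
usage_by_sku.show(truncate=False)
```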
Official Documentation:
Top Demos:
Training Resources:
5.2 Query Profiler and Spark UI Monitoring
Objective: Use Query Profiler UI and Spark UI to monitor workloads.
Official Documentation:
Top Demos:
5.3 REST API and CLI for Job Monitoring
Objective: Use the Databricks REST APIs/Databricks CLI for monitoring jobs and pipelines.
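A hedged sketch of polling the Jobs API (`/api/2.1/jobs/runs/list`) for a job's recent runs and surfacing anything that did not succeed; the host, token, and job_id are placeholders. The CLI equivalent is `databricks jobs list-runs`.

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"job_id": 123456789, "completed_only": "true", "limit": 25},   # placeholder job_id
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    state = run.get("state", {})
    if state.get("result_state") != "SUCCESS":
        print(run["run_id"], state.get("result_state"), state.get("state_message"))
```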
Official Documentation:
Top Demos:
- Pandas Api With Spark Backend
- Querying State Data In Spark Structured Streaming With State Reader Api
5.4 Lakeflow Pipeline Event Logs
Objective: Use Lakeflow Spark Declarative Pipelines Event Logs to monitor pipelines.
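The event log is queryable with the `event_log()` table-valued function; data-quality results surface in `flow_progress` events. A rough sketch following the documented event-log schema (verify field paths in your workspace); the pipeline ID is a placeholder.

```python
dq = spark.sql("""
  SELECT timestamp,
         origin.flow_name,
         details:flow_progress.data_quality.expectations AS expectations
  FROM event_log('<pipeline-id>')                 -- placeholder pipeline ID
  WHERE event_type = 'flow_progress'
    AND details:flow_progress.data_quality IS NOT NULL
  ORDER BY timestamp DESC
""")
dq.show(truncate=False)
```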
Official Documentation:
Top Demos:
5.5 SQL Alerts for Data Quality
Objective: Use SQL Alerts to monitor data quality.
Official Documentation:
Top Demos:
5.6 Job Notifications and Alerting
Objective: Use the Lakeflow Jobs UI and Jobs API to set up notifications for job status and performance issues.
Official Documentation:
Section 6: Cost & Performance Optimisation
Section Overview: 5 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Video Demos (In-depth demonstrations):
6.1 Unity Catalog Managed Tables Benefits
Objective: Understand how and why using Unity Catalog managed tables reduces operational overhead and maintenance burden.
Top Demos:
Training Resources:
6.2 Delta Optimization Techniques
Objective: Understand delta optimization techniques, such as deletion vectors and liquid clustering.
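To see what these look like in practice, a minimal sketch enabling deletion vectors and switching an existing table to Liquid Clustering; the table name and clustering columns are placeholders.

```python
# Deletion vectors: DELETE/UPDATE/MERGE mark rows as removed instead of rewriting whole files.
spark.sql("""
  ALTER TABLE main.silver.orders
  SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

# Liquid Clustering on frequently filtered columns (replaces partitioning / Z-Order choices).
spark.sql("ALTER TABLE main.silver.orders CLUSTER BY (order_date, customer_id)")

# OPTIMIZE incrementally clusters and compacts data for the new layout.
spark.sql("OPTIMIZE main.silver.orders")
```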
Official Documentation:
6.3 Query Optimization Techniques
Objective: Understand the optimization techniques used by Databricks to ensure the performance of queries on large datasets (data skipping, file pruning, etc.).
Official Documentation:
6.4 Change Data Feed (CDF) for Streaming
Objective: Apply Change Data Feed (CDF) to address specific limitations of streaming tables and enhance latency.
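A minimal CDF sketch: enable the feed on a table, then stream only the row-level changes rather than reprocessing the full table; the table name and starting version are placeholders.

```python
# Enable the Change Data Feed on an existing table.
spark.sql("""
  ALTER TABLE main.silver.customers
  SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
""")

# Consume inserts/updates/deletes incrementally downstream.
changes = (
    spark.readStream.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)          # or startingTimestamp
    .table("main.silver.customers")
)
# Each row carries _change_type, _commit_version, and _commit_timestamp metadata columns.
```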
Official Documentation:
Top Demos:
- Spark Streaming Advanced
- Streaming Data With Delta Live Tables And Databricks Workflows
- Querying State Data In Spark Structured Streaming With State Reader Api
6.5 Query Profile Analysis and Bottlenecks
Objective: Use the query profile to analyze the query and identify bottlenecks, such as bad data skipping, inefficient types of joins, and data shuffling.
Official Documentation:
Section 7: Ensuring Data Security and Compliance
Section Overview: 5 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Hands-On Tutorials (Follow along in your workspace):
Product Tours (Quick 3-5 minute overviews):
Video Demos (In-depth demonstrations):
7.1 Workspace ACLs and Least Privilege
Objective: Use ACLs to secure workspace objects, enforcing principles like least privilege and policy enforcement.
Official Documentation:
Top Demos:
Training Resources:
7.2 Row Filters and Column Masks
Objective: Use row filters and column masks to filter and mask sensitive table data.
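Both features are implemented as SQL UDFs attached to a table. A hedged sketch with placeholder catalog/schema, table, and group names: a row filter scoped by regional group membership, and a column mask that redacts email addresses for non-admins.

```python
# Row filter: only admins or members of the matching regional group see a region's rows.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.gov.region_filter(region STRING)
  RETURN is_account_group_member('admins') OR is_account_group_member(concat('sales_', region))
""")
spark.sql("ALTER TABLE main.silver.orders SET ROW FILTER main.gov.region_filter ON (region)")

# Column mask: non-admins see a redacted value instead of the raw column.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.gov.mask_email(email STRING)
  RETURN CASE WHEN is_account_group_member('admins') THEN email ELSE '***REDACTED***' END
""")
spark.sql("ALTER TABLE main.silver.customers ALTER COLUMN email SET MASK main.gov.mask_email")
```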
Official Documentation:
7.3 Data Anonymization and Pseudonymization
Objective: Apply anonymization and pseudonymization methods, such as hashing, tokenization, suppression, and generalisation, to confidential data.
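A small PySpark sketch illustrating three of these techniques side by side; the table, columns, and salt are placeholders, and a real salt should come from a secret scope rather than a literal.

```python
from pyspark.sql import functions as F

customers = spark.table("main.silver.customers")   # placeholder table

anonymized = customers.select(
    # Pseudonymization: a salted, deterministic hash keeps joinability without exposing the identifier.
    F.sha2(F.concat(F.col("email"), F.lit("<salt-from-secret>")), 256).alias("email_pseudo"),
    # Suppression: drop or null out direct identifiers entirely.
    F.lit(None).cast("string").alias("phone"),
    # Generalization: reduce precision so individuals are harder to single out.
    F.year(F.col("birth_date")).alias("birth_year"),
    F.substring(F.col("postal_code"), 1, 3).alias("postal_prefix"),
)
```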
Official Documentation:
7.4 PII Detection and Masking Pipelines
Objective: Implement a compliant batch & streaming pipeline that detects and applies masking of PII to ensure data privacy.
Official Documentation:
7.5 Data Purging and Retention Compliance
Objective: Develop a data purging solution ensuring compliance with data retention policies.
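The key idea on Delta is that a DELETE alone is not enough: old file versions and soft-deleted rows remain until files are rewritten and vacuumed. A hedged three-step sketch with a placeholder table and retention window; align the retention values with your own policy and the table's file-retention properties.

```python
# 1. Logically delete the subject's rows (with deletion vectors this is mostly a metadata operation).
spark.sql("DELETE FROM main.silver.customers WHERE customer_id = 42")   # placeholder predicate

# 2. Rewrite files so soft-deleted rows are physically removed from current data files.
spark.sql("REORG TABLE main.silver.customers APPLY (PURGE)")

# 3. Remove old file versions past the retention window so the data cannot be time-traveled back.
spark.sql("VACUUM main.silver.customers RETAIN 168 HOURS")   # 7 days; placeholder retention
```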
Official Documentation:
Section 8: Data Governance
Section Overview: 2 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Hands-On Tutorials (Follow along in your workspace):
Product Tours (Quick 3-5 minute overviews):
Video Demos (In-depth demonstrations):
8.1 Metadata and Data Discoverability
Objective: Create and add descriptions/metadata about enterprise data to make it more discoverable.
Official Documentation:
Top Demos:
Training Resources:
8.2 Unity Catalog Permission Inheritance
Objective: Demonstrate understanding of Unity Catalog permission inheritance model.
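A short illustration of the inheritance model: privileges granted on a parent securable flow down to its children (catalog → schema → table/view/volume/function). Catalog, schema, and group names here are placeholders.

```python
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_engineers`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA main.silver TO `data_engineers`")
# The SELECT granted at schema level is inherited by every current and future table in main.silver,
# so no per-table grant is needed; revoking at the schema level removes access everywhere below it.
```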
Official Documentation:
Top Demos:
Section 9: Debugging and Deploying
Section Overview: 5 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Product Tours (Quick 3-5 minute overviews):
Video Demos (In-depth demonstrations):
9.1 Diagnostic Information and Troubleshooting
Objective: Identify pertinent diagnostic information using Spark UI, cluster logs, system tables, and query profiles to troubleshoot errors.
Official Documentation:
Top Demos:
Training Resources:
9.2 Job Repair and Parameter Overrides
Objective: Analyze the errors and remediate the failed job runs with job repairs and parameter overrides.
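Besides the Jobs UI, repairs can be triggered via the Jobs API (`/api/2.1/jobs/runs/repair`), re-running only failed tasks with overridden parameters. A hedged sketch; the host, token, run_id, and parameter names are placeholders.

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/repair",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "run_id": 987654321,                 # placeholder: the failed job run
        "rerun_all_failed_tasks": True,      # only failed tasks (and their dependents) are re-executed
        "notebook_params": {"processing_date": "2025-11-30"},   # example parameter override
    },
)
resp.raise_for_status()
print("Repair id:", resp.json().get("repair_id"))
```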
Official Documentation:
9.3 Debugging Lakeflow and Spark Pipelines
Objective: Use Lakeflow Spark Declarative Pipelines event logs and the Spark UI to debug Lakeflow Spark Declarative Pipelines and Spark pipelines.
Official Documentation:
Top Demos:
9.4 Deploying with Databricks Asset Bundles
Objective: Build and deploy Databricks resources using Databricks Asset Bundles.
Official Documentation:
Top Demos:
9.5 Git-based CI/CD Workflows
Objective: Configure and integrate with Git-based CI/CD workflows using Databricks Git Folders for notebook and code deployment.
Official Documentation:
Top Demos:
Section 10: Data Modelling
Section Overview: 4 objectives
Recommended Demos for This Section
Start with these demos to get hands-on experience:
Video Demos (In-depth demonstrations):
10.1 Scalable Data Models with Delta Lake
Objective: Design and implement scalable data models using Delta Lake to manage large datasets.
Official Documentation:
Top Demos:
Training Resources:
10.2 Liquid Clustering for Query Performance
Objective: Simplify data layout decisions and optimize query performance using Liquid Clustering.
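A minimal sketch of what this looks like in DDL, with placeholder table and column names: cluster on the columns you filter by instead of choosing partition columns or Z-Order keys.

```python
# Create a table with Liquid Clustering instead of PARTITIONED BY / ZORDER.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.gold.page_views (
    view_id   BIGINT,
    user_id   BIGINT,
    view_date DATE,
    url       STRING
  )
  CLUSTER BY (view_date, user_id)
""")

# Clustering keys can be changed later without an up-front full rewrite;
# OPTIMIZE incrementally clusters new and existing data.
spark.sql("ALTER TABLE main.gold.page_views CLUSTER BY (view_date, url)")
spark.sql("OPTIMIZE main.gold.page_views")
```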
Official Documentation:
10.3 Liquid Clustering vs Partitioning/Z-Order
Objective: Identify the benefits of using Liquid Clustering over partitioning and Z-Order.
Official Documentation:
10.4 Dimensional Modeling for Analytics
Objective: Design Dimensional Models for analytical workloads, ensuring efficient querying and aggregation.
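A star-schema sketch on Unity Catalog with placeholder names: a dimension and a fact table declaring primary/foreign key constraints (informational in UC, not enforced) plus Liquid Clustering on common filter columns.

```python
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.gold.dim_customer (
    customer_sk BIGINT NOT NULL,
    customer_id STRING,
    segment     STRING,
    CONSTRAINT dim_customer_pk PRIMARY KEY (customer_sk)
  )
""")

spark.sql("""
  CREATE TABLE IF NOT EXISTS main.gold.fct_orders (
    order_id    BIGINT NOT NULL,
    customer_sk BIGINT,
    order_date  DATE,
    amount      DECIMAL(18, 2),
    CONSTRAINT fct_orders_pk PRIMARY KEY (order_id),
    CONSTRAINT fct_orders_customer_fk FOREIGN KEY (customer_sk)
      REFERENCES main.gold.dim_customer (customer_sk)
  )
  CLUSTER BY (order_date, customer_sk)
""")
```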
Official Documentation:
Study Resources
Official Training
- Advanced Data Engineering with Databricks (Databricks Academy) – Highly Recommended
- Data Engineering with Databricks (Databricks Academy)
- Data Management and Governance with Unity Catalog (Databricks Academy)
Certification Information
- Data Engineer Professional Exam Page
- Databricks Free Edition – Practice for free
Key Documentation
- Lakeflow Declarative Pipelines – ETL pipelines with DLT
- APPLY CHANGES API – CDC processing
- Databricks Asset Bundles – CI/CD and deployment
- System Tables – Observability and monitoring
- Query Profiler – Performance analysis
- Row Filters and Column Masks – Data security
- Liquid Clustering – Performance optimization
- Delta Sharing – Data sharing
Last Updated: February 26, 2026 | Exam Version: November 2025