A curated collection of demos, official documentation, and training resources mapped to each exam objective for the Databricks Certified Machine Learning Professional certification (September 2025 version).
Table of Contents
- About the Author
- About the Exam
- What’s Changed from the April 2024 Syllabus
- Exam Breakdown & Study Strategy
- How to Use This Guide Effectively
- Key Demos to Work Through
- Section 1: Model Development (~47% of exam)
- Section 2: MLOps (~43% of exam)
- Section 3: Model Deployment (~10% of exam)
- Study Resources
About the Author
I’m a Databricks Solutions Architect Champion with extensive experience in machine learning engineering and MLOps. This guide is designed to help you navigate the ML Professional certification, which is one of the more challenging Databricks certifications.
The ML Professional exam tests your ability to build production-grade ML systems at enterprise scale. This isn’t about knowing ML algorithms – it’s about knowing how to operationalise them using Databricks tools like SparkML, MLflow, Feature Store, Lakehouse Monitoring, and Model Serving.
I created this guide by analysing the exam objectives and mapping them to the best available resources. My advice: don’t just read – practice! Get hands-on with MLflow experiments, build feature pipelines, deploy models, and set up monitoring. The exam questions are scenario-based, so you need practical experience.
Use this guide to get a big-picture view of the space, find relevant demos, and follow the links mapped to each exam objective to fill any gaps in your knowledge. I have previously written about my approach to taking certifications here. Good luck on your certification journey!
About the Exam
- Exam Name: Databricks Certified Machine Learning Professional
- Version: September 2025
- Questions: 59 scored multiple-choice
- Time Limit: 120 minutes
- Registration Fee: USD $200
- Validity: 2 years
- Prerequisite: None required; 1+ year hands-on experience highly recommended
Recommended Preparation
- Instructor-led: Machine Learning at Scale and Advanced Machine Learning Operations
- Self-paced: Available in Databricks Academy
- Working knowledge of Python, scikit-learn, SparkML, and MLflow
- Working knowledge of Lakehouse Monitoring and Databricks Model Serving
What’s Changed from the April 2024 Syllabus
The September 2025 exam brings significant updates reflecting Databricks’ evolving ML platform:
Structural Changes:
- Consolidated from 4 sections to 3 (Experimentation & Feature Engineering merged with Data Management under Section 1: Model Development)
New Topics Added:
- Distributed Hyperparameter Tuning: Ray and Optuna now explicitly covered for scaling hyperparameter searches
- Databricks Asset Bundles (DABs): New emphasis on infrastructure-as-code for ML asset management
- ML Pipeline Testing: Dedicated coverage of unit tests and integration tests for ML pipelines
- Deployment Strategies: Blue-green and canary deployments now explicitly tested
Updated Focus Areas:
- Lakehouse Monitoring: Replaces generic drift monitoring with platform-specific capabilities
- Unity Catalog Integration: Model aliases and lineage replace legacy Model Registry Webhooks
- Feature Serving: Expanded coverage including on-demand feature computation and real-time serving
Study Tip: If you studied for the April 2024 exam, focus extra time on DABs, Ray/Optuna distributed tuning, and Lakehouse Monitoring – these represent the most significant additions.
Exam Breakdown & Study Strategy
Exam Weight by Topic Area (Estimated)
Based on the number of objectives per section:
| Section | Topics | Objectives | Est. Weight |
|---|---|---|---|
| Section 1: Model Development | SparkML, Scaling, MLflow, Feature Store | 22 | ~47% |
| Section 2: MLOps | Lifecycle, Testing, Environments, Monitoring | 20 | ~43% |
| Section 3: Model Deployment | Strategies, Custom Serving | 5 | ~10% |
Subsection Breakdown
| Subsection | Objectives | Focus Areas |
|---|---|---|
| Using SparkML | 7 | Pipelines, estimators, transformers, evaluation |
| Scaling and Tuning | 7 | Distributed training, Optuna, Ray, parallelization |
| Advanced MLflow | 3 | Nested runs, custom logging, PyFunc models |
| Feature Store | 5 | Point-in-time, online tables, streaming features |
| Drift & Monitoring | 10 | Lakehouse Monitoring, drift detection, alerting |
| Validation Testing | 4 | Unit tests, integration tests, ML pipelines |
| Environment & Lifecycle | 4 | DABs, model registry, environment transitions |
| Automated Retraining | 2 | Drift-triggered retraining, champion/challenger selection |
| Deployment | 5 | Blue-green, canary, Model Serving, REST API |
How to Use This Guide Effectively
📖 Official Documentation (docs.databricks.com)
The docs are your reference for exact syntax and configuration options. For this exam:
- Focus on MLflow, Feature Store, and Lakehouse Monitoring docs
- Pay attention to code examples – the exam has code-based questions
- Understand the difference between MLflow tracking vs. registry vs. deployments (see the sketch after this list)
Best for: API syntax, configuration options, code patterns
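To make that three-way distinction concrete, here is a minimal sketch, assuming a Databricks workspace with Unity Catalog; the model name `ml.demo_schema.churn_model` and endpoint `churn-endpoint` are illustrative placeholders:

```python
import mlflow
import numpy as np
from sklearn.linear_model import LogisticRegression
from mlflow.deployments import get_deploy_client

mlflow.set_registry_uri("databricks-uc")  # register models in Unity Catalog

# 1. Tracking: record params, metrics, and artifacts for a run
with mlflow.start_run() as run:
    mlflow.log_param("C", 1.0)
    model = LogisticRegression().fit(np.array([[0.0], [1.0]]), [0, 1])
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metric("auc", 0.91)

# 2. Registry: version the logged model under a Unity Catalog name
mlflow.register_model(f"runs:/{run.info.run_id}/model", "ml.demo_schema.churn_model")

# 3. Deployments: query a live Model Serving endpoint
client = get_deploy_client("databricks")
preds = client.predict(
    endpoint="churn-endpoint",
    inputs={"dataframe_split": {"columns": ["x"], "data": [[0.5]]}},
)
```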
🎯 Interactive Demos (databricks.com/resources/demos)
Demos help you see features in action. For ML Pro:
- Watch Model Serving demos to understand endpoint configuration
- Follow Lakehouse Monitoring tutorials for drift detection setup
- DABs demos show infrastructure-as-code patterns
How I use demos:
- Read the objective first
- Watch the demo focusing on that specific feature
- Recreate it in your own workspace
Best for: Understanding workflows, UI navigation, real configurations
🎓 Training Courses (Databricks Academy)
The official courses are highly recommended:
- Machine Learning at Scale: SparkML, distributed training, pandas UDFs
- Advanced Machine Learning Operations: MLflow, Feature Store, Model Serving, Monitoring
Best for: Structured learning, hands-on labs, comprehensive coverage
📚 Background Reading
Before diving into specifics, these resources provide essential big-picture context:
The Big Book of MLOps – Get the PDF
Start here! This gives you the conceptual foundation for MLOps workflows. Understanding the big picture first makes mapping concepts to specifics much easier as you study.
MLflow Documentation – Essential reading for ~30% of the exam:
- Experiment Tracking – Core concept, understand runs, metrics, artifacts
- Model Flavors & PyFunc – Know when to use pyfunc for custom models
- Model Registry – Lifecycle management concepts
MLOps Automation – Understanding model lifecycle automation:
- Model Registry Webhooks – While the September 2025 exam focuses on Unity Catalog model aliases, understanding webhook concepts helps with MLOps automation patterns
Understanding Drift – Critical for Lakehouse Monitoring section:
- Introduction to ML Drift – Types of drift explained
- Know the four drift types: Feature drift, Label drift, Prediction drift, Concept drift
- Know which statistical tests to use: Kolmogorov-Smirnov (numerical), Chi-squared (categorical), Jensen-Shannon divergence (distributions) – see the scipy sketch after this list
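A quick sketch of those three tests with scipy and numpy, on synthetic data (purely illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
baseline = rng.normal(0, 1, 1_000)
current = rng.normal(0.3, 1, 1_000)           # shifted numerical feature

# Kolmogorov-Smirnov: numerical drift
stat, p = ks_2samp(baseline, current)
print(f"KS p-value: {p:.4f}")                  # small p => distributions differ

# Chi-squared: categorical drift (counts per category, baseline vs current)
counts = np.array([[500, 300, 200], [420, 360, 220]])
chi2, p, _, _ = chi2_contingency(counts)
print(f"Chi-squared p-value: {p:.4f}")

# Jensen-Shannon distance between two histograms on shared bin edges
edges = np.histogram_bin_edges(np.concatenate([baseline, current]), bins=20)
p_hist, _ = np.histogram(baseline, bins=edges)
q_hist, _ = np.histogram(current, bins=edges)
print(f"JS distance: {jensenshannon(p_hist, q_hist):.4f}")
```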
Feature Engineering:
- Real-time Feature Computation Best Practices – Batch vs streaming trade-offs
My Recommended Study Path
Phase 1: Foundation (Model Development)
- Start with SparkML basics – pipelines, estimators, transformers
- Learn distributed training and tuning with pandas UDFs and Optuna
- Master MLflow – experiments, nested runs, custom logging
- Understand Feature Store – point-in-time correctness is critical
Phase 2: Operations (MLOps)
- Study Lakehouse Monitoring deeply – 10 objectives here!
- Learn DABs for ML asset management
- Understand testing strategies for ML systems
- Practice automated retraining workflows
Phase 3: Deployment
- Learn deployment strategies (blue-green, canary)
- Practice Model Serving – endpoints, traffic routing
- Understand PyFunc models and custom serving
Key Topics That Often Appear
Based on the objective distribution, focus extra attention on:
- Lakehouse Monitoring (10 objectives) – drift detection, inference tables, alerting
- SparkML (7 objectives) – when to use it, pipeline construction, evaluation
- Distributed Training (7 objectives) – Optuna, Ray, pandas UDFs, parallelization
- Feature Store (5 objectives) – point-in-time, online tables, feature serving
Practice Environment
Get hands-on experience with:
- Databricks Free Edition – Free, no credit card required
- MLOps Stacks – Production ML templates
Key Demos to Work Through
These demos provide comprehensive coverage of multiple exam objectives. Work through these first to get the big picture:
Must-Do: End-to-End MLOps
- MLOps End-to-End Pipeline (Hands-on Tutorial) – Work through the full ML lifecycle from feature engineering to deployment. Essential hands-on practice for understanding how all the pieces fit together.
Model Development & Training
- Data Science and Machine Learning on Databricks – Platform overview for ML
- Databricks Machine Learning Workspace – ML workspace features
- How to Use AutoML to Develop ML Models – AutoML capabilities
- Databricks AutoML – Interactive AutoML tour
Feature Store & Feature Engineering
- Feature Store and Online Inference – Feature pipelines and online serving
Model Serving & Deployment
- Model Serving on the Lakehouse – Deployment fundamentals
- Model Serving Databricks – Interactive serving tour
Monitoring & Drift Detection
- Lakehouse Monitoring and Vector Search – Monitoring overview
- Lakehouse Monitoring Databricks – Interactive monitoring tour
- Monitor Your Data Quality with Lakehouse Monitoring – Hands-on monitoring tutorial
Section 1: Model Development
Using Spark ML
1.1.1 Identify when SparkML is recommended based on the data, model, and use case requirements
Keywords: sparkml, spark ml, mllib, distributed ml, large dataset, big data, when to use spark
📖 Documentation:
1.1.2 Construct an ML pipeline using SparkML
Keywords: sparkml pipeline, ml pipeline, spark pipeline, pyspark ml, pipeline stages
📖 Documentation:
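As a quick reference for this objective (and the estimator/transformer distinction in 1.1.3), here is a minimal pipeline sketch; the column names and the `train_df`/`test_df` DataFrames are illustrative placeholders:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression

indexer = StringIndexer(inputCol="plan_type", outputCol="plan_idx")   # estimator -> transformer
assembler = VectorAssembler(inputCols=["plan_idx", "tenure", "spend"],
                            outputCol="features")                     # pure transformer
lr = LogisticRegression(featuresCol="features", labelCol="churned")   # estimator

pipeline = Pipeline(stages=[indexer, assembler, lr])
model = pipeline.fit(train_df)      # fit() runs each stage in order, returns a PipelineModel
scored = model.transform(test_df)   # transform() applies the whole fitted pipeline
```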
1.1.3 Apply the appropriate estimator and/or transformer given a use case
Keywords: estimator, transformer, stringindexer, onehotencoder, vectorassembler, feature transformer, spark estimator
📖 Documentation:
1.1.4 Tune a SparkML model using MLlib
Keywords: hyperparameter tuning, crossvalidator, paramgrid, trainvalidationsplit, spark tuning, mllib tuning
📖 Documentation:
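A hedged sketch of MLlib tuning, reusing the `pipeline`, `lr`, and `train_df` placeholders from the 1.1.2 sketch above (the evaluator here also covers objective 1.1.5):

```python
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

evaluator = BinaryClassificationEvaluator(labelCol="churned",
                                          metricName="areaUnderROC")

cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=evaluator, numFolds=3,
                    parallelism=4)          # evaluate up to 4 param sets at once
cv_model = cv.fit(train_df)                 # best model, refit on the full data
print(cv_model.avgMetrics)                  # mean metric per param combination
```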
1.1.5 Evaluate a SparkML model
Keywords: model evaluation, evaluator, binaryclassificationevaluator, multiclassclassificationevaluator, regressionevaluator, metrics
📖 Documentation:
1.1.6 Score a Spark ML model for a batch or streaming use case
Keywords: batch scoring, batch inference, streaming inference, model scoring, spark transform, predict
🎯 Relevant Demos:
- IoT and Predictive Maintenance (Tutorial)
📖 Documentation:
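A minimal sketch of both scoring modes with the fitted `model` from above; the table names are illustrative and assume Unity Catalog:

```python
# Batch: score a static table and append the results
batch_preds = model.transform(spark.read.table("ml.demo.features"))
batch_preds.write.mode("append").saveAsTable("ml.demo.predictions")

# Streaming: the same transform() call works on a streaming DataFrame
stream_df = spark.readStream.table("ml.demo.features_stream")
(model.transform(stream_df)
      .writeStream
      .option("checkpointLocation", "/tmp/ckpt/preds")   # required for recovery
      .toTable("ml.demo.predictions_stream"))
```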
1.1.7 Select a SparkML model or single-node model for inference based on the inference type: batch, real-time, or streaming
Keywords: inference type, batch vs realtime, model selection, single node, distributed inference, streaming ml
📖 Documentation:
Scaling and Tuning
1.2.1 Scale distributed training pipelines using SparkML and pandas Function APIs/UDFs
Keywords: distributed training, pandas udf, pandas function api, applyinpandas, mapinpandas, scale training
📖 Documentation:
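One pattern worth knowing cold: broadcast a single-node model and apply it in parallel with a scalar pandas UDF. A hedged sketch, where `sk_model` is a pre-trained scikit-learn classifier and `df` a Spark DataFrame already in scope:

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf

model_bc = spark.sparkContext.broadcast(sk_model)   # ship the model to executors once

@pandas_udf("double")
def churn_prob(tenure: pd.Series, spend: pd.Series) -> pd.Series:
    X = pd.DataFrame({"tenure": tenure, "spend": spend})
    return pd.Series(model_bc.value.predict_proba(X)[:, 1])

scored = df.withColumn("churn_prob", churn_prob("tenure", "spend"))
```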
1.2.2 Perform distributed hyperparameter tuning using Optuna and integrate it with MLflow
Keywords: optuna, hyperparameter optimization, distributed tuning, mlflow optuna, hyperopt, mlflowsparkstudy
📖 Documentation:
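A sketch of the integration pattern, assuming `X`/`y` training arrays are in scope; each Optuna trial is logged as a nested MLflow run (distributed execution, e.g. via joblib-spark, layers on top of the same objective function):

```python
import mlflow
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
    }
    with mlflow.start_run(nested=True):          # one child run per trial
        mlflow.log_params(params)
        score = cross_val_score(RandomForestClassifier(**params), X, y, cv=3).mean()
        mlflow.log_metric("cv_accuracy", score)
    return score

with mlflow.start_run(run_name="optuna_search"):  # parent run for the study
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    mlflow.log_params(study.best_params)
```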
1.2.3 Perform distributed hyperparameter tuning using Ray
Keywords: ray, ray tune, distributed hyperparameter, ray on spark, ray cluster
📖 Documentation:
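A hedged sketch of Ray Tune on a Databricks cluster; `setup_ray_cluster` ships with recent Databricks ML runtimes, and its argument names vary across Ray versions (older releases use `num_worker_nodes`), so treat this as a shape, not gospel:

```python
import ray
from ray import tune
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

setup_ray_cluster(max_worker_nodes=2)   # start Ray workers on the Spark cluster
ray.init()

def train_fn(config):
    # stand-in for real training; return the metric Ray Tune optimizes
    score = 1.0 - (config["lr"] - 0.1) ** 2
    return {"score": score}

tuner = tune.Tuner(
    train_fn,
    param_space={"lr": tune.uniform(0.001, 0.3)},
    tune_config=tune.TuneConfig(metric="score", mode="max", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)
shutdown_ray_cluster()
```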
1.2.4 Evaluate the trade-offs between vertical and horizontal scaling for machine learning workloads in Databricks environments
Keywords: vertical scaling, horizontal scaling, scale up, scale out, cluster sizing, ml workload scaling
📖 Documentation:
1.2.5 Evaluate and select appropriate parallelization (model parallelism, data parallelism) strategies for large-scale ML training
Keywords: model parallelism, data parallelism, distributed deep learning, parallel training, large scale training
📖 Documentation:
1.2.6 Compare Ray and Spark for distributing ML training workloads
Keywords: ray vs spark, distributed ml framework, training framework comparison, ray spark
📖 Documentation:
1.2.7 Use the Pandas Function API to parallelize group-specific model training and perform inference
Keywords: pandas function api, group-specific training, applyinpandas, grouped map, parallel inference
📖 Documentation:
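The canonical pattern here is one model per group via `applyInPandas`. A minimal sketch, with `sales_df` and its columns as illustrative placeholders:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def train_per_store(pdf: pd.DataFrame) -> pd.DataFrame:
    X, y = pdf[["price", "promo"]], pdf["units"]
    model = LinearRegression().fit(X, y)        # one model per store, in parallel
    return pd.DataFrame({"store_id": [pdf["store_id"].iloc[0]],
                         "r2": [model.score(X, y)]})

results = (sales_df
           .groupBy("store_id")
           .applyInPandas(train_per_store, schema="store_id long, r2 double"))
```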
Advanced MLflow Usage
1.3.1 Utilize nested runs using MLflow for tracking complex experiments
Keywords: nested runs, mlflow nested, parent run, child run, experiment tracking, mlflow runs
📖 Documentation:
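The core API is just `nested=True`; a minimal sketch with placeholder metric values:

```python
import mlflow

with mlflow.start_run(run_name="cv_experiment"):        # parent run
    for fold in range(3):
        with mlflow.start_run(run_name=f"fold_{fold}", nested=True):  # child runs
            mlflow.log_metric("fold_auc", 0.90 + 0.01 * fold)
    mlflow.log_metric("mean_auc", 0.91)                 # aggregate on the parent
```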
1.3.2 Log custom metrics, parameters, and artifacts programmatically in MLflow to track advanced experimentation workflows
Keywords: mlflow log, custom metrics, log_metric, log_param, log_artifact, mlflow tracking
📖 Documentation:
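A compact sketch covering the three logging surfaces (params, stepped metrics, artifacts) plus tags:

```python
import json
import mlflow

with mlflow.start_run():
    mlflow.log_param("feature_set", "v3")
    for step, loss in enumerate([0.9, 0.7, 0.6]):
        mlflow.log_metric("loss", loss, step=step)     # time-series metric
    with open("confusion_matrix.json", "w") as f:
        json.dump({"tp": 90, "fp": 10}, f)
    mlflow.log_artifact("confusion_matrix.json")        # any local file
    mlflow.set_tag("team", "risk-ml")
```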
1.3.3 Create custom model objects using real-time feature engineering
Keywords: custom model, pyfunc, mlflow pyfunc, custom pyfunc, real-time features, model wrapper
📖 Documentation:
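A hedged sketch of a custom PyFunc wrapper that derives a feature at inference time before delegating to an underlying model; the artifact path `sk_model.joblib` is an illustrative local file:

```python
import mlflow
import pandas as pd

class RatioModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import joblib
        self.model = joblib.load(context.artifacts["sk_model"])

    def predict(self, context, model_input: pd.DataFrame):
        df = model_input.copy()
        # real-time feature engineering: computed per request, not looked up
        df["spend_per_month"] = df["spend"] / df["tenure"].clip(lower=1)
        return self.model.predict(df[["spend_per_month"]])

with mlflow.start_run():
    mlflow.pyfunc.log_model("model", python_model=RatioModel(),
                            artifacts={"sk_model": "sk_model.joblib"})
```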
Advanced Feature Store Concepts
1.4.1 Ensure point-in-time correctness in feature lookups to prevent data leakage during model training and inference
Keywords: point-in-time, feature lookup, data leakage, temporal correctness, feature store lookup, time travel features
📖 Documentation:
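The key parameter is `timestamp_lookup_key`: each training row joins only feature values as of its own timestamp. A hedged sketch with illustrative table and column names:

```python
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

fe = FeatureEngineeringClient()

training_set = fe.create_training_set(
    df=labels_df,                       # columns: customer_id, ts, label
    feature_lookups=[
        FeatureLookup(
            table_name="ml.features.customer_features",
            lookup_key="customer_id",
            timestamp_lookup_key="ts",  # point-in-time join; prevents leakage
        )
    ],
    label="label",
)
train_df = training_set.load_df()
```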
1.4.2 Build automated pipelines for feature computation using the FeatureEngineering Client
Keywords: feature engineering client, feature computation, feature pipeline, automated features, databricks feature store
📖 Documentation:
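A sketch of the publish side with the same client; `raw_df` and the table name are placeholders:

```python
from pyspark.sql import functions as F
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
features_df = (raw_df.groupBy("customer_id", "ts")
                     .agg(F.sum("amount").alias("total_spend")))

fe.create_table(
    name="ml.features.customer_features",
    primary_keys=["customer_id"],
    timestamp_keys=["ts"],               # enables point-in-time lookups
    df=features_df,                      # defines the schema and writes once
    description="Customer spend aggregates",
)
# On later pipeline runs, upsert fresh values:
fe.write_table(name="ml.features.customer_features",
               df=features_df, mode="merge")
```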
1.4.3 Configure online tables for low-latency applications using Databricks SDK
Keywords: online tables, online store, low latency, feature serving, online feature, databricks sdk
📖 Documentation:
1.4.4 Design scalable solutions for ingesting and processing streaming data to generate features in real time
Keywords: streaming features, real-time features, streaming ingestion, feature generation, streaming pipeline
📖 Documentation:
1.4.5 Develop on-demand features using feature serving for consistent use across training and production environments
Keywords: on-demand features, feature serving, feature function, training serving consistency, feature consistency
📖 Documentation:
Section 2: MLOps
Model Lifecycle Management
2.1.1 Describe and implement the architecture components of model lifecycle pipelines used to manage environment transitions in the deploy code strategy
Keywords: model lifecycle, deploy code, environment transition, dev staging prod, ml pipeline architecture, mlops pipeline
📖 Documentation:
2.1.2 Map Databricks features to activities of the model lifecycle management process
Keywords: model registry, unity catalog models, model versioning, model alias, model lifecycle, registered model
📖 Documentation:
Validation Testing
2.2.1 Implement unit tests for individual functions in Databricks notebooks to ensure they produce expected outputs when given specific inputs
Keywords: unit test, pytest, notebook testing, function testing, test databricks, unit testing ml
🎯 Relevant Demos:
📖 Documentation:
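The pattern the exam cares about: factor logic out of notebooks into pure functions, then assert on specific inputs and outputs. A minimal pytest sketch (function and column names are illustrative):

```python
import pandas as pd

def add_spend_ratio(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["spend_ratio"] = out["spend"] / out["tenure"].clip(lower=1)
    return out

def test_add_spend_ratio_handles_zero_tenure():
    df = pd.DataFrame({"spend": [100.0], "tenure": [0]})
    result = add_spend_ratio(df)
    assert result.loc[0, "spend_ratio"] == 100.0   # clipped divisor avoids inf
```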
2.2.2 Identify types of testing performed (unit and integration) in various environment stages (dev, test, prod, etc.)
Keywords: integration test, test types, dev test prod, environment testing, ml testing strategy
📖 Documentation:
2.2.3 Design an integration test for machine learning systems that incorporates common pipelines: feature engineering, training, evaluation, deployment, and inference
Keywords: ml integration test, pipeline testing, end-to-end test, ml system test, inference test
📖 Documentation:
2.2.4 Compare the benefits and challenges of approaches for organizing functions and unit tests
Keywords: test organization, test structure, testing best practices, ml testing patterns
📖 Documentation:
Environment Architectures
2.3.1 Design and implement scalable Databricks environments for machine learning projects using best practices
Keywords: ml environment, workspace architecture, ml best practices, databricks environment, ml infrastructure
📖 Documentation:
2.3.2 Define and configure Databricks ML assets using DABs (Databricks Asset Bundles): model serving endpoints, MLflow experiments, ML registered models
Keywords: databricks asset bundles, dabs, asset bundle, ml assets, infrastructure as code, bundle deploy
🎯 Relevant Demos:
- Databricks Asset Bundles (Tours)
📖 Documentation:
Automated Retraining
2.4.1 Implement automated retraining workflows that can be triggered by data drift detection or performance degradation alerts
Keywords: automated retraining, retrain trigger, drift retraining, performance degradation, retraining workflow
📖 Documentation:
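One hedged sketch of the trigger half, assuming a Lakehouse Monitoring drift metrics table (table and column names are illustrative; verify against the monitor output in your workspace) and a pre-defined retraining job:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
drift = spark.read.table("ml.monitoring.churn_inference_drift_metrics")
latest = (drift.filter("column_name = 'prediction'")
               .orderBy("window.start", ascending=False)
               .first())

if latest and latest["js_distance"] and latest["js_distance"] > 0.2:
    w.jobs.run_now(job_id=123456)   # illustrative job ID for the retraining workflow
```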
2.4.2 Develop a strategy for selecting top-performing models during automated retraining
Keywords: model selection, champion challenger, model comparison, best model, retraining strategy
📖 Documentation:
Drift Detection and Lakehouse Monitoring
2.5.1 Apply any statistical tests from the drift metrics table in Lakehouse Monitoring to detect drift in numerical and categorical data and evaluate the significance of observed changes
Keywords: drift detection, statistical test, ks test, chi-square, drift metrics, lakehouse monitoring drift
📖 Documentation:
2.5.2 Identify the data table type and Lakehouse Monitoring feature that will resolve a use case need and explain why
Keywords: lakehouse monitoring, monitoring feature, table type, snapshot table, time series table
🎯 Relevant Demos:
- Lakehouse Monitoring and Vector Search (Video Tour)
- Lakehouse Monitoring Databricks (Tours)
- Monitor Your Data Quality with Lakehouse Monitoring (Tutorial)
📖 Documentation:
2.5.3 Build a monitor for a snapshot, time series, or inference table using Lakehouse Monitoring
Keywords: create monitor, snapshot monitor, time series monitor, inference table monitor, lakehouse monitoring setup
📖 Documentation:
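A hedged sketch of creating an inference-table monitor with the Databricks SDK; names are illustrative, and snapshot or time-series monitors swap `inference_log` for `snapshot=MonitorSnapshot()` or `time_series=MonitorTimeSeries(...)`:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
    MonitorInferenceLog, MonitorInferenceLogProblemType,
)

w = WorkspaceClient()
w.quality_monitors.create(
    table_name="ml.prod.churn_inference",
    assets_dir="/Workspace/Shared/monitors/churn",
    output_schema_name="ml.monitoring",
    inference_log=MonitorInferenceLog(
        granularities=["1 day"],
        timestamp_col="ts",
        model_id_col="model_version",
        prediction_col="prediction",
        label_col="label",               # optional; enables performance metrics
        problem_type=MonitorInferenceLogProblemType.PROBLEM_TYPE_CLASSIFICATION,
    ),
)
```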
2.5.4 Identify the key components of common monitoring pipelines: logging, drift detection, model performance, model health, etc.
Keywords: monitoring pipeline, model logging, model health, performance monitoring, ml monitoring components
📖 Documentation:
2.5.5 Design and configure alerting mechanisms to notify stakeholders when drift metrics exceed predefined thresholds
Keywords: drift alerting, monitoring alerts, threshold alert, notification, drift threshold
📖 Documentation:
2.5.6 Detect data drift by comparing current data distributions to a known baseline or between successive time windows
Keywords: data drift, baseline comparison, distribution shift, time window drift, drift baseline
📖 Documentation:
2.5.7 Evaluate model performance trends over time using an inference table
Keywords: inference table, performance trend, model performance over time, inference logging, prediction logging
📖 Documentation:
2.5.8 Define custom metrics in Lakehouse Monitoring metrics tables
Keywords: custom metrics, monitoring metrics, metrics table, custom monitoring, define metrics
📖 Documentation:
2.5.9 Evaluate metrics based on different data granularities and feature slicing
Keywords: feature slicing, data granularity, segment analysis, slice metrics, cohort analysis
📖 Documentation:
2.5.10 Monitor endpoint health by tracking infrastructure metrics such as latency, request rate, error rate, CPU usage, and memory usage
Keywords: endpoint health, latency monitoring, request rate, error rate, infrastructure metrics, endpoint metrics
📖 Documentation:
Section 3: Model Deployment
Deployment Strategies
3.1.1 Compare deployment strategies (e.g. blue-green and canary) and evaluate their suitability for high-traffic applications
Keywords: blue-green deployment, canary deployment, deployment strategy, traffic routing, rollout strategy, a/b deployment
📖 Documentation:
3.1.2 Implement a model rollout strategy using Databricks Model Serving
Keywords: model rollout, model serving, traffic split, served entity, endpoint routing, gradual rollout
🎯 Relevant Demos:
- Model Serving on the Lakehouse (Video Tour)
- Model Serving Databricks (Tours)
📖 Documentation:
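A hedged sketch of a canary-style rollout: two versions of the same Unity Catalog model behind one endpoint with a 90/10 traffic split. The model and endpoint names are illustrative:

```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
client.create_endpoint(
    name="churn-endpoint",
    config={
        "served_entities": [
            {"name": "champion", "entity_name": "ml.demo_schema.churn_model",
             "entity_version": "3", "workload_size": "Small",
             "scale_to_zero_enabled": True},
            {"name": "challenger", "entity_name": "ml.demo_schema.churn_model",
             "entity_version": "4", "workload_size": "Small",
             "scale_to_zero_enabled": True},
        ],
        "traffic_config": {"routes": [
            {"served_model_name": "champion", "traffic_percentage": 90},
            {"served_model_name": "challenger", "traffic_percentage": 10},
        ]},
    },
)
```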
Custom Model Serving
3.2.1 Register a custom PyFunc model and log custom artifacts in Unity Catalog
Keywords: pyfunc model, custom pyfunc, unity catalog model, register model, custom artifacts, mlflow pyfunc
📖 Documentation:
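Building on the `RatioModel` wrapper sketched under 1.3.3, registering in Unity Catalog is one extra argument plus the UC registry URI (names remain illustrative):

```python
import mlflow

mlflow.set_registry_uri("databricks-uc")
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        "model",
        python_model=RatioModel(),                     # wrapper from 1.3.3
        artifacts={"sk_model": "sk_model.joblib"},     # custom artifact, logged alongside
        registered_model_name="ml.demo_schema.ratio_model",
    )
```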
3.2.2 Query custom models via REST API or MLflow Deployments SDK
Keywords: rest api, mlflow deployments, model query, serving endpoint api, predict api, model inference api
📖 Documentation:
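Both query paths accept the same payload shapes; a hedged sketch, with `host`/`token` and the endpoint name as placeholders:

```python
import requests
from mlflow.deployments import get_deploy_client

payload = {"dataframe_split": {"columns": ["tenure", "spend"],
                               "data": [[12, 480.0]]}}

# 1. MLflow Deployments SDK
client = get_deploy_client("databricks")
print(client.predict(endpoint="churn-endpoint", inputs=payload))

# 2. Raw REST call against the serving endpoint
resp = requests.post(
    f"{host}/serving-endpoints/churn-endpoint/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(resp.json())
```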
3.2.3 Deploy custom model objects using MLflow deployments SDK, REST API or user interface
Keywords: deploy model, mlflow deploy, serving endpoint, model deployment, endpoint deployment, databricks model serving
📖 Documentation:
Study Resources
Official Training
- Machine Learning at Scale (Databricks Academy)
- Advanced Machine Learning Operations (Databricks Academy)
Certification Information
- ML Professional Exam Page
- Databricks Free Edition – Practice for free
Key GitHub Repositories
- mlops-stacks – Production ML best practices
- databricks-ml-examples – ML code examples
Last Updated: December 2025
Exam Version: September 2025