Databricks Certified Machine Learning Professional – Comprehensive Resource Guide

A curated collection of demos, official documentation, and training resources mapped to each exam objective for the Databricks Certified Machine Learning Professional certification (September 2025 version).

The Databricks Certified Machine Learning Professional badge

About the Author

I’m a Databricks Solutions Architect Champion with extensive experience in machine learning engineering and MLOps. This guide is designed to help you navigate the ML Professional certification, which is one of the more challenging Databricks certifications.

The ML Professional exam tests your ability to build production-grade ML systems at enterprise scale. This isn’t about knowing ML algorithms – it’s about knowing how to operationalise them using Databricks tools like SparkML, MLflow, Feature Store, Lakehouse Monitoring, and Model Serving.

I created this guide by analysing the exam objectives and mapping them to the best available resources. My advice: don’t just read – practice! Get hands-on with MLflow experiments, build feature pipelines, deploy models, and set up monitoring. The exam questions are scenario-based, so you need practical experience.

Use this guide to get a big-picture view of the space, work through the demos, and use the links mapped to each exam objective to fill any gaps in your knowledge. I have previously written on my approach to taking certs here. Good luck on your certification journey!


About the Exam

  • Exam Name: Databricks Certified Machine Learning Professional
  • Version: September 2025
  • Questions: 59 scored multiple-choice
  • Time Limit: 120 minutes
  • Registration Fee: USD $200
  • Validity: 2 years
  • Prerequisite: None required; 1+ year hands-on experience highly recommended

Recommended Preparation

  • Instructor-led: Machine Learning at Scale and Advanced Machine Learning Operations
  • Self-paced: Available in Databricks Academy
  • Working knowledge of Python, scikit-learn, SparkML, and MLflow
  • Working knowledge of Lakehouse Monitoring and Databricks Model Serving

What’s Changed from the April 2024 Syllabus

The September 2025 exam brings significant updates reflecting Databricks’ evolving ML platform:

Structural Changes:

  • Consolidated from 4 sections to 3 (Experimentation & Feature Engineering merged with Data Management under Section 1: Model Development)

New Topics Added:

  • Distributed Hyperparameter Tuning: Ray and Optuna now explicitly covered for scaling hyperparameter searches
  • Databricks Asset Bundles (DABs): New emphasis on infrastructure-as-code for ML asset management
  • ML Pipeline Testing: Dedicated coverage of unit tests and integration tests for ML pipelines
  • Deployment Strategies: Blue-green and canary deployments now explicitly tested

Updated Focus Areas:

  • Lakehouse Monitoring: Replaces generic drift monitoring with platform-specific capabilities
  • Unity Catalog Integration: Model aliases and lineage replace legacy Model Registry Webhooks
  • Feature Serving: Expanded coverage including on-demand feature computation and real-time serving

Study Tip: If you studied for the April 2024 exam, focus extra time on DABs, Ray/Optuna distributed tuning, and Lakehouse Monitoring – these represent the most significant additions.


Exam Breakdown & Study Strategy

Exam Weight by Topic Area (Estimated)

Based on the number of objectives per section:

| Section | Topics | Objectives | Est. Weight |
|---|---|---|---|
| Section 1: Model Development | SparkML, Scaling, MLflow, Feature Store | 22 | ~47% |
| Section 2: MLOps | Lifecycle, Testing, Environments, Monitoring | 20 | ~43% |
| Section 3: Model Deployment | Strategies, Custom Serving | 5 | ~10% |

Subsection Breakdown

| Subsection | Objectives | Focus Areas |
|---|---|---|
| Using SparkML | 7 | Pipelines, estimators, transformers, evaluation |
| Scaling and Tuning | 7 | Distributed training, Optuna, Ray, parallelization |
| Advanced MLflow | 3 | Nested runs, custom logging, PyFunc models |
| Feature Store | 5 | Point-in-time, online tables, streaming features |
| Drift & Monitoring | 10 | Lakehouse Monitoring, drift detection, alerting |
| Validation Testing | 4 | Unit tests, integration tests, ML pipelines |
| Environment & Lifecycle | 4 | DABs, model registry, environment transitions |
| Deployment | 5 | Blue-green, canary, Model Serving, REST API |

How to Use This Guide Effectively

📚 Official Documentation (docs.databricks.com)

The docs are your reference for exact syntax and configuration options. For this exam:

  • Focus on MLflow, Feature Store, and Lakehouse Monitoring docs
  • Pay attention to code examples – the exam has code-based questions
  • Understand the differences between MLflow Tracking, the Model Registry, and MLflow Deployments

Best for: API syntax, configuration options, code patterns


🎯 Interactive Demos (databricks.com/resources/demos)

Demos help you see features in action. For ML Pro:

  • Watch Model Serving demos to understand endpoint configuration
  • Follow Lakehouse Monitoring tutorials for drift detection setup
  • DABs demos show infrastructure-as-code patterns

How I use demos:

  1. Read the objective first
  2. Watch the demo focusing on that specific feature
  3. Recreate it in your own workspace

Best for: Understanding workflows, UI navigation, real configurations


🎓 Training Courses (Databricks Academy)

The official courses are highly recommended:

  • Machine Learning at Scale: SparkML, distributed training, pandas UDFs
  • Advanced Machine Learning Operations: MLflow, Feature Store, Model Serving, Monitoring

Best for: Structured learning, hands-on labs, comprehensive coverage


📖 Background Reading

Before diving into specifics, these resources provide essential big-picture context:

The Big Book of MLOps – Get the PDF
Start here! This gives you the conceptual foundation for MLOps workflows. Understanding the big picture first makes mapping concepts to specifics much easier as you study.

MLflow Documentation – Essential reading for ~30% of the exam:

MLOps Automation – Understanding model lifecycle automation:

  • Model Registry Webhooks – While the September 2025 exam focuses on Unity Catalog model aliases, understanding webhook concepts helps with MLOps automation patterns

Understanding Drift – Critical for Lakehouse Monitoring section:

  • Introduction to ML Drift – Types of drift explained
  • Know the four drift types: Feature drift, Label drift, Prediction drift, Concept drift
  • Know which statistical tests to use: Kolmogorov-Smirnov (numerical), Chi-squared (categorical), Jensen-Shannon divergence (distributions)
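Since the exam expects you to match each test to a data type, here is a minimal, self-contained sketch of all three using scipy. The sample data, the group counts, and the 0.05 threshold are illustrative only, not output from Lakehouse Monitoring:

```python
# Hedged sketch: detecting drift between a baseline window and a current
# window with the three statistical tests named above. Thresholds and
# data are made up for illustration.
import numpy as np
from scipy import stats
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)

# Numerical feature: two-sample Kolmogorov-Smirnov test
baseline = rng.normal(loc=0.0, scale=1.0, size=1000)
current = rng.normal(loc=0.5, scale=1.0, size=1000)   # shifted mean -> drift
ks_stat, ks_p = stats.ks_2samp(baseline, current)
numerical_drift = ks_p < 0.05

# Categorical feature: chi-squared test on the two frequency tables
baseline_counts = np.array([500, 300, 200])
current_counts = np.array([250, 300, 450])            # shifted mix -> drift
chi2, chi_p, _, _ = stats.chi2_contingency([baseline_counts, current_counts])
categorical_drift = chi_p < 0.05

# Distribution distance: Jensen-Shannon between normalized histograms
p = baseline_counts / baseline_counts.sum()
q = current_counts / current_counts.sum()
js_distance = jensenshannon(p, q)  # 0 = identical; larger = more drift

print(numerical_drift, categorical_drift)
```

Note that Jensen-Shannon is a distance, not a hypothesis test: it has no p-value, so in practice you alert on it crossing a tuned threshold rather than on statistical significance.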

Feature Engineering:


My Recommended Study Path

Phase 1: Foundation (Model Development)

  1. Start with SparkML basics – pipelines, estimators, transformers
  2. Learn distributed training with pandas UDFs and Optuna
  3. Master MLflow – experiments, nested runs, custom logging
  4. Understand Feature Store – point-in-time correctness is critical

Phase 2: Operations (MLOps)

  1. Study Lakehouse Monitoring deeply – 10 objectives here!
  2. Learn DABs for ML asset management
  3. Understand testing strategies for ML systems
  4. Practice automated retraining workflows

Phase 3: Deployment

  1. Learn deployment strategies (blue-green, canary)
  2. Practice Model Serving – endpoints, traffic routing
  3. Understand PyFunc models and custom serving

Key Topics That Often Appear

Based on the objective distribution, focus extra attention on:

  1. Lakehouse Monitoring (10 objectives) – drift detection, inference tables, alerting
  2. SparkML (7 objectives) – when to use it, pipeline construction, evaluation
  3. Distributed Training (7 objectives) – Optuna, Ray, pandas UDFs, parallelization
  4. Feature Store (5 objectives) – point-in-time, online tables, feature serving

Practice Environment

Get hands-on experience with:


Key Demos to Work Through

These demos provide comprehensive coverage of multiple exam objectives. Work through these first to get the big picture:

Must-Do: End-to-End MLOps

  • MLOps End-to-End Pipeline (Hands-on Tutorial) – Work through the full ML lifecycle from feature engineering to deployment. Essential hands-on practice for understanding how all the pieces fit together.

Model Development & Training

Feature Store & Feature Engineering

Model Serving & Deployment

Monitoring & Drift Detection


Section 1: Model Development

Using Spark ML

1.1.1 Identify when SparkML is recommended based on the data, model, and use case requirements

Keywords: sparkml, spark ml, mllib, distributed ml, large dataset, big data, when to use spark

📚 Documentation:


1.1.2 Construct an ML pipeline using SparkML

Keywords: sparkml pipeline, ml pipeline, spark pipeline, pyspark ml, pipeline stages

📚 Documentation:


1.1.3 Apply the appropriate estimator and/or transformer given a use case

Keywords: estimator, transformer, stringindexer, onehotencoder, vectorassembler, feature transformer, spark estimator

📚 Documentation:


1.1.4 Tune a SparkML model using MLlib

Keywords: hyperparameter tuning, crossvalidator, paramgrid, trainvalidationsplit, spark tuning, mllib tuning

📚 Documentation:


1.1.5 Evaluate a SparkML model

Keywords: model evaluation, evaluator, binaryclassificationevaluator, multiclassclassificationevaluator, regressionevaluator, metrics

📚 Documentation:


1.1.6 Score a Spark ML model for a batch or streaming use case

Keywords: batch scoring, batch inference, streaming inference, model scoring, spark transform, predict

🎯 Relevant Demos:

📚 Documentation:


1.1.7 Select SparkML model or single node model for an inference based on type: batch, real-time, streaming

Keywords: inference type, batch vs realtime, model selection, single node, distributed inference, streaming ml

📚 Documentation:


Scaling and Tuning

1.2.1 Scale distributed training pipelines using SparkML and pandas Function APIs/UDFs

Keywords: distributed training, pandas udf, pandas function api, applyinpandas, mapinpandas, scale training

📚 Documentation:


1.2.2 Perform distributed hyperparameter tuning using Optuna and integrate it with MLflow

Keywords: optuna, hyperparameter optimization, distributed tuning, mlflow optuna, hyperopt, mlflowsparkstudy

📚 Documentation:


1.2.3 Perform distributed hyperparameter tuning using Ray

Keywords: ray, ray tune, distributed hyperparameter, ray on spark, ray cluster

📚 Documentation:


1.2.4 Evaluate the trade-offs between vertical and horizontal scaling for machine learning workloads in Databricks environments

Keywords: vertical scaling, horizontal scaling, scale up, scale out, cluster sizing, ml workload scaling

📚 Documentation:


1.2.5 Evaluate and select appropriate parallelization (model parallelism, data parallelism) strategies for large-scale ML training

Keywords: model parallelism, data parallelism, distributed deep learning, parallel training, large scale training

📚 Documentation:


1.2.6 Compare Ray and Spark for distributing ML training workloads

Keywords: ray vs spark, distributed ml framework, training framework comparison, ray spark

📚 Documentation:


1.2.7 Use the Pandas Function API to parallelize group-specific model training and perform inference

Keywords: pandas function api, group-specific training, applyinpandas, grouped map, parallel inference

📚 Documentation:
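The pattern behind this objective is a pandas-in / pandas-out function that trains one small model per group. On Databricks the same function would be handed to `spark_df.groupBy("store_id").applyInPandas(train_per_group, schema=...)`; the sketch below exercises it locally with plain pandas so the shape is clear. All column names (`store_id`, `feature`, `label`) are invented for illustration:

```python
# Hedged sketch of group-specific training via the pandas Function API
# pattern. Locally, groupby().apply() stands in for applyInPandas.
import numpy as np
import pandas as pd

def train_per_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit a per-group linear model; return one summary row per group."""
    slope, intercept = np.polyfit(pdf["feature"], pdf["label"], deg=1)
    return pd.DataFrame(
        {"store_id": [pdf["store_id"].iloc[0]],
         "slope": [slope],
         "intercept": [intercept],
         "n_rows": [len(pdf)]}
    )

df = pd.DataFrame(
    {"store_id": ["a"] * 3 + ["b"] * 3,
     "feature": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
     "label":   [2.0, 4.0, 6.0, 1.0, 1.0, 1.0]}
)

models = df.groupby("store_id", group_keys=False).apply(train_per_group)
print(models)
```

On Spark you must also declare the output schema up front (the `schema=...` argument), since the returned pandas DataFrames are stitched back into a distributed DataFrame.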


Advanced MLflow Usage

1.3.1 Utilize nested runs using MLflow for tracking complex experiments

Keywords: nested runs, mlflow nested, parent run, child run, experiment tracking, mlflow runs

📚 Documentation:


1.3.2 Log custom metrics, parameters, and artifacts programmatically in MLflow to track advanced experimentation workflows

Keywords: mlflow log, custom metrics, log_metric, log_param, log_artifact, mlflow tracking

📚 Documentation:


1.3.3 Create custom model objects using real-time feature engineering

Keywords: custom model, pyfunc, mlflow pyfunc, custom pyfunc, real-time features, model wrapper

📚 Documentation:
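To make the idea concrete, here is a plain-Python sketch of the wrapper pattern this objective is about. In a real workspace the class would subclass `mlflow.pyfunc.PythonModel` and be logged with `mlflow.pyfunc.log_model()`; here it is a standalone class with the same `predict(context, model_input)` contract, and the "real-time feature" (a click-through ratio computed per request) is hypothetical:

```python
# Hedged sketch of a custom model object that performs feature
# engineering at inference time. In MLflow this would subclass
# mlflow.pyfunc.PythonModel; the weights and columns are made up.
import pandas as pd

class RatioFeatureModel:
    """Wraps fitted weights and derives a feature on each request."""

    def __init__(self, weights):
        self.weights = weights  # stands in for a fitted model artifact

    def predict(self, context, model_input: pd.DataFrame) -> pd.Series:
        # Real-time feature engineering: derived column computed on the fly
        features = model_input.copy()
        features["ratio"] = features["clicks"] / features["impressions"]
        return self.weights["bias"] + self.weights["ratio"] * features["ratio"]

model = RatioFeatureModel(weights={"bias": 0.1, "ratio": 2.0})
batch = pd.DataFrame({"clicks": [10, 50], "impressions": [100, 100]})
print(model.predict(None, batch).round(2).tolist())  # -> [0.3, 1.1]
```

The exam angle: because the feature is computed inside `predict`, training and serving share one code path, which is exactly the consistency problem custom PyFunc models solve.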


Advanced Feature Store Concepts

1.4.1 Ensure point-in-time correctness in feature lookups to prevent data leakage during model training and inference

Keywords: point-in-time, feature lookup, data leakage, temporal correctness, feature store lookup, time travel features

📚 Documentation:
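A minimal way to internalize point-in-time correctness is a backward as-of join: each training label may only see the latest feature value observed at or before the label's timestamp, never a later one. The Feature Engineering client does this via timestamp lookup keys; the sketch below shows the same semantics locally with `pandas.merge_asof`, using invented table and column names:

```python
# Hedged sketch of a point-in-time feature lookup. direction="backward"
# joins each label to the most recent feature row with ts <= label ts,
# preventing future feature values from leaking into training rows.
import pandas as pd

features = pd.DataFrame(
    {"user_id": ["u1", "u1", "u1"],
     "ts": pd.to_datetime(["2025-01-01", "2025-01-10", "2025-01-20"]),
     "avg_spend": [10.0, 50.0, 90.0]}
)
labels = pd.DataFrame(
    {"user_id": ["u1", "u1"],
     "ts": pd.to_datetime(["2025-01-05", "2025-01-15"]),
     "churned": [0, 1]}
)

training = pd.merge_asof(
    labels.sort_values("ts"),
    features.sort_values("ts"),
    on="ts", by="user_id", direction="backward",
)
print(training[["ts", "avg_spend", "churned"]])
```

Note the label dated 2025-01-05 picks up `avg_spend=10.0`, not the 50.0 recorded five days later; a naive latest-value join would silently leak that future value.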


1.4.2 Build automated pipelines for feature computation using the FeatureEngineering Client

Keywords: feature engineering client, feature computation, feature pipeline, automated features, databricks feature store

📚 Documentation:


1.4.3 Configure online tables for low-latency applications using Databricks SDK

Keywords: online tables, online store, low latency, feature serving, online feature, databricks sdk

📚 Documentation:


1.4.4 Design scalable solutions for ingesting and processing streaming data to generate features in real time

Keywords: streaming features, real-time features, streaming ingestion, feature generation, streaming pipeline

📚 Documentation:


1.4.5 Develop on-demand features using feature serving for consistent use across training and production environments

Keywords: on-demand features, feature serving, feature function, training serving consistency, feature consistency

📚 Documentation:


Section 2: MLOps

Model Lifecycle Management

2.1.1 Describe and implement the architecture components of model lifecycle pipelines used to manage environment transitions in the deploy code strategy

Keywords: model lifecycle, deploy code, environment transition, dev staging prod, ml pipeline architecture, mlops pipeline

📚 Documentation:


2.1.2 Map Databricks features to activities of the model lifecycle management process

Keywords: model registry, unity catalog models, model versioning, model alias, model lifecycle, registered model

📚 Documentation:


Validation Testing

2.2.1 Implement unit tests for individual functions in Databricks notebooks to ensure they produce expected outputs when given specific inputs

Keywords: unit test, pytest, notebook testing, function testing, test databricks, unit testing ml

🎯 Relevant Demos:

📚 Documentation:
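The practical habit this objective rewards: keep pure transformation logic in small named functions and assert on known inputs. With pytest on Databricks these tests would live in a `tests/` module run by `pytest`; the helper below is an invented example of the shape:

```python
# Hedged sketch of unit-testing a notebook function. The winsorize
# helper is hypothetical; the pytest-style tests use bare asserts,
# one behaviour per test function.

def winsorize(value: float, lower: float, upper: float) -> float:
    """Clamp a feature value into [lower, upper] - a typical pipeline helper."""
    return max(lower, min(upper, value))

def test_winsorize_within_bounds():
    assert winsorize(5.0, 0.0, 10.0) == 5.0

def test_winsorize_clamps_extremes():
    assert winsorize(-3.0, 0.0, 10.0) == 0.0
    assert winsorize(99.0, 0.0, 10.0) == 10.0

if __name__ == "__main__":
    test_winsorize_within_bounds()
    test_winsorize_clamps_extremes()
    print("all tests passed")
```

Factoring logic out of notebook cells like this is what makes the organization question in 2.2.4 answerable: functions that take and return plain values are testable anywhere, while cell-bound code is not.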


2.2.2 Identify types of testing performed (unit and integration) in various environment stages (dev, test, prod, etc.)

Keywords: integration test, test types, dev test prod, environment testing, ml testing strategy

📚 Documentation:


2.2.3 Design an integration test for machine learning systems that incorporates common pipelines: feature engineering, training, evaluation, deployment, and inference

Keywords: ml integration test, pipeline testing, end-to-end test, ml system test, inference test

📚 Documentation:


2.2.4 Compare the benefits and challenges of approaches for organizing functions and unit tests

Keywords: test organization, test structure, testing best practices, ml testing patterns

📚 Documentation:


Environment Architectures

2.3.1 Design and implement scalable Databricks environments for machine learning projects using best practices

Keywords: ml environment, workspace architecture, ml best practices, databricks environment, ml infrastructure

📚 Documentation:


2.3.2 Define and configure Databricks ML assets using DABs (Databricks Asset Bundles): model serving endpoints, MLflow experiments, ML registered models

Keywords: databricks asset bundles, dabs, asset bundle, ml assets, infrastructure as code, bundle deploy

🎯 Relevant Demos:

📚 Documentation:


Automated Retraining

2.4.1 Implement automated retraining workflows that can be triggered by data drift detection or performance degradation alerts

Keywords: automated retraining, retrain trigger, drift retraining, performance degradation, retraining workflow

📚 Documentation:


2.4.2 Develop a strategy for selecting top-performing models during automated retraining

Keywords: model selection, champion challenger, model comparison, best model, retraining strategy

📚 Documentation:


Drift Detection and Lakehouse Monitoring

2.5.1 Apply any statistical tests from the drift metrics table in Lakehouse Monitoring to detect drift in numerical and categorical data and evaluate the significance of observed changes

Keywords: drift detection, statistical test, ks test, chi-square, drift metrics, lakehouse monitoring drift

📚 Documentation:


2.5.2 Identify the data table type and Lakehouse Monitoring feature that will resolve a use case need and explain why

Keywords: lakehouse monitoring, monitoring feature, table type, snapshot table, time series table

🎯 Relevant Demos:

📚 Documentation:


2.5.3 Build a monitor for a snapshot, time series, or inference table using Lakehouse Monitoring

Keywords: create monitor, snapshot monitor, time series monitor, inference table monitor, lakehouse monitoring setup

📚 Documentation:


2.5.4 Identify the key components of common monitoring pipelines: logging, drift detection, model performance, model health, etc.

Keywords: monitoring pipeline, model logging, model health, performance monitoring, ml monitoring components

📚 Documentation:


2.5.5 Design and configure alerting mechanisms to notify stakeholders when drift metrics exceed predefined thresholds

Keywords: drift alerting, monitoring alerts, threshold alert, notification, drift threshold

📚 Documentation:


2.5.6 Detect data drift by comparing current data distributions to a known baseline or between successive time windows

Keywords: data drift, baseline comparison, distribution shift, time window drift, drift baseline

📚 Documentation:


2.5.7 Evaluate model performance trends over time using an inference table

Keywords: inference table, performance trend, model performance over time, inference logging, prediction logging

📚 Documentation:


2.5.8 Define custom metrics in Lakehouse Monitoring metrics tables

Keywords: custom metrics, monitoring metrics, metrics table, custom monitoring, define metrics

📚 Documentation:


2.5.9 Evaluate metrics based on different data granularities and feature slicing

Keywords: feature slicing, data granularity, segment analysis, slice metrics, cohort analysis

📚 Documentation:


2.5.10 Monitor endpoint health by tracking infrastructure metrics such as latency, request rate, error rate, CPU usage, and memory usage

Keywords: endpoint health, latency monitoring, request rate, error rate, infrastructure metrics, endpoint metrics

📚 Documentation:


Section 3: Model Deployment

Deployment Strategies

3.1.1 Compare deployment strategies (e.g. blue-green and canary) and evaluate their suitability for high-traffic applications

Keywords: blue-green deployment, canary deployment, deployment strategy, traffic routing, rollout strategy, a/b deployment

📚 Documentation:


3.1.2 Implement a model rollout strategy using Databricks Model Serving

Keywords: model rollout, model serving, traffic split, served entity, endpoint routing, gradual rollout

🎯 Relevant Demos:

📚 Documentation:
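A canary rollout on Model Serving amounts to serving two versions of the same registered model on one endpoint and splitting traffic between them. The sketch below shows the general shape of such a configuration as a Python dict; the field names mirror the serving endpoints API's `served_entities`/`traffic_config` structure, but the model name, versions, and workload sizes are hypothetical, and in practice this would be submitted via the REST API, the SDK, or a DAB resource definition:

```python
# Hedged sketch of a canary rollout config for a serving endpoint.
# Entity names and versions are invented for illustration.
canary_config = {
    "served_entities": [
        {"entity_name": "ml.prod.churn_model", "entity_version": "3",
         "workload_size": "Small", "scale_to_zero_enabled": False},
        {"entity_name": "ml.prod.churn_model", "entity_version": "4",
         "workload_size": "Small", "scale_to_zero_enabled": False},
    ],
    "traffic_config": {
        "routes": [
            # champion keeps most traffic; the canary gets a small slice
            {"served_model_name": "churn_model-3", "traffic_percentage": 90},
            {"served_model_name": "churn_model-4", "traffic_percentage": 10},
        ]
    },
}

routes = canary_config["traffic_config"]["routes"]
assert sum(r["traffic_percentage"] for r in routes) == 100  # must cover all traffic
print("canary slice:", routes[1]["traffic_percentage"], "%")
```

Promoting the canary is then just an update that shifts the percentages (and eventually removes the old served entity), which is why this pattern is preferred for high-traffic applications: rollback is a config change, not a redeploy.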


Custom Model Serving

3.2.1 Register a custom PyFunc model and log custom artifacts in Unity Catalog

Keywords: pyfunc model, custom pyfunc, unity catalog model, register model, custom artifacts, mlflow pyfunc

📚 Documentation:


3.2.2 Query custom models via REST API or MLflow Deployments SDK

Keywords: rest api, mlflow deployments, model query, serving endpoint api, predict api, model inference api

📚 Documentation:
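For REST queries, the part worth memorizing is the request body: Model Serving accepts a JSON payload in the `dataframe_split` format shown below. The endpoint URL, token, and feature columns are placeholders, and the actual HTTP call is left commented out so the payload construction stands on its own without a live workspace:

```python
# Hedged sketch of querying a serving endpoint over REST.
# Columns and values are invented; only the payload shape matters.
import json

payload = {
    "dataframe_split": {
        "columns": ["age", "tenure_months"],
        "data": [[34, 12], [51, 48]],
    }
}
body = json.dumps(payload)

# In a real workspace (hypothetical host and endpoint names):
# import requests
# resp = requests.post(
#     "https://<workspace-host>/serving-endpoints/churn-endpoint/invocations",
#     headers={"Authorization": f"Bearer {token}",
#              "Content-Type": "application/json"},
#     data=body,
# )
# predictions = resp.json()["predictions"]

print(body)
```

The MLflow Deployments SDK route wraps the same call, so recognizing this payload shape covers both halves of the objective.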


3.2.3 Deploy custom model objects using MLflow deployments SDK, REST API or user interface

Keywords: deploy model, mlflow deploy, serving endpoint, model deployment, endpoint deployment, databricks model serving

📚 Documentation:


Study Resources

Official Training

Certification Information

Key GitHub Repositories


Last Updated: December 2025
Exam Version: September 2025
