A curated collection of demos, official documentation, and training resources mapped to each exam objective for the Databricks Certified Machine Learning Professional certification (September 2025 version).
Table of Contents
- About the Author
- About the Exam
- What’s Changed from the April 2024 Syllabus
- Exam Breakdown & Study Strategy
- How to Use This Guide Effectively
- Key Demos to Work Through
- Section 1: Model Development (~47% of exam)
- Section 2: MLOps (~43% of exam)
- Section 3: Model Deployment (~10% of exam)
- Study Resources
About the Author
I’m a Databricks Solutions Architect Champion with extensive experience in machine learning engineering and MLOps. This guide is designed to help you navigate the ML Professional certification, which is one of the more challenging Databricks certifications.
The ML Professional exam tests your ability to build production-grade ML systems at enterprise scale. This isn’t about knowing ML algorithms – it’s about knowing how to operationalise them using Databricks tools like SparkML, MLflow, Feature Store, Lakehouse Monitoring, and Model Serving.
I created this guide by analysing the exam objectives and mapping them to the best available resources. My advice: don’t just read – practice! Get hands-on with MLflow experiments, build feature pipelines, deploy models, and set up monitoring. The exam questions are scenario-based, so you need practical experience.
Use this guide to get a big-picture view of the space, find relevant demos, and follow the links mapped to each exam objective to fill any gaps in your knowledge. I have previously written about my approach to taking certifications here. Good luck on your certification journey!
About the Exam
- Exam Name: Databricks Certified Machine Learning Professional
- Version: September 2025
- Questions: 59 scored multiple-choice
- Time Limit: 120 minutes
- Registration Fee: USD $200
- Validity: 2 years
- Prerequisite: None required; 1+ year hands-on experience highly recommended
Recommended Preparation
- Instructor-led: Machine Learning at Scale and Advanced Machine Learning Operations
- Self-paced: Available in Databricks Academy
- Working knowledge of Python, scikit-learn, SparkML, and MLflow
- Working knowledge of Lakehouse Monitoring and Databricks Model Serving
What’s Changed from the April 2024 Syllabus
The September 2025 exam brings significant updates reflecting Databricks’ evolving ML platform:
Structural Changes:
- Consolidated from 4 sections to 3 (Experimentation & Feature Engineering merged with Data Management under Section 1: Model Development)
New Topics Added:
- Distributed Hyperparameter Tuning: Ray and Optuna now explicitly covered for scaling hyperparameter searches
- Databricks Asset Bundles (DABs): New emphasis on infrastructure-as-code for ML asset management
- ML Pipeline Testing: Dedicated coverage of unit tests and integration tests for ML pipelines
- Deployment Strategies: Blue-green and canary deployments now explicitly tested
Updated Focus Areas:
- Lakehouse Monitoring: Replaces generic drift monitoring with platform-specific capabilities
- Unity Catalog Integration: Model aliases and lineage replace legacy Model Registry Webhooks
- Feature Serving: Expanded coverage including on-demand feature computation and real-time serving
Study Tip: If you studied for the April 2024 exam, focus extra time on DABs, Ray/Optuna distributed tuning, and Lakehouse Monitoring – these represent the most significant additions.
Exam Breakdown & Study Strategy
Exam Weight by Topic Area (Estimated)
Based on the number of objectives per section:
| Section | Topics | Objectives | Est. Weight |
|---|---|---|---|
| Section 1: Model Development | SparkML, Scaling, MLflow, Feature Store | 22 | ~47% |
| Section 2: MLOps | Lifecycle, Testing, Environments, Monitoring | 20 | ~43% |
| Section 3: Model Deployment | Strategies, Custom Serving | 5 | ~10% |
Subsection Breakdown
| Subsection | Objectives | Focus Areas |
|---|---|---|
| Using SparkML | 7 | Pipelines, estimators, transformers, evaluation |
| Scaling and Tuning | 7 | Distributed training, Optuna, Ray, parallelization |
| Advanced MLflow | 3 | Nested runs, custom logging, PyFunc models |
| Feature Store | 5 | Point-in-time, online tables, streaming features |
| Drift & Monitoring | 10 | Lakehouse Monitoring, drift detection, alerting |
| Validation Testing | 4 | Unit tests, integration tests, ML pipelines |
| Environment & Lifecycle | 4 | DABs, model registry, environment transitions |
| Automated Retraining | 2 | Drift-triggered retraining, champion/challenger selection |
| Deployment | 5 | Blue-green, canary, Model Serving, REST API |
How to Use This Guide Effectively
📖 Official Documentation (docs.databricks.com)
The docs are your reference for exact syntax and configuration options. For this exam:
- Focus on MLflow, Feature Store, and Lakehouse Monitoring docs
- Pay attention to code examples – the exam has code-based questions
- Understand the difference between MLflow tracking vs. registry vs. deployments (see the sketch after this list)
Best for: API syntax, configuration options, code patterns
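To make that three-way distinction concrete, here is a minimal sketch, assuming a Databricks workspace with Unity Catalog; the model name `ml.demo_schema.churn_model` and endpoint `churn-endpoint` are illustrative placeholders:

```python
import mlflow
import numpy as np
from sklearn.linear_model import LogisticRegression
from mlflow.deployments import get_deploy_client

mlflow.set_registry_uri("databricks-uc")  # register models in Unity Catalog

# 1. Tracking: record params, metrics, and artifacts for a run
with mlflow.start_run() as run:
    mlflow.log_param("C", 1.0)
    model = LogisticRegression().fit(np.array([[0.0], [1.0]]), [0, 1])
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metric("auc", 0.91)

# 2. Registry: version the logged model under a Unity Catalog name
mlflow.register_model(f"runs:/{run.info.run_id}/model", "ml.demo_schema.churn_model")

# 3. Deployments: query a live Model Serving endpoint
client = get_deploy_client("databricks")
preds = client.predict(
    endpoint="churn-endpoint",
    inputs={"dataframe_split": {"columns": ["x"], "data": [[0.5]]}},
)
```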
🎯 Interactive Demos (databricks.com/resources/demos)
Demos help you see features in action. For ML Pro:
- Watch Model Serving demos to understand endpoint configuration
- Follow Lakehouse Monitoring tutorials for drift detection setup
- DABs demos show infrastructure-as-code patterns
How I use demos:
- Read the objective first
- Watch the demo focusing on that specific feature
- Recreate it in your own workspace
Best for: Understanding workflows, UI navigation, real configurations
🎓 Training Courses (Databricks Academy)
The official courses are highly recommended:
- Machine Learning at Scale: SparkML, distributed training, pandas UDFs
- Advanced Machine Learning Operations: MLflow, Feature Store, Model Serving, Monitoring
Best for: Structured learning, hands-on labs, comprehensive coverage
📚 Background Reading
Before diving into specifics, these resources provide essential big-picture context:
The Big Book of MLOps – Get the PDF
Start here! This gives you the conceptual foundation for MLOps workflows. Understanding the big picture first makes mapping concepts to specifics much easier as you study.
MLflow Documentation – Essential reading for ~30% of the exam:
- Experiment Tracking – Core concept, understand runs, metrics, artifacts
- Model Flavors & PyFunc – Know when to use pyfunc for custom models
- Model Registry – Lifecycle management concepts
MLOps Automation – Understanding model lifecycle automation:
- Model Registry Webhooks – While the September 2025 exam focuses on Unity Catalog model aliases, understanding webhook concepts helps with MLOps automation patterns
Understanding Drift – Critical for Lakehouse Monitoring section:
- Introduction to ML Drift – Types of drift explained
- Know the four drift types: Feature drift, Label drift, Prediction drift, Concept drift
- Know which statistical tests to use: Kolmogorov-Smirnov (numerical), Chi-squared (categorical), Jensen-Shannon divergence (distributions) – see the scipy sketch after this list
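A quick sketch of those three tests with scipy and numpy, on synthetic data (purely illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
baseline = rng.normal(0, 1, 1_000)
current = rng.normal(0.3, 1, 1_000)           # shifted numerical feature

# Kolmogorov-Smirnov: numerical drift
stat, p = ks_2samp(baseline, current)
print(f"KS p-value: {p:.4f}")                  # small p => distributions differ

# Chi-squared: categorical drift (counts per category, baseline vs current)
counts = np.array([[500, 300, 200], [420, 360, 220]])
chi2, p, _, _ = chi2_contingency(counts)
print(f"Chi-squared p-value: {p:.4f}")

# Jensen-Shannon distance between two histograms on shared bin edges
edges = np.histogram_bin_edges(np.concatenate([baseline, current]), bins=20)
p_hist, _ = np.histogram(baseline, bins=edges)
q_hist, _ = np.histogram(current, bins=edges)
print(f"JS distance: {jensenshannon(p_hist, q_hist):.4f}")
```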
Feature Engineering:
- Real-time Feature Computation Best Practices – Batch vs streaming trade-offs
My Recommended Study Path
Phase 1: Foundation (Model Development)
- Start with SparkML basics – pipelines, estimators, transformers
- Learn distributed training and tuning with pandas UDFs and Optuna
- Master MLflow – experiments, nested runs, custom logging
- Understand Feature Store – point-in-time correctness is critical
Phase 2: Operations (MLOps)
- Study Lakehouse Monitoring deeply – 10 objectives here!
- Learn DABs for ML asset management
- Understand testing strategies for ML systems
- Practice automated retraining workflows
Phase 3: Deployment
- Learn deployment strategies (blue-green, canary)
- Practice Model Serving – endpoints, traffic routing
- Understand PyFunc models and custom serving
Key Topics That Often Appear
Based on the objective distribution, focus extra attention on:
- Lakehouse Monitoring (10 objectives) – drift detection, inference tables, alerting
- SparkML (7 objectives) – when to use it, pipeline construction, evaluation
- Distributed Training (7 objectives) – Optuna, Ray, pandas UDFs, parallelization
- Feature Store (5 objectives) – point-in-time, online tables, feature serving
Practice Environment
Get hands-on experience with:
- Databricks Free Edition – Free, no credit card required
- MLOps Stacks – Production ML templates
Key Demos to Work Through
These demos provide comprehensive coverage of multiple exam objectives. Work through these first to get the big picture:
Must-Do: End-to-End MLOps
- MLOps End-to-End Pipeline (Hands-on Tutorial) – Work through the full ML lifecycle from feature engineering to deployment. Essential hands-on practice for understanding how all the pieces fit together.
Model Development & Training
- Data Science and Machine Learning on Databricks – Platform overview for ML
- Databricks Machine Learning Workspace – ML workspace features
- How to Use AutoML to Develop ML Models – AutoML capabilities
- Databricks AutoML – Interactive AutoML tour
Feature Store & Feature Engineering
- Feature Store and Online Inference – Feature pipelines and online serving
Model Serving & Deployment
- Model Serving on the Lakehouse – Deployment fundamentals
- Model Serving Databricks – Interactive serving tour
Monitoring & Drift Detection
- Lakehouse Monitoring and Vector Search – Monitoring overview
- Lakehouse Monitoring Databricks – Interactive monitoring tour
- Monitor Your Data Quality with Lakehouse Monitoring – Hands-on monitoring tutorial
Section 1: Model Development
Using Spark ML
1.1.1 Identify when SparkML is recommended based on the data, model, and use case requirements
Keywords: sparkml, spark ml, mllib, distributed ml, large dataset, big data, when to use spark
📖 Documentation:
1.1.2 Construct an ML pipeline using SparkML
Keywords: sparkml pipeline, ml pipeline, spark pipeline, pyspark ml, pipeline stages
📖 Documentation:
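As a quick reference for this objective (and the estimator/transformer distinction in 1.1.3), here is a minimal pipeline sketch; the column names and the `train_df`/`test_df` DataFrames are illustrative placeholders:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression

indexer = StringIndexer(inputCol="plan_type", outputCol="plan_idx")   # estimator -> transformer
assembler = VectorAssembler(inputCols=["plan_idx", "tenure", "spend"],
                            outputCol="features")                     # pure transformer
lr = LogisticRegression(featuresCol="features", labelCol="churned")   # estimator

pipeline = Pipeline(stages=[indexer, assembler, lr])
model = pipeline.fit(train_df)      # fit() runs each stage in order, returns a PipelineModel
scored = model.transform(test_df)   # transform() applies the whole fitted pipeline
```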
1.1.3 Apply the appropriate estimator and/or transformer given a use case
Keywords: estimator, transformer, stringindexer, onehotencoder, vectorassembler, feature transformer, spark estimator
📖 Documentation:
1.1.4 Tune a SparkML model using MLlib
Keywords: hyperparameter tuning, crossvalidator, paramgrid, trainvalidationsplit, spark tuning, mllib tuning
📖 Documentation:
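A hedged sketch of MLlib tuning, reusing the `pipeline`, `lr`, and `train_df` placeholders from the 1.1.2 sketch above (the evaluator here also covers objective 1.1.5):

```python
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

evaluator = BinaryClassificationEvaluator(labelCol="churned",
                                          metricName="areaUnderROC")

cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=evaluator, numFolds=3,
                    parallelism=4)          # evaluate up to 4 param sets at once
cv_model = cv.fit(train_df)                 # best model, refit on the full data
print(cv_model.avgMetrics)                  # mean metric per param combination
```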
1.1.5 Evaluate a SparkML model
Keywords: model evaluation, evaluator, binaryclassificationevaluator, multiclassclassificationevaluator, regressionevaluator, metrics
📖 Documentation:
1.1.6 Score a Spark ML model for a batch or streaming use case
Keywords: batch scoring, batch inference, streaming inference, model scoring, spark transform, predict
🎯 Relevant Demos:
- IoT and Predictive Maintenance (Tutorial)
📖 Documentation:
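A minimal sketch of both scoring modes with the fitted `model` from above; the table names are illustrative and assume Unity Catalog:

```python
# Batch: score a static table and append the results
batch_preds = model.transform(spark.read.table("ml.demo.features"))
batch_preds.write.mode("append").saveAsTable("ml.demo.predictions")

# Streaming: the same transform() call works on a streaming DataFrame
stream_df = spark.readStream.table("ml.demo.features_stream")
(model.transform(stream_df)
      .writeStream
      .option("checkpointLocation", "/tmp/ckpt/preds")   # required for recovery
      .toTable("ml.demo.predictions_stream"))
```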
1.1.7 Select a SparkML model or single-node model for inference based on the inference type: batch, real-time, or streaming
Keywords: inference type, batch vs realtime, model selection, single node, distributed inference, streaming ml
📖 Documentation:
Scaling and Tuning
1.2.1 Scale distributed training pipelines using SparkML and pandas Function APIs/UDFs
Keywords: distributed training, pandas udf, pandas function api, applyinpandas, mapinpandas, scale training
📖 Documentation:
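One pattern worth knowing cold: broadcast a single-node model and apply it in parallel with a scalar pandas UDF. A hedged sketch, where `sk_model` is a pre-trained scikit-learn classifier and `df` a Spark DataFrame already in scope:

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf

model_bc = spark.sparkContext.broadcast(sk_model)   # ship the model to executors once

@pandas_udf("double")
def churn_prob(tenure: pd.Series, spend: pd.Series) -> pd.Series:
    X = pd.DataFrame({"tenure": tenure, "spend": spend})
    return pd.Series(model_bc.value.predict_proba(X)[:, 1])

scored = df.withColumn("churn_prob", churn_prob("tenure", "spend"))
```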
1.2.2 Perform distributed hyperparameter tuning using Optuna and integrate it with MLflow
Keywords: optuna, hyperparameter optimization, distributed tuning, mlflow optuna, hyperopt, mlflowsparkstudy
📖 Documentation:
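A sketch of the integration pattern, assuming `X`/`y` training arrays are in scope; each Optuna trial is logged as a nested MLflow run (distributed execution, e.g. via joblib-spark, layers on top of the same objective function):

```python
import mlflow
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
    }
    with mlflow.start_run(nested=True):          # one child run per trial
        mlflow.log_params(params)
        score = cross_val_score(RandomForestClassifier(**params), X, y, cv=3).mean()
        mlflow.log_metric("cv_accuracy", score)
    return score

with mlflow.start_run(run_name="optuna_search"):  # parent run for the study
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    mlflow.log_params(study.best_params)
```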
1.2.3 Perform distributed hyperparameter tuning using Ray
Keywords: ray, ray tune, distributed hyperparameter, ray on spark, ray cluster
📖 Documentation:
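A hedged sketch of Ray Tune on a Databricks cluster; `setup_ray_cluster` ships with recent Databricks ML runtimes, and its argument names vary across Ray versions (older releases use `num_worker_nodes`), so treat this as a shape, not gospel:

```python
import ray
from ray import tune
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

setup_ray_cluster(max_worker_nodes=2)   # start Ray workers on the Spark cluster
ray.init()

def train_fn(config):
    # stand-in for real training; return the metric Ray Tune optimizes
    score = 1.0 - (config["lr"] - 0.1) ** 2
    return {"score": score}

tuner = tune.Tuner(
    train_fn,
    param_space={"lr": tune.uniform(0.001, 0.3)},
    tune_config=tune.TuneConfig(metric="score", mode="max", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)
shutdown_ray_cluster()
```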
1.2.4 Evaluate the trade-offs between vertical and horizontal scaling for machine learning workloads in Databricks environments
Keywords: vertical scaling, horizontal scaling, scale up, scale out, cluster sizing, ml workload scaling
📖 Documentation:
1.2.5 Evaluate and select appropriate parallelization (model parallelism, data parallelism) strategies for large-scale ML training
Keywords: model parallelism, data parallelism, distributed deep learning, parallel training, large scale training
📖 Documentation:
1.2.6 Compare Ray and Spark for distributing ML training workloads
Keywords: ray vs spark, distributed ml framework, training framework comparison, ray spark
📖 Documentation:
1.2.7 Use the Pandas Function API to parallelize group-specific model training and perform inference
Keywords: pandas function api, group-specific training, applyinpandas, grouped map, parallel inference
📖 Documentation:
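The canonical pattern here is one model per group via `applyInPandas`. A minimal sketch, with `sales_df` and its columns as illustrative placeholders:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def train_per_store(pdf: pd.DataFrame) -> pd.DataFrame:
    X, y = pdf[["price", "promo"]], pdf["units"]
    model = LinearRegression().fit(X, y)        # one model per store, in parallel
    return pd.DataFrame({"store_id": [pdf["store_id"].iloc[0]],
                         "r2": [model.score(X, y)]})

results = (sales_df
           .groupBy("store_id")
           .applyInPandas(train_per_store, schema="store_id long, r2 double"))
```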
Advanced MLflow Usage
1.3.1 Utilize nested runs using MLflow for tracking complex experiments
Keywords: nested runs, mlflow nested, parent run, child run, experiment tracking, mlflow runs
📖 Documentation:
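The core API is just `nested=True`; a minimal sketch with placeholder metric values:

```python
import mlflow

with mlflow.start_run(run_name="cv_experiment"):        # parent run
    for fold in range(3):
        with mlflow.start_run(run_name=f"fold_{fold}", nested=True):  # child runs
            mlflow.log_metric("fold_auc", 0.90 + 0.01 * fold)
    mlflow.log_metric("mean_auc", 0.91)                 # aggregate on the parent
```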
1.3.2 Log custom metrics, parameters, and artifacts programmatically in MLflow to track advanced experimentation workflows
Keywords: mlflow log, custom metrics, log_metric, log_param, log_artifact, mlflow tracking
📖 Documentation:
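A compact sketch covering the three logging surfaces (params, stepped metrics, artifacts) plus tags:

```python
import json
import mlflow

with mlflow.start_run():
    mlflow.log_param("feature_set", "v3")
    for step, loss in enumerate([0.9, 0.7, 0.6]):
        mlflow.log_metric("loss", loss, step=step)     # time-series metric
    with open("confusion_matrix.json", "w") as f:
        json.dump({"tp": 90, "fp": 10}, f)
    mlflow.log_artifact("confusion_matrix.json")        # any local file
    mlflow.set_tag("team", "risk-ml")
```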
1.3.3 Create custom model objects using real-time feature engineering
Keywords: custom model, pyfunc, mlflow pyfunc, custom pyfunc, real-time features, model wrapper
📖 Documentation:
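A hedged sketch of a custom PyFunc wrapper that derives a feature at inference time before delegating to an underlying model; the artifact path `sk_model.joblib` is an illustrative local file:

```python
import mlflow
import pandas as pd

class RatioModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import joblib
        self.model = joblib.load(context.artifacts["sk_model"])

    def predict(self, context, model_input: pd.DataFrame):
        df = model_input.copy()
        # real-time feature engineering: computed per request, not looked up
        df["spend_per_month"] = df["spend"] / df["tenure"].clip(lower=1)
        return self.model.predict(df[["spend_per_month"]])

with mlflow.start_run():
    mlflow.pyfunc.log_model("model", python_model=RatioModel(),
                            artifacts={"sk_model": "sk_model.joblib"})
```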
Advanced Feature Store Concepts
1.4.1 Ensure point-in-time correctness in feature lookups to prevent data leakage during model training and inference
Keywords: point-in-time, feature lookup, data leakage, temporal correctness, feature store lookup, time travel features
📖 Documentation:
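The key parameter is `timestamp_lookup_key`: each training row joins only feature values as of its own timestamp. A hedged sketch with illustrative table and column names:

```python
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

fe = FeatureEngineeringClient()

training_set = fe.create_training_set(
    df=labels_df,                       # columns: customer_id, ts, label
    feature_lookups=[
        FeatureLookup(
            table_name="ml.features.customer_features",
            lookup_key="customer_id",
            timestamp_lookup_key="ts",  # point-in-time join; prevents leakage
        )
    ],
    label="label",
)
train_df = training_set.load_df()
```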
1.4.2 Build automated pipelines for feature computation using the FeatureEngineering Client
Keywords: feature engineering client, feature computation, feature pipeline, automated features, databricks feature store
📖 Documentation:
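A sketch of the publish side with the same client; `raw_df` and the table name are placeholders:

```python
from pyspark.sql import functions as F
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
features_df = (raw_df.groupBy("customer_id", "ts")
                     .agg(F.sum("amount").alias("total_spend")))

fe.create_table(
    name="ml.features.customer_features",
    primary_keys=["customer_id"],
    timestamp_keys=["ts"],               # enables point-in-time lookups
    df=features_df,                      # defines the schema and writes once
    description="Customer spend aggregates",
)
# On later pipeline runs, upsert fresh values:
fe.write_table(name="ml.features.customer_features",
               df=features_df, mode="merge")
```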
1.4.3 Configure online tables for low-latency applications using Databricks SDK
Keywords: online tables, online store, low latency, feature serving, online feature, databricks sdk
📖 Documentation:
1.4.4 Design scalable solutions for ingesting and processing streaming data to generate features in real time
Keywords: streaming features, real-time features, streaming ingestion, feature generation, streaming pipeline
📖 Documentation:
1.4.5 Develop on-demand features using feature serving for consistent use across training and production environments
Keywords: on-demand features, feature serving, feature function, training serving consistency, feature consistency
📖 Documentation:
Section 2: MLOps
Model Lifecycle Management
2.1.1 Describe and implement the architecture components of model lifecycle pipelines used to manage environment transitions in the deploy code strategy
Keywords: model lifecycle, deploy code, environment transition, dev staging prod, ml pipeline architecture, mlops pipeline
📖 Documentation:
2.1.2 Map Databricks features to activities of the model lifecycle management process
Keywords: model registry, unity catalog models, model versioning, model alias, model lifecycle, registered model
📖 Documentation:
Validation Testing
2.2.1 Implement unit tests for individual functions in Databricks notebooks to ensure they produce expected outputs when given specific inputs
Keywords: unit test, pytest, notebook testing, function testing, test databricks, unit testing ml
🎯 Relevant Demos:
📖 Documentation:
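The pattern the exam cares about: factor logic out of notebooks into pure functions, then assert on specific inputs and outputs. A minimal pytest sketch (function and column names are illustrative):

```python
import pandas as pd

def add_spend_ratio(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["spend_ratio"] = out["spend"] / out["tenure"].clip(lower=1)
    return out

def test_add_spend_ratio_handles_zero_tenure():
    df = pd.DataFrame({"spend": [100.0], "tenure": [0]})
    result = add_spend_ratio(df)
    assert result.loc[0, "spend_ratio"] == 100.0   # clipped divisor avoids inf
```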
2.2.2 Identify types of testing performed (unit and integration) in various environment stages (dev, test, prod, etc.)
Keywords: integration test, test types, dev test prod, environment testing, ml testing strategy
📖 Documentation:
2.2.3 Design an integration test for machine learning systems that incorporates common pipelines: feature engineering, training, evaluation, deployment, and inference
Keywords: ml integration test, pipeline testing, end-to-end test, ml system test, inference test
📖 Documentation:
2.2.4 Compare the benefits and challenges of approaches for organizing functions and unit tests
Keywords: test organization, test structure, testing best practices, ml testing patterns
📖 Documentation:
Environment Architectures
2.3.1 Design and implement scalable Databricks environments for machine learning projects using best practices
Keywords: ml environment, workspace architecture, ml best practices, databricks environment, ml infrastructure
📖 Documentation:
2.3.2 Define and configure Databricks ML assets using DABs (Databricks Asset Bundles): model serving endpoints, MLflow experiments, ML registered models
Keywords: databricks asset bundles, dabs, asset bundle, ml assets, infrastructure as code, bundle deploy
🎯 Relevant Demos:
- Databricks Asset Bundles (Tours)
📖 Documentation:
Automated Retraining
2.4.1 Implement automated retraining workflows that can be triggered by data drift detection or performance degradation alerts
Keywords: automated retraining, retrain trigger, drift retraining, performance degradation, retraining workflow
📖 Documentation:
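One hedged sketch of the trigger half, assuming a Lakehouse Monitoring drift metrics table (table and column names are illustrative; verify against the monitor output in your workspace) and a pre-defined retraining job:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
drift = spark.read.table("ml.monitoring.churn_inference_drift_metrics")
latest = (drift.filter("column_name = 'prediction'")
               .orderBy("window.start", ascending=False)
               .first())

if latest and latest["js_distance"] and latest["js_distance"] > 0.2:
    w.jobs.run_now(job_id=123456)   # illustrative job ID for the retraining workflow
```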
2.4.2 Develop a strategy for selecting top-performing models during automated retraining
Keywords: model selection, champion challenger, model comparison, best model, retraining strategy
📖 Documentation:
Drift Detection and Lakehouse Monitoring
2.5.1 Apply any statistical tests from the drift metrics table in Lakehouse Monitoring to detect drift in numerical and categorical data and evaluate the significance of observed changes
Keywords: drift detection, statistical test, ks test, chi-square, drift metrics, lakehouse monitoring drift
📖 Documentation:
2.5.2 Identify the data table type and Lakehouse Monitoring feature that will resolve a use case need and explain why
Keywords: lakehouse monitoring, monitoring feature, table type, snapshot table, time series table
🎯 Relevant Demos:
- Lakehouse Monitoring and Vector Search (Video Tour)
- Lakehouse Monitoring Databricks (Tours)
- Monitor Your Data Quality with Lakehouse Monitoring (Tutorial)
📖 Documentation:
2.5.3 Build a monitor for a snapshot, time series, or inference table using Lakehouse Monitoring
Keywords: create monitor, snapshot monitor, time series monitor, inference table monitor, lakehouse monitoring setup
📖 Documentation:
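A hedged sketch of creating an inference-table monitor with the Databricks SDK; names are illustrative, and snapshot or time-series monitors swap `inference_log` for `snapshot=MonitorSnapshot()` or `time_series=MonitorTimeSeries(...)`:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
    MonitorInferenceLog, MonitorInferenceLogProblemType,
)

w = WorkspaceClient()
w.quality_monitors.create(
    table_name="ml.prod.churn_inference",
    assets_dir="/Workspace/Shared/monitors/churn",
    output_schema_name="ml.monitoring",
    inference_log=MonitorInferenceLog(
        granularities=["1 day"],
        timestamp_col="ts",
        model_id_col="model_version",
        prediction_col="prediction",
        label_col="label",               # optional; enables performance metrics
        problem_type=MonitorInferenceLogProblemType.PROBLEM_TYPE_CLASSIFICATION,
    ),
)
```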
2.5.4 Identify the key components of common monitoring pipelines: logging, drift detection, model performance, model health, etc.
Keywords: monitoring pipeline, model logging, model health, performance monitoring, ml monitoring components
📖 Documentation:
2.5.5 Design and configure alerting mechanisms to notify stakeholders when drift metrics exceed predefined thresholds
Keywords: drift alerting, monitoring alerts, threshold alert, notification, drift threshold
📖 Documentation:
2.5.6 Detect data drift by comparing current data distributions to a known baseline or between successive time windows
Keywords: data drift, baseline comparison, distribution shift, time window drift, drift baseline
📖 Documentation:
2.5.7 Evaluate model performance trends over time using an inference table
Keywords: inference table, performance trend, model performance over time, inference logging, prediction logging
📖 Documentation:
2.5.8 Define custom metrics in Lakehouse Monitoring metrics tables
Keywords: custom metrics, monitoring metrics, metrics table, custom monitoring, define metrics
📖 Documentation:
2.5.9 Evaluate metrics based on different data granularities and feature slicing
Keywords: feature slicing, data granularity, segment analysis, slice metrics, cohort analysis
📖 Documentation:
2.5.10 Monitor endpoint health by tracking infrastructure metrics such as latency, request rate, error rate, CPU usage, and memory usage
Keywords: endpoint health, latency monitoring, request rate, error rate, infrastructure metrics, endpoint metrics
📖 Documentation:
Section 3: Model Deployment
Deployment Strategies
3.1.1 Compare deployment strategies (e.g. blue-green and canary) and evaluate their suitability for high-traffic applications
Keywords: blue-green deployment, canary deployment, deployment strategy, traffic routing, rollout strategy, a/b deployment
📖 Documentation:
3.1.2 Implement a model rollout strategy using Databricks Model Serving
Keywords: model rollout, model serving, traffic split, served entity, endpoint routing, gradual rollout
🎯 Relevant Demos:
- Model Serving on the Lakehouse (Video Tour)
- Model Serving Databricks (Tours)
📖 Documentation:
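A hedged sketch of a canary-style rollout: two versions of the same Unity Catalog model behind one endpoint with a 90/10 traffic split. The model and endpoint names are illustrative:

```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
client.create_endpoint(
    name="churn-endpoint",
    config={
        "served_entities": [
            {"name": "champion", "entity_name": "ml.demo_schema.churn_model",
             "entity_version": "3", "workload_size": "Small",
             "scale_to_zero_enabled": True},
            {"name": "challenger", "entity_name": "ml.demo_schema.churn_model",
             "entity_version": "4", "workload_size": "Small",
             "scale_to_zero_enabled": True},
        ],
        "traffic_config": {"routes": [
            {"served_model_name": "champion", "traffic_percentage": 90},
            {"served_model_name": "challenger", "traffic_percentage": 10},
        ]},
    },
)
```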
Custom Model Serving
3.2.1 Register a custom PyFunc model and log custom artifacts in Unity Catalog
Keywords: pyfunc model, custom pyfunc, unity catalog model, register model, custom artifacts, mlflow pyfunc
📖 Documentation:
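Building on the `RatioModel` wrapper sketched under 1.3.3, registering in Unity Catalog is one extra argument plus the UC registry URI (names remain illustrative):

```python
import mlflow

mlflow.set_registry_uri("databricks-uc")
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        "model",
        python_model=RatioModel(),                     # wrapper from 1.3.3
        artifacts={"sk_model": "sk_model.joblib"},     # custom artifact, logged alongside
        registered_model_name="ml.demo_schema.ratio_model",
    )
```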
3.2.2 Query custom models via REST API or MLflow Deployments SDK
Keywords: rest api, mlflow deployments, model query, serving endpoint api, predict api, model inference api
📖 Documentation:
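Both query paths accept the same payload shapes; a hedged sketch, with `host`/`token` and the endpoint name as placeholders:

```python
import requests
from mlflow.deployments import get_deploy_client

payload = {"dataframe_split": {"columns": ["tenure", "spend"],
                               "data": [[12, 480.0]]}}

# 1. MLflow Deployments SDK
client = get_deploy_client("databricks")
print(client.predict(endpoint="churn-endpoint", inputs=payload))

# 2. Raw REST call against the serving endpoint
resp = requests.post(
    f"{host}/serving-endpoints/churn-endpoint/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(resp.json())
```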
3.2.3 Deploy custom model objects using MLflow deployments SDK, REST API or user interface
Keywords: deploy model, mlflow deploy, serving endpoint, model deployment, endpoint deployment, databricks model serving
📖 Documentation:
Study Resources
Official Training
- Machine Learning at Scale (Databricks Academy)
- Advanced Machine Learning Operations (Databricks Academy)
Certification Information
- ML Professional Exam Page
- Databricks Free Edition – Practice for free
Key GitHub Repositories
- mlops-stacks – Production ML best practices
- databricks-ml-examples – ML code examples
Last Updated: December 2025
Exam Version: September 2025