Part 3 of the series of posts on written resources to support preparing for the Databricks Data Engineer Professional Exam – Data Modeling.
Section 3 is all about data modeling decisions in the Lakehouse – streaming to bronze, quality enforcement, and slowly changing dimensions (SCD). It accounts for 20% of the total marks, and there is some overlap with topics from the previous sections.
DE Pro Section 3: Data Modeling
- Describe the objective of data transformations during promotion from bronze to silver
- Discuss how Change Data Feed (CDF) addresses past difficulties propagating updates and deletes within Lakehouse architecture
How to Simplify CDC With Delta Lake’s Change Data Feed | Databricks Blog
- Apply Delta Lake shallow and deep clone and describe how each interacts with its source and target tables.
- Design a multiplex bronze table to avoid common pitfalls when productionizing streaming workloads.
- Implement best practices when streaming data from multiplex bronze tables.
- Apply incremental processing, quality enforcement, and deduplication to process data from bronze to silver
- Make informed decisions about how to enforce data quality based on strengths and limitations of various approaches in Delta Lake
- Implement tables while avoiding issues caused by the lack of foreign key constraints
- Add constraints to Delta Lake tables to prevent bad data from being written
https://www.databricks.com/discover/pages/data-quality-management#constraints-and-validate
- Implement lookup tables and describe the trade-offs for normalized data models
- Diagram architectures and operations necessary to implement various Slowly Changing Dimension tables using Delta Lake with streaming
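To make the Change Data Feed objective above concrete, here is a minimal pure-Python sketch of how CDF's `_change_type` column (`insert`, `delete`, `update_preimage`, `update_postimage`) drives downstream propagation of updates and deletes. The row layout and the `apply_cdf` helper are simplifications of my own; in Delta Lake you would read the feed with `spark.read.option("readChangeFeed", "true")` and apply it with `MERGE INTO`.

```python
# Sketch of applying Change Data Feed rows to a downstream table.
# Each change row carries a _change_type column, mirroring Delta Lake's
# CDF output: insert, delete, update_preimage, update_postimage.

def apply_cdf(target: dict, changes: list) -> dict:
    """Apply CDF rows to a target keyed by 'id' (simplified MERGE)."""
    for row in changes:
        change_type = row["_change_type"]
        if change_type in ("insert", "update_postimage"):
            # upsert the new image of the row, dropping metadata columns
            target[row["id"]] = {k: v for k, v in row.items()
                                 if not k.startswith("_")}
        elif change_type == "delete":
            target.pop(row["id"], None)
        # update_preimage rows are informational; the merge keys on the
        # postimage, so they fall through untouched
    return target

silver = {1: {"id": 1, "status": "new"}}
changes = [
    {"id": 1, "status": "new", "_change_type": "update_preimage"},
    {"id": 1, "status": "shipped", "_change_type": "update_postimage"},
    {"id": 2, "status": "new", "_change_type": "insert"},
]
silver = apply_cdf(silver, changes)
```

The key point for the exam: before CDF, propagating updates and deletes downstream required full-table diffs; the feed exposes them row by row.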
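The bronze-to-silver deduplication objective can also be sketched in plain Python. The `dedupe_batch` helper below is illustrative only; in a real pipeline this is `dropDuplicates` on the micro-batch combined with an insert-only `MERGE` against the silver table so that records replayed into bronze are not written twice.

```python
# Sketch of streaming deduplication from bronze to silver: keep only
# records whose (key, event_time) fingerprint has not been seen before,
# mirroring dropDuplicates(["key", "event_time"]) + insert-only merge.

def dedupe_batch(seen: set, microbatch: list) -> list:
    """Return only the records not already present in silver."""
    new_rows = []
    for row in microbatch:
        fingerprint = (row["key"], row["event_time"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            new_rows.append(row)
    return new_rows

seen = {("a", 1)}  # fingerprints already written to silver
batch = [
    {"key": "a", "event_time": 1, "value": 10},  # duplicate, dropped
    {"key": "a", "event_time": 2, "value": 11},  # new version, kept
    {"key": "b", "event_time": 1, "value": 20},  # new key, kept
]
kept = dedupe_batch(seen, batch)
```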
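For the constraints objective, the behavior worth remembering is that a Delta Lake CHECK constraint fails the entire write if any row violates the predicate – there is no partial write. A hedged pure-Python sketch of that transactional behavior (the `checked_append` helper is mine, not a Delta API):

```python
# Sketch of Delta Lake CHECK-constraint behavior: if any row in a write
# violates the constraint, the whole transaction fails and the table is
# left unchanged. Real syntax:
#   ALTER TABLE events ADD CONSTRAINT valid_ts CHECK (event_time > 0)

def checked_append(table: list, batch: list, constraint) -> None:
    """Append a batch atomically; raise if any row fails the check."""
    for row in batch:
        if not constraint(row):
            raise ValueError(f"CHECK constraint violated by row: {row}")
    table.extend(batch)  # only reached when every row passed

events = []
checked_append(events, [{"id": 1, "event_time": 5}],
               lambda r: r["event_time"] > 0)

try:
    checked_append(events, [{"id": 2, "event_time": -1}],
                   lambda r: r["event_time"] > 0)
except ValueError:
    pass  # bad batch rejected; events table is unchanged
```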
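Finally, the SCD objective. A Type 2 dimension keeps full history: when a tracked attribute changes, the current row is expired (end date set, current flag cleared) and a new current row is inserted. In Delta this is typically one `MERGE INTO` per micro-batch; the pure-Python sketch below, with an illustrative `scd2_upsert` helper of my own, shows the row lifecycle.

```python
# Sketch of a Slowly Changing Dimension Type 2 upsert: expire the
# current row for a changed key, then insert a new current row,
# preserving history. In Delta Lake this is done with MERGE INTO.
from datetime import date

def scd2_upsert(dim: list, updates: list, today: date) -> None:
    for upd in updates:
        for row in dim:
            if (row["key"] == upd["key"] and row["is_current"]
                    and row["value"] != upd["value"]):
                row["is_current"] = False   # expire the old version
                row["end_date"] = today
        if not any(r["key"] == upd["key"] and r["is_current"] for r in dim):
            dim.append({"key": upd["key"], "value": upd["value"],
                        "start_date": today, "end_date": None,
                        "is_current": True})

dim = [{"key": "c1", "value": "NY", "start_date": date(2020, 1, 1),
        "end_date": None, "is_current": True}]
scd2_upsert(dim, [{"key": "c1", "value": "LA"}], date(2024, 6, 1))
# dim now holds the expired NY row plus a current LA row
```

Note that unchanged keys fall through both branches, so re-delivering the same update is a no-op – which is exactly why SCD2 pairs well with the deduplication pattern above.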