Databricks Data Engineer Professional Data Modeling

Part 3 of the series of posts on written resources to support preparing for the Databricks Data Engineer Professional Exam: Data Modeling.

Section 3 is all about decisions relating to Data Engineering – streaming to bronze, quality enforcement and slowly changing dimensions (SCD). 20% of the total marks are allocated to this section. There is some overlap with the topics from the previous sections.

DE Pro Section 3: Data Modeling

  • Describe the objective of data transformations during promotion from bronze to silver

Different Data Warehousing Modeling Techniques and How to Implement them on the Databricks Lakehouse Platform – The Databricks Blog

  • Discuss how Change Data Feed (CDF) addresses past difficulties propagating updates and deletes within Lakehouse architecture

How to Simplify CDC With Delta Lake’s Change Data Feed | Databricks Blog
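To make the CDF objective concrete, here is a minimal pure-Python sketch of how a downstream table might consume change rows. It is not Spark code. The `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`) and the `_commit_version` column mirror the metadata columns that Change Data Feed exposes; the `id` key, the sample rows and the function name `apply_change_feed` are assumptions made for illustration.

```python
def apply_change_feed(target, changes):
    """Apply CDF-style change rows to `target`, a dict keyed by record id.

    Mutates and returns `target`. The "id" key is a hypothetical primary key.
    """
    # Process commits in version order so that later changes win.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        if change_type == "update_preimage":
            # The pre-image is the old value; only the post-image is applied.
            continue
        # Strip the CDF metadata columns before writing downstream.
        data = {k: v for k, v in row.items() if not k.startswith("_")}
        if change_type in ("insert", "update_postimage"):
            target[data["id"]] = data
        elif change_type == "delete":
            target.pop(data["id"], None)
        else:
            raise ValueError(f"unknown change type: {change_type}")
    return target


silver = {1: {"id": 1, "name": "Ann"}, 2: {"id": 2, "name": "Bob"}}
changes = [
    {"id": 1, "name": "Ann", "_change_type": "update_preimage", "_commit_version": 3},
    {"id": 1, "name": "Anne", "_change_type": "update_postimage", "_commit_version": 3},
    {"id": 2, "name": "Bob", "_change_type": "delete", "_commit_version": 4},
    {"id": 3, "name": "Cy", "_change_type": "insert", "_commit_version": 2},
]
apply_change_feed(silver, changes)
```

The point of the sketch is that updates and deletes arrive as explicit rows, so the downstream table does not need to be recomputed or compared against the full source to find out what changed.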

  • Apply Delta Lake clone to learn how shallow and deep clone interact with source/target tables.

https://www.databricks.com/blog/2020/09/15/easily-clone-your-delta-lake-for-testing-sharing-and-ml-reproducibility.html

  • Design a multiplex bronze table to avoid common pitfalls when trying to productionalize streaming workloads.

https://www.databricks.com/blog/2022/04/27/how-uplift-built-cdc-and-multiplexing-data-pipelines-with-databricks-delta-live-tables.html
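The multiplex idea can be sketched in plain Python: one bronze table holds raw records from many topics, and each silver table is fed by filtering on the topic and parsing the payload. This is only an illustration of the routing logic, not a streaming job. The column names `topic` and `value`, the `orders` topic and the helper names `parse_order` and `demultiplex` are all hypothetical.

```python
import json
from collections import defaultdict


def parse_order(payload):
    """Hypothetical parser for the 'orders' topic: cast fields to typed columns."""
    return {"order_id": int(payload["order_id"]), "amount": float(payload["amount"])}


def demultiplex(bronze_rows, parsers):
    """Route multiplex bronze rows to per-topic silver lists.

    Rows with an unknown topic or an unparseable payload are returned
    separately, so nothing is silently dropped.
    """
    silver = defaultdict(list)
    unparsed = []
    for row in bronze_rows:
        parser = parsers.get(row["topic"])
        if parser is None:
            unparsed.append(row)
            continue
        try:
            silver[row["topic"]].append(parser(json.loads(row["value"])))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing field, or a field of the wrong type.
            unparsed.append(row)
    return dict(silver), unparsed


bronze = [
    {"topic": "orders", "value": '{"order_id": "7", "amount": "9.5"}'},
    {"topic": "clicks", "value": "{}"},
    {"topic": "orders", "value": "not json"},
]
silver_tables, leftovers = demultiplex(bronze, {"orders": parse_order})
```

Keeping the raw payload in bronze untouched means a parser bug can be fixed later and the affected topic reprocessed without going back to the source system.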

  • Implement best practices when streaming data from multiplex bronze tables.
  • Apply incremental processing, quality enforcement, and deduplication to process data from bronze to silver
  • Make informed decisions about how to enforce data quality based on strengths and limitations of various approaches in Delta Lake
  • Implement tables avoiding issues caused by lack of foreign key constraints
  • Add constraints to Delta Lake tables to prevent bad data from being written

https://www.databricks.com/discover/pages/data-quality-management#constraints-and-validate
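The trade-off between the quality-enforcement approaches above can be sketched in plain Python. A table constraint rejects the whole write when any row violates it, while a filter-and-quarantine step lets valid rows through and sets the bad ones aside. The sketch below imitates both behaviours with lists and predicates; it does not call Delta Lake, and the names `write_with_constraints`, `split_valid_invalid` and `valid_amount` are hypothetical.

```python
class ConstraintViolation(Exception):
    """Raised when a row fails a named check."""


def write_with_constraints(table, batch, constraints):
    """All-or-nothing write: if any row fails any check, nothing is appended."""
    for row in batch:
        for name, check in constraints.items():
            if not check(row):
                raise ConstraintViolation(f"{name} violated by {row}")
    # Only reached when every row passed every check.
    table.extend(batch)


def split_valid_invalid(batch, constraints):
    """Quarantine approach: keep valid rows, set invalid rows aside for review."""
    valid, quarantined = [], []
    for row in batch:
        ok = all(check(row) for check in constraints.values())
        (valid if ok else quarantined).append(row)
    return valid, quarantined


# Hypothetical rule: amounts must be non-negative.
constraints = {"valid_amount": lambda row: row["amount"] >= 0}
batch = [{"amount": 5}, {"amount": -1}]

table = []
try:
    write_with_constraints(table, batch, constraints)
except ConstraintViolation:
    pass  # the whole batch was rejected; `table` is still empty

valid, quarantined = split_valid_invalid(batch, constraints)
```

The first approach guarantees the table never contains bad data but can stall a pipeline on a single bad record; the second keeps the pipeline moving at the cost of maintaining and monitoring a quarantine area.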

  • Implement lookup tables and describe the trade-offs for normalized data models
  • Diagram architectures and operations necessary to implement various Slowly Changing Dimension tables using Delta Lake with streaming
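As a study aid for the SCD objective, here is a minimal pure-Python sketch of Type 2 logic: when a tracked attribute changes, the current row is closed out and a new current row is appended, so history is preserved. On Databricks this would typically be expressed as a merge into a Delta table rather than list manipulation; the column names `start_date`, `end_date` and `is_current`, the sample data and the function name `apply_scd2` are assumptions for illustration.

```python
def apply_scd2(dimension, updates, key, tracked, effective_date):
    """Apply Type 2 changes to `dimension` (a list of dict rows) in place.

    key:     name of the business-key column
    tracked: columns whose changes should create a new version
    """
    # Index the current version of each business key.
    current = {row[key]: row for row in dimension if row["is_current"]}
    for update in updates:
        existing = current.get(update[key])
        if existing is not None:
            if all(existing[col] == update[col] for col in tracked):
                continue  # nothing changed, so no new version is needed
            # Close out the old version instead of overwriting it.
            existing["is_current"] = False
            existing["end_date"] = effective_date
        new_row = {
            key: update[key],
            **{col: update[col] for col in tracked},
            "start_date": effective_date,
            "end_date": None,
            "is_current": True,
        }
        dimension.append(new_row)
        current[update[key]] = new_row
    return dimension


customers = [
    {"customer_id": 1, "city": "Oslo", "start_date": "2023-01-01",
     "end_date": None, "is_current": True},
]
apply_scd2(
    customers,
    [{"customer_id": 1, "city": "Bergen"}, {"customer_id": 2, "city": "Rome"}],
    key="customer_id",
    tracked=["city"],
    effective_date="2024-06-01",
)
```

By contrast, a Type 1 dimension would simply overwrite `city` on the existing row, which is simpler but loses the history.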
