Getting Started with Databricks: A Quick Guide

Where do I get started? I hear this question a lot. In this guide, I’ll compile a few essentials and share some practical ideas to help you begin learning Databricks.

Why Learn Databricks?

I meet a lot of people who are eager to get started with Databricks. With rising demand for skills in this area, upskilling is a great idea.

Roles and Learning Paths

Databricks is a large platform with many capabilities, so try to follow the material that supports your role. Common roles the platform supports are Data Engineer, Data Analyst, ML Engineer and Data Scientist. All roles require some general background plus some specialist knowledge.

Video training and example code

Databricks provides a training portal. You can sign up for free to the Customer portal or, if you work for a partner organisation, the Partner portal.

This area has many courses containing all the learning materials. You will find notebooks, Python files and videos, all organised into courses and learning paths. Courses are now offered in many languages beyond English.

The “Get Started with Databricks for … ” series is a good set of courses for many roles:

  • Get Started with Databricks for Data Engineering
  • Get Started with Databricks for Business Leaders
  • Get Started with Databricks for GenAI
  • Get Started with Databricks for Machine Learning
  • Get Started with Databricks for Platform Administration

A key course for the overview of the platform capabilities is Lakehouse Fundamentals.

This course is made up of short videos and is followed by a 20-question test. Passing the test earns you the Lakehouse Fundamentals badge – a shareable achievement that shows your progress!

There are also longer free courses and learning plans that will prepare you for other badges and certifications [see my guide on getting certified here].

At this point you should have a good top-level view of the platform and what it can do. Pause and think about how the components you’ve discovered could be applied to scenarios you have encountered or solve issues you have come across in your work experience.

Open Source Components

Databricks uses, and in many cases created, open-source components. Each has its own website with good documentation and blog posts:

  • Unity Catalog
  • Delta Lake
  • Apache Spark
  • MLflow

It is good to understand the top-level use case for these components. It will help you know when to apply them and help you speak the same language as your team and customers.

Get hands on

So far, we have been reading and watching videos, but nothing beats hands-on practice.

You can sign up for a free trial – it lasts two weeks only – sign up here.

After your 2 weeks, you can sign up for a free trial with a cloud provider to receive some free credits and launch a Databricks instance on that cloud. Note that the free trial typically requires you to add a credit card. It’s important to monitor the duration of your trial and your usage to avoid unexpected charges. Once the trial period ends, you can continue using your Databricks instance on a pay-as-you-go basis. However, ensure you have the necessary controls and checks in place to prevent overspending and receiving a surprise bill from your cloud provider.

Now you have a trial instance – what can you do with it? Many of the courses on the Databricks Academy include a zip file of notebooks you can import and start to play with. Run these and do the lab exercises, as this is the best way to embed your learning.

Another way is to install dbdemos here. There are many demos to choose from, so find something that interests you and is relevant to your role.

Learning Spark

You can also learn Spark outside of Databricks. Spark is available as a local install, allowing you to write Spark code on a laptop. Go to this page for download and install instructions. This way you can run and write code without the risk of incurring costs on a cloud platform. Databricks is much more than a hosted and managed Spark install, but learning Spark will be a good investment for working on Databricks.

You can write Spark in a notebook environment, in the Spark shell or in files using a text editor. The Spark shell can be useful for some quick hands-on practice, while notebooks lend themselves well to learning. Using files enables you to apply software engineering best practices such as linting, unit testing and using a code editor. While you are learning Spark, be sure to bookmark the documentation.

Free Spark Book

The Learning Spark ebook is available as a free download from the Databricks website. Although it was written back in 2020, there is a lot you can learn from this book. Spark has since been updated from v3 to v3.5.4, so some of the code examples in the book may not work. I took this as an exercise to update the given code to get it running with the latest Spark version – a great way of getting some hands-on practice. The Spark architecture sections will give you a good foundation for understanding how to get the most out of your Spark code in Databricks.

 

Additional Resources

  • YouTube: Check out a variety of content, including talks from the annual Data & AI Summit.
  • LinkedIn: Follow industry experts to stay updated and find interesting perspectives.
  • Blogs: Databricks and open-source technology websites have high-quality information that brings topics to life more than documentation alone.
  • Meetups: If you can, go to meetups to hear how people are using Databricks and the problems they are solving.
  • Forums: Explore the forum to ask questions and engage with the community.

Good luck on your learning journey! If you are interested in going to the next stage and getting certified, see my guide here.
