Blog | Smart Data Lake Builder

Integrating SDLB with Databricks

December 12, 2024 · 7 min read

The Databricks platform provides an easily accessible and configurable way to implement a modern analytics platform. Smart Data Lake Builder complements Databricks as an open-source, portable automation tool to load and transform the data.

In this article, we describe the seamless integration of Smart Data Lake Builder (SDLB) in Databricks Notebooks, which allows you to:

Run and Modify SDLB Pipelines directly in Databricks
Display contents of DataObjects in your Notebooks
Use code completion to browse through Actions and DataObjects
Execute individual Actions from a Notebook Cell

SDLB UI Updated!

September 27, 2024 · 4 min read

Shasha Jiang

Zacharias Kull

We are thrilled to announce a major overhaul of the SDLB UI designed to revolutionize the way we visualize, track, and manage data pipelines! This addition to SDLB acts as a user-friendly technical data catalog that showcases dependencies through a lineage graph, along with metadata about data pipeline execution (workflows).

SDLB's UI tool is a step forward in data pipeline management, providing enhanced visibility and control. Whether you are a data engineer, analyst, or scientist, this tool will enhance your ability to manage data pipelines, allowing you to be more productive and achieve exceptional results.

Data Mesh with SDL

September 17, 2023 · 6 min read

Patrick Grütter

Data Mesh is an emerging concept gaining momentum across organizations. It is often described as a sociotechnical paradigm because a paradigm shift towards Data Mesh does not simply involve technological changes but also has sociological implications. As such, discussing a technical framework like Smart Data Lake Builder and analyzing how well it fits the Data Mesh paradigm can inherently only be part of the whole picture. Nevertheless, we want to do the exercise here to see how a technological framework can support the adoption.

In this article, we'll explore how the Smart Data Lake Builder aligns with the four principles as outlined by Zhamak Dehghani and assess which key concepts it supports.

Housekeeping

June 8, 2023 · 6 min read

Patrick Grütter

In this article, we're taking a look at how we use SDLB's housekeeping features to keep our pipelines running efficiently.

Incremental historization using CDC and Airbyte MSSQL connector

May 11, 2022 · 13 min read

Mandes Schönherr

Dr.sc.nat.

In many cases, datasets are not constant over their lifetime: new data points are created, values change, and data expires. We are interested in keeping track of all these changes. This article first presents collecting data using JDBC with on-the-fly deduplication. Then, a Change Data Capture (CDC) enabled (MS)SQL table is transferred and historized in the data lake using the Airbyte MS SQL connector, which supports CDC. Methods for reducing the computational and storage effort are also mentioned.

Combine Spark and Snowpark to ingest and transform data in one pipeline

April 6, 2022 · 8 min read

Zach Kull

Data Expert

This article shows how to create one unified data pipeline that uses Spark to ingest data into Snowflake, and Snowpark to transform data inside Snowflake.

Using Airbyte connector to inspect GitHub data

March 18, 2022 · 5 min read

Mandes Schönherr

Dr.sc.nat.

This article presents the deployment of an Airbyte connector with Smart Data Lake Builder (SDLB). In particular, the GitHub connector is implemented using the Python sources.