Integrating SDLB with Databricks

· 7 min read

The Databricks platform provides an easily accessible and configurable way to implement a modern analytics platform. Smart Data Lake Builder complements Databricks as an open-source, portable automation tool for loading and transforming data.

In this article, we describe the seamless integration of Smart Data Lake Builder (SDLB) in Databricks Notebooks, which allows you to:

  • Run and Modify SDLB Pipelines directly in Databricks
  • Display contents of DataObjects in your Notebooks
  • Use code completion to browse through Actions and DataObjects
  • Execute individual Actions from a Notebook Cell
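As a rough illustration of the first bullet, starting an SDLB run from a notebook cell might look like the following sketch. The entry point and option names are assumptions based on SDLB's command-line interface and may differ in your version; the config path and feed name are placeholders:

```
// Scala notebook cell (sketch, not a definitive API): start an SDLB run
// for a selected feed. Class and option names are assumptions.
import io.smartdatalake.app.DefaultSmartDataLakeBuilder

DefaultSmartDataLakeBuilder.main(Array(
  "--config", "/mnt/sdlb/config",  // folder containing HOCON pipeline definitions
  "--feed-sel", "download"         // run only the actions tagged with this feed
))
```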

SDLB UI Updated!

· 4 min read

We are thrilled to announce a major overhaul of the SDLB UI designed to revolutionize the way we visualize, track, and manage data pipelines! This addition to SDLB acts as a user-friendly technical data catalog that showcases dependencies through a lineage graph, along with metadata about data pipeline execution (workflows).

SDLB's UI is a step forward in data pipeline management, providing enhanced visibility and control. Whether you are a data engineer, analyst, or scientist, this tool will enhance your ability to manage data pipelines, allowing you to be more productive and achieve exceptional results.

Data Mesh with SDL

· 6 min read

Data Mesh is an emerging concept gaining momentum across organizations. It is often described as a sociotechnical paradigm because a paradigm shift towards Data Mesh does not simply involve technological changes but also has sociological implications. As such, discussing a technical framework like Smart Data Lake Builder and analyzing how well it fits the Data Mesh paradigm can inherently only be part of the whole picture. Nevertheless, we want to do the exercise here to see how a technological framework can support the adoption.

In this article, we'll explore how the Smart Data Lake Builder aligns with the four principles as outlined by Zhamak Dehghani and assess which key concepts it supports.

Incremental historization using CDC and Airbyte MSSQL connector

· 13 min read

In many cases, datasets are not static: new data points are created, values change, and data expires. We are interested in keeping track of all these changes. This article first presents collecting data via JDBC with deduplication on the fly. Then, a Change Data Capture (CDC)-enabled (MS)SQL table is transferred and historized in the data lake using the Airbyte MS SQL connector, which supports CDC. Finally, methods for reducing computational and storage effort are discussed.
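As a sketch of the first approach (JDBC extraction with on-the-fly deduplication), an SDLB pipeline configuration might look roughly as follows. All object and connection names are illustrative, and the exact options depend on your environment and SDLB version:

```hocon
dataObjects {
  # Source: table read over JDBC; "mssqlCon" is a placeholder connection id
  ext-orders {
    type = JdbcTableDataObject
    connectionId = mssqlCon
    table = { db = "sales", name = "orders" }
  }
  # Target: deduplicated table in the data lake
  int-orders {
    type = DeltaLakeTableDataObject
    table = { db = "lake", name = "orders" }
  }
}
actions {
  # DeduplicateAction keeps only new or changed records per primary key
  load-orders {
    type = DeduplicateAction
    inputId = ext-orders
    outputId = int-orders
    metadata { feed = orders }
  }
}
```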