smart-data-lake

Framework to quickly build and maintain Smart Data Lakes

Running in the public cloud

Smart Data Lake Builder is built to run anywhere. As shown in Getting Started, you can easily run it locally on small flat files, but you can just as well run it on a large cluster to leverage the resources of your on-premises or cloud infrastructure.

So far, we have tested Smart Data Lake Builder on Google DataProc, Microsoft Azure and YARN. See the following pages for details on how to install and run it:

Running on Microsoft Azure
Running on Google DataProc
Running on Yarn